Realtime AI News
Qwen Releases Qwen3-ASR-0.6B Automatic Speech Recognition Model
The Qwen team has released Qwen3-ASR-0.6B-hf on Hugging Face, an automatic speech recognition model that supports Chinese, English, and Cantonese.
The Qwen team published Qwen3-ASR-0.6B-hf on Hugging Face on June 26. The automatic speech recognition (ASR) model is tagged for the automatic-speech-recognition pipeline, is built for the transformers library, and ships its weights in safetensors format. It also carries a text-generation tag, which suggests an architecture that pairs speech recognition with generative capabilities.
The model supports Chinese (zh), English (en), and Cantonese (yue). Its full tag list is transformers, safetensors, qwen3_asr, text-generation, and automatic-speech-recognition.
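For readers who want to try the model, the automatic-speech-recognition tag suggests it may load through the generic transformers pipeline. The sketch below is a minimal illustration under assumptions, not the Qwen team's documented usage: the repository id `Qwen/Qwen3-ASR-0.6B-hf` is inferred from the model name, the helper names `is_supported` and `transcribe` are hypothetical, and the installed transformers version would need to support the qwen3_asr model type.

```python
from typing import Any

# Assumption: repository id inferred from the model name in the article.
MODEL_ID = "Qwen/Qwen3-ASR-0.6B-hf"

# Language codes the article lists for the model.
SUPPORTED_LANGUAGES = {"zh": "Chinese", "en": "English", "yue": "Cantonese"}


def is_supported(lang_code: str) -> bool:
    """Return True if the language code is one the article lists."""
    return lang_code.strip().lower() in SUPPORTED_LANGUAGES


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the generic transformers ASR pipeline.

    Assumption: the checkpoint loads through pipeline(); this has not been
    verified against the actual model repository.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result: dict[str, Any] = asr(audio_path)
    return result["text"]
```

Calling `transcribe("meeting.wav")` would download the weights on first use and return the recognized text, provided the assumptions above hold; `meeting.wav` is a placeholder file name.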
Unlike Qwen3-ForcedAligner-0.6B, which was released the same day, the ASR model is designed for end-to-end speech-to-text transcription. That makes it the better fit for applications such as voice assistants, meeting transcription, and real-time captioning.
Source: Qwen's official model repository on Hugging Face.
Why it matters
The Qwen3-ASR-0.6B release rounds out Qwen's speech model lineup, giving developers both transcription and forced alignment from the same model family.