Guozhen AI: Global AI field notes and model intelligence

Qwen Releases Qwen3-ASR-0.6B Automatic Speech Recognition Model

The Qwen team launched Qwen3-ASR-0.6B-hf on Hugging Face, an automatic speech recognition model supporting Chinese, English, and Cantonese.

The Qwen team published Qwen3-ASR-0.6B-hf on Hugging Face on June 26. It is an automatic speech recognition (ASR) model tagged for the automatic-speech-recognition pipeline, built for the transformers library and distributed in safetensors format. It also carries a text-generation tag, which suggests a hybrid architecture that pairs speech recognition with generative capabilities.

The model supports Chinese (zh), English (en), and Cantonese (yue), with tags covering transformers, safetensors, qwen3_asr, text-generation, and automatic-speech-recognition.
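For readers who want to try the model, the tags above suggest it can be loaded through the transformers automatic-speech-recognition pipeline. The following is a minimal sketch under stated assumptions, not a verified recipe: the full repository id (in particular the "Qwen/" organization prefix) is inferred rather than given in the listing, the helper names are hypothetical, and the installed transformers version must recognize the qwen3_asr model type.

```python
from typing import Any

# Assumption: the "Qwen/" organization prefix is inferred; the article only
# gives the model name Qwen3-ASR-0.6B-hf.
MODEL_ID = "Qwen/Qwen3-ASR-0.6B-hf"

# Language codes listed for the model, per the article.
SUPPORTED_LANGUAGES = {"zh": "Chinese", "en": "English", "yue": "Cantonese"}


def is_supported(lang_code: str) -> bool:
    """Return True if the code is one of the languages listed for the model."""
    return lang_code.strip().lower() in SUPPORTED_LANGUAGES


def build_transcriber(model_id: str = MODEL_ID) -> Any:
    """Create a transformers ASR pipeline for the given checkpoint.

    transformers is imported lazily so the helpers above work without it.
    The first call downloads the weights from Hugging Face.
    """
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber: Any) -> str:
    """Run the pipeline on a local audio file and return the recognized text."""
    result = transcriber(audio_path)
    return result["text"]
```

A typical call would be transcribe("meeting.wav", build_transcriber()), where meeting.wav is a placeholder for a local recording. Passing the pipeline object in as an argument lets one loaded model be reused across many files.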

Unlike Qwen3-ForcedAligner-0.6B, released the same day, the ASR model is designed for end-to-end speech-to-text transcription, making it better suited to applications such as voice assistants, meeting transcription, and real-time captioning.

Source: Qwen Hugging Face official model registry.

Why it matters

The Qwen3-ASR-0.6B release rounds out Qwen's speech model lineup, offering developers a complete voice pipeline from ASR to forced alignment.

Tags: Qwen, Model Release, ASR, Hugging Face
