Realtime AI News
Qwen Releases Qwen3-ForcedAligner-0.6B Model for Speech Alignment
The Qwen team has published Qwen3-ForcedAligner-0.6B-hf on Hugging Face, a forced alignment model supporting Chinese, English, Cantonese, and French.
The Qwen team released Qwen3-ForcedAligner-0.6B-hf on Hugging Face on June 26. The forced alignment model is tagged for the token-classification pipeline, is compatible with the transformers library, and ships its weights in the safetensors format. It is designed for alignment tasks in speech recognition.
The model's tags include transformers, safetensors, qwen3_asr, and token-classification, and it lists language support for Chinese (zh), English (en), Cantonese (yue), and French (fr). The release signals that Qwen is systematically expanding its speech technology stack.
Forced alignment matches a known transcript to its audio along a timeline, assigning each word or token a start and end time. It is a critical component of speech recognition and speech synthesis pipelines. This release gives developers a specialized tool for tasks such as speech dataset annotation and the preparation of training data for speech synthesis.
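To make the idea of matching text to audio along a timeline concrete, here is a minimal sketch of classic dynamic-programming forced alignment. It is a generic illustration, not Qwen's method or the model's API: the function name `force_align`, the 20 ms frame length, and the toy score matrix are all assumptions made for this example.

```python
from typing import List, Tuple

def force_align(scores: List[List[float]], frame_ms: int = 20) -> List[Tuple[int, int]]:
    """Toy monotonic forced alignment (hypothetical helper, not a Qwen API).

    scores[t][k] is an assumed log-score for transcript token k at audio frame t.
    Returns one (start_ms, end_ms) span per token, in transcript order.
    """
    T, N = len(scores), len(scores[0])
    if T < N:
        raise ValueError("need at least one frame per token")
    NEG = float("-inf")
    # best[t][k]: best total score with frame t assigned to token k
    best = [[NEG] * N for _ in range(T)]
    # advanced[t][k]: True if the best path moved from token k-1 to k at frame t
    advanced = [[False] * N for _ in range(T)]
    best[0][0] = scores[0][0]
    for t in range(1, T):
        for k in range(min(t + 1, N)):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else NEG
            if move > stay:
                best[t][k] = move + scores[t][k]
                advanced[t][k] = True
            else:
                best[t][k] = stay + scores[t][k]
    # Backtrace from the last frame and last token to recover frame -> token.
    assignment = [0] * T
    k = N - 1
    for t in range(T - 1, -1, -1):
        assignment[t] = k
        if advanced[t][k]:
            k -= 1
    # Convert each token's contiguous run of frames into a time span.
    spans = []
    for k in range(N):
        frames = [t for t, a in enumerate(assignment) if a == k]
        spans.append((frames[0] * frame_ms, (frames[-1] + 1) * frame_ms))
    return spans

# Toy input: 6 frames, 3 tokens; 0.0 marks a good match, -5.0 a poor one.
toy_scores = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
]
spans = force_align(toy_scores)
print(spans)  # [(0, 40), (40, 100), (100, 120)]
```

Each token receives a contiguous run of frames in transcript order, which is the kind of per-token timestamp that dataset annotation and TTS data preparation typically consume. A production aligner would derive the scores from an acoustic model rather than a hand-written matrix.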
Source: Qwen's official model repository on Hugging Face. Downloads: 0. Likes: 1.
Why it matters
This model fills the forced alignment gap in Qwen's speech ecosystem and offers practical value for developers working on speech data annotation and TTS pipelines.