Integrate DeepSeek with Mureka to Generate High-Quality Music Instantly
As the barrier to AI music generation lowers, attention to usage boundaries becomes even more critical. A pleasant-sounding preview is only the first step—exportability, commercial licensing eligibility, lyric appropriateness, and potential similarity to existing works all require individual verification. The easier content becomes to generate, the more essential it is to retain human judgment.
During testing, fix a single theme and generate three versions in distinct musical styles. Record which version best suits short videos, course intros, or background music. Don’t rely solely on first impressions—rhythm suitability and copyright compliance matter far more in real-world applications.
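One way to keep such a test honest is to log it in a small script. The sketch below is an illustration only: the `Trial` record, the 1 to 5 scoring scale, and the helper names are assumptions introduced here, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trial:
    """One generated version of the fixed theme (hypothetical record)."""
    style: str
    prompt: str
    # Assumed scale: 1 (poor fit) to 5 (great fit), keyed by use case.
    rhythm_fit: Dict[str, int] = field(default_factory=dict)
    # Set to True only after export and licensing terms were checked by hand.
    license_checked: bool = False


def build_prompts(theme: str, styles: List[str]) -> Dict[str, str]:
    """Keep the theme fixed and vary only the musical style."""
    return {style: f"A {style} song about {theme}" for style in styles}


def best_style_for(trials: List[Trial], use_case: str) -> Optional[str]:
    """Pick the best-scoring style among versions that passed the license check."""
    eligible = [t for t in trials if t.license_checked]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t.rhythm_fit.get(use_case, 0)).style


prompts = build_prompts("leaving home for the first time", ["folk", "lo-fi", "synth-pop"])
trials = [
    Trial("folk", prompts["folk"], {"course intro": 4, "short video": 2}, True),
    Trial("lo-fi", prompts["lo-fi"], {"background music": 5, "short video": 3}, True),
    # Highest short-video score, but not yet cleared for licensing.
    Trial("synth-pop", prompts["synth-pop"], {"short video": 5}, False),
]
print(best_style_for(trials, "short video"))
```

Note that the version with the best first impression is skipped here because its licensing has not been checked, which mirrors the point that compliance outweighs first impressions.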
DeepSeek’s R1 model, released two months ago, rapidly gained widespread attention in the AI community thanks to its strong reasoning capabilities and notably lower GPU-hour training costs. More recently, the AI agent Manus (developed by a different team, not DeepSeek) impressed many observers with its generality and high degree of automation.
Last week, a social media post introduced Janus-Pro:7B, DeepSeek’s multimodal large language model. Several readers commented asking about text-to-music generation: What’s the current state of this technology? Which models or tools are practical and effective? And how might they use AI-generated music in their own content as independent creators?
This article provides a comprehensive overview of recent advances in AI music generation—including powerful large models and practical tools—for anyone interested in this rapidly evolving field.
1 AI Music Generation
To prepare this article, I surveyed multiple mainstream models in this domain. First, let’s watch a viral AI music video that’s been circulating widely today:
“Mureka” — AI Musician MV Singer: Mureka; Entirely AI-generated. Music generated by Mureka; Video production supported by SkyReels.
Total runtime: 1 minute 11 seconds. Just by listening—can you tell whether this track was composed by a human or generated entirely by AI? Before publishing, I invited three friends to listen blind—they struggled to believe it was fully AI-generated. Music professionals: Can you distinguish it? Feel free to share your thoughts in the comments.
AI music generation has accelerated alongside advances in large-model reasoning. Today, Kunlun Tech announced Mureka O1, which it bills as the world’s first music reasoning large model. The MV above is the official debut release, and I made no modifications to it.
I thoroughly studied Mureka O1’s technical documentation, architecture, and implementation—and conducted in-depth hands-on testing. Below, I’ll walk you through exactly how it achieves such exceptional audio quality.
Mureka O1 was developed by Kunlun Tech. In February, the company open-sourced an AI short-drama model that ranked in Hugging Face’s Top 10 for two consecutive weeks, and I had learned as early as late 2023 that it was researching AI music generation. After upgrading Mureka to Version 6, the company launched the O1 model.
2 Technical Challenges
Generating a high-quality MP3 song using AI is significantly more complex than generating text or images—because AI must replicate the holistic creative process of human songwriting: lyrics, melody, structural arc (intro–verse–chorus–bridge–outro), instrumentation choices, emotional progression, dynamic shifts, and seamless integration of all these elements.
Historically, AI-generated songs sounded “song-like” but lacked coherent structure, clear emotional arcs, and consistent instrumental layering.
Over the past two days, I reviewed Mureka O1’s associated research paper—which specifically addresses these core weaknesses: unclear musical structure and disorganized instrumentation.

The paper introduces “Chain-of-Musical-Thought” (MusiCoT)—a novel framework where the model first reasons about the full musical structure before generating raw audio.
For example: “Start with piano; intensify drums in the chorus; anchor rhythm with bass; fade out gradually at the end.” This is analogous to drafting a detailed arrangement blueprint first—then filling in melodic details bar-by-bar. The overall architectural flow looks like this:

MusiCoT builds upon CLAP (Contrastive Language-Audio Pretraining), a contrastive pre-trained model that aligns audio and text, and thus requires no manual annotation to generalize across diverse musical genres. It also supports reference-based input, enabling high-fidelity, interpretable, and controllable music generation.
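To make the idea of a shared audio-text embedding space concrete, here is a toy sketch. It does not load CLAP or reproduce MusiCoT; the three-dimensional vectors and the text labels are made up for illustration, and only the cosine-similarity comparison reflects how such aligned embeddings are typically used.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_text(audio_vec: List[float], text_vecs: Dict[str, List[float]]) -> str:
    """Return the text label whose embedding is most similar to the audio embedding."""
    return max(text_vecs, key=lambda label: cosine(audio_vec, text_vecs[label]))


# Toy embeddings (assumed values); a real model would produce these vectors.
text_vecs = {
    "soft piano ballad": [0.9, 0.1, 0.0],
    "heavy drum-driven rock": [0.1, 0.9, 0.2],
}
audio_vec = [0.8, 0.2, 0.1]

print(closest_text(audio_vec, text_vecs))
```

Because audio and text live in the same space, a description can be matched to a clip (or the reverse) without anyone labeling that clip by hand.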
By solving the twin problems of structural incoherence and chaotic instrumentation, MusiCoT marks the first time AI plans the entire composition before generating sound, dramatically improving musical continuity, logical flow, and user control.
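The plan-first idea can be sketched in a few lines. This is not MusiCoT’s implementation: the `Section` record, the bar counts, and the text-only `render` function are assumptions used to show the two-stage shape, with a structural plan produced first and per-bar output derived from it afterwards.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """One planned section of the song (hypothetical structure)."""
    name: str
    bars: int
    instruments: List[str]
    intensity: float  # assumed scale: 0.0 (silent) to 1.0 (full)


def plan_song() -> List[Section]:
    """Stage 1: decide the whole arrangement before producing any output."""
    return [
        Section("intro", 4, ["piano"], 0.3),
        Section("verse", 8, ["piano", "bass"], 0.5),
        Section("chorus", 8, ["piano", "bass", "drums"], 0.9),
        Section("outro", 4, ["piano"], 0.2),  # gradual fade-out
    ]


def render(plan: List[Section]) -> List[str]:
    """Stage 2: stand-in for audio synthesis; emits one text line per bar."""
    bars = []
    for section in plan:
        layers = "+".join(section.instruments)
        for i in range(section.bars):
            bars.append(f"{section.name}[{i + 1}/{section.bars}]: {layers} @ {section.intensity:.1f}")
    return bars


for line in render(plan_song())[:5]:
    print(line)
```

Since the plan is ordinary data, it can be inspected or edited before anything is rendered, which is the sense in which a plan-first approach is analyzable and controllable.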
Another prominent text-to-music model is Suno, which is especially strong for English-language songs. I tested its Chinese-song generation capability and found Mureka O1’s output markedly better in my comparison. The key differentiator is MusiCoT’s architecture, which supports multilingual generation and broad stylistic flexibility. In addition, as far as I could find in my research, Mureka O1 is currently the only model offering custom voice cloning: users can upload vocal samples to train personalized singer voices.
3 Hands-On Evaluation of Mureka O1
I was especially intrigued by Mureka O1’s custom voice cloning feature—its ability to generate melodies sung in a style closely resembling a chosen artist’s vocal timbre. My evaluation approach was simple: select a favorite song by a well-known artist, then prompt Mureka O1 to generate a new song with a similar melodic contour and vocal character. Below is a step-by-step summary—you’re welcome to follow along and create your own artist-inspired tracks.
Step 1: Open your browser and navigate to https://mureka.ai. Click the Create button in the left sidebar, then select Song:

Step 2: Click the Reference + button centered on the page:

In the pop-up window, click Upload audio, then select a song by your preferred artist:

For example, I uploaded Xu Wei’s classic “That Year”. The interface even allows trimming—but I used the full track without edits:

Step 3: Enter the following prompt into the text panel:

A gentle, lyrical rock song performed by a male vocalist, featuring blended acoustic and electric guitar arrangements. Emotionally sincere, with a flowing, memorable melody. Lyrics reflect nostalgic memories of youth and insights gained through personal growth.
Finally, click Create. Within seconds, two candidate tracks appear:

I’ve uploaded the resulting MP3 files—feel free to listen and assess the quality. The output sounds convincingly human-performed, and the melody bears strong resemblance to “That Year”:
“A Gentle, Lyrical Rock Song Performed by a Male Vocalist Featuring Blended Acoustic and Electric Guitar Arrangements” — Guo Zhen AI, 2 minutes
These impressive results stem primarily from MusiCoT—the first music-generation framework to embed a chain-of-thought mechanism. By first generating an analyzable structural plan—and only then synthesizing audio—the model achieves unprecedented coherence, precise instrumentation control, and expressive fidelity.
Traditionally, crafting a professional song required expert composition, arrangement, studio recording, and mixing—often taking several days. With AI, however, high-fidelity, human-level output like this now takes just over one minute using Mureka O1:

Summary
Today’s advanced music models—like Mureka O1—can produce complete, structurally sound, emotionally nuanced, and melodically captivating songs in roughly one minute, using only a text description—or even a reference audio clip (e.g., your favorite singer’s voice or a target melody).
The breakthrough enabling this leap is MusiCoT (“Chain-of-Musical-Thought”), a mechanism that lets AI work like a professional musician: first conceptualizing the full composition, then realizing each section’s melody, harmony, and instrumentation. The result is music that feels natural, intentional, and human-crafted.
Whether you’re an independent content creator, short-video producer, music enthusiast, or simply looking for fresh inspiration in daily life, now is a good time to explore AI music generation.
One final note: Mureka offers a robust API, allowing developers, musicians, and game studios to seamlessly integrate its music-generation capabilities into their own products or platforms.
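A common integration pattern is to hide the generation call behind a thin service layer. The sketch below deliberately does not call Mureka’s API, since its endpoints and parameters are not described in this article; `MusicService`, `fake_backend`, and the caching policy are hypothetical names and choices made here for illustration. In a real product, the backend would be a function that sends the prompt to whichever API you use and returns audio bytes.

```python
import hashlib
from typing import Callable, Dict


class MusicService:
    """Wraps any text-to-music backend and caches results per prompt (hypothetical)."""

    def __init__(self, backend: Callable[[str], bytes]):
        self._backend = backend
        self._cache: Dict[str, bytes] = {}

    def generate(self, prompt: str) -> bytes:
        # Normalize the prompt so trivial differences do not trigger a new generation.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._backend(prompt)
        return self._cache[key]


def fake_backend(prompt: str) -> bytes:
    """Placeholder backend: returns labeled bytes instead of real audio."""
    return f"AUDIO({prompt})".encode("utf-8")


service = MusicService(fake_backend)
track = service.generate("A gentle, lyrical rock song")
print(len(track), "bytes")
```

Injecting the backend keeps product code independent of any one provider and makes it easy to test without network access or generation credits.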
This article totals 2,989 words and includes 10 figures. If you found it valuable, please consider subscribing—and giving it a triple tap: Like, Share, and Bookmark. Bonus points if you add a ⭐️ star! Thank you for reading—and see you in the next one.