Mureka-O1 Music Generation Hands-On Test
This article is about music generation with Mureka-O1. Evaluating AI-generated music shouldn’t rely solely on first impressions (“Is it stunning?”). Instead, assess structural completeness, lyrical relevance to the prompt, consistency with the timbre reference, and, most importantly, whether the final output is practically usable within your own creative context.
When testing music models, I consistently generate multiple versions using the same theme, then compare how naturally each handles the intro, verse, chorus, and outro. Even if only one version meets publishing standards, further due diligence is essential: verifying licensing terms, proper attribution, and flexibility for post-generation editing. Never lower editorial standards just because a track was generated in under a minute.
Hello, I’m Guo Zhen.
DeepSeek’s R1 model, released two months ago, quickly went viral across the AI community thanks to its strong reasoning capabilities and efficient training, which reportedly required far fewer GPU hours than comparable models. More recently, the intelligent agent Manus (which is not a DeepSeek product) impressed many observers with its broad applicability and highly autonomous operation.
Last week, a post introduced DeepSeek’s multimodal large language model Janus-Pro-7B. In the comments, several readers asked me: “What’s the current state of text-to-music generation? Which models or tools are practical and reliable, and how can I integrate AI music into my content creation workflow?”
Today’s article provides a comprehensive overview of the latest advances in AI music generation—including top-performing large models and practical tools. If this topic interests you, read on.
1 AI Music Generation
To prepare this article, I surveyed numerous mainstream models. First, here’s a digital-human music MV currently trending online:
When evaluating music-generation tools, prioritize five criteria:
- Melodic completeness
- Lyrical fidelity to the prompt
- Editability (e.g., stem separation, tempo adjustment)
- Export quality (bitrate, format support)
- Copyright boundaries (commercial use rights, attribution requirements)
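To make the five criteria above easier to apply when comparing several generations of the same theme, here is a minimal scoring sketch. The weights, the 1 to 5 scale, and the names `Candidate`, `weighted_score`, and `rank` are my own assumptions for illustration; they are not part of any Mureka tooling.

```python
from dataclasses import dataclass

# Hypothetical weights for the five criteria; adjust them to your own priorities.
WEIGHTS = {
    "melodic_completeness": 0.25,
    "lyrical_fidelity": 0.25,
    "editability": 0.15,
    "export_quality": 0.15,
    "copyright_clarity": 0.20,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion name -> score from 1 (poor) to 5 (excellent)


def weighted_score(candidate: Candidate) -> float:
    """Return the weighted average of a candidate's 1-5 scores."""
    missing = set(WEIGHTS) - set(candidate.scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[c] * candidate.scores[c] for c in WEIGHTS)


def rank(candidates):
    """Sort candidates from best to worst by weighted score."""
    return sorted(candidates, key=weighted_score, reverse=True)


# Two made-up takes of the same prompt, scored by ear.
take_a = Candidate("take_a", {
    "melodic_completeness": 4, "lyrical_fidelity": 5, "editability": 3,
    "export_quality": 4, "copyright_clarity": 2,
})
take_b = Candidate("take_b", {
    "melodic_completeness": 3, "lyrical_fidelity": 4, "editability": 4,
    "export_quality": 4, "copyright_clarity": 5,
})

for c in rank([take_a, take_b]):
    print(c.name, round(weighted_score(c), 2))
```

Giving copyright clarity its own weight means a great-sounding take with unclear usage rights can still lose to a slightly weaker take you are allowed to publish.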
Audio: “A heartfelt pop ballad sung by a female vocalist expressing deep longing for a lost love” (Guo Zhen AI, 2 minutes)
The full track runs ~2 minutes. Just from listening to the audio quality alone—can you tell whether it was composed by a human or generated entirely by AI? Before publishing, I invited three friends to blind-test it. None believed it was fully AI-generated. If you’re trained in music, could you distinguish it? Feel free to comment below!
AI music generation has accelerated dramatically alongside advances in large-language-model reasoning. Among today’s most popular music-generation models is Mureka, whose latest release—the O1 model—produced the track above in a single generation, with zero manual editing.
I’ve deeply studied Mureka-O1’s technical documentation, architecture, and usage patterns—and conducted extensive hands-on testing. Below, I’ll walk you through exactly how it produces such convincingly realistic MP3s.
Mureka-O1 was developed by Kunlun Tech. I recall they open-sourced an AI short-drama model in February—and it ranked in Hugging Face’s Top 10 for two consecutive weeks. As early as late 2023, I’d learned they were actively researching AI music generation. After upgrading Mureka to version V6, they launched the O1 model.
2 Technical Challenges
Generating a high-quality MP3 song with AI is vastly more complex than generating text or images. To compose like a human musician, AI must simultaneously handle:
- Lyrics (semantic coherence + poetic flow)
- Melody (pitch, rhythm, phrasing)
- Structural arc (intro → verse → chorus → bridge → outro)
- Instrumentation (which instruments play when, how they interact)
- Emotional progression (dynamic shifts, tension/release)
Integrating all these elements organically remains extremely difficult.
Historically, AI-generated songs sounded “song-like” but lacked structural logic, clear emotional arcs, and coherent instrumentation.
Over the past two days, I reviewed Mureka-O1’s technical paper—which specifically addresses these very shortcomings: unclear structure and weak hierarchical organization in AI music generation.
The paper introduces “Chain-of-Musical-Thought” (MusiCoT)—a novel paradigm where the model first “thinks through” the full musical structure before generating any audio.
For example: “Start with piano; amplify drums in the chorus; sustain rhythm with bass; fade out at the end.” This is analogous to drafting a detailed arrangement blueprint first—then filling in melodic details bar-by-bar. The overall architectural flow looks like this:
Built upon CLAP, a contrastive language-audio pretraining model, MusiCoT requires no manual annotation to generalize across diverse musical genres. It also supports reference audio input, enabling high-fidelity, interpretable, and controllable music generation.
By tackling the core problems of structural incoherence and chaotic instrumentation, MusiCoT lets the model plan the whole song before writing it. To my knowledge, this is the first time a music model has worked this way, and it noticeably improves musical continuity, logical progression, and user control.
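The plan-first idea can be illustrated with a toy two-stage sketch. This is not the paper’s implementation: the real model works on learned audio representations, while the `SectionPlan`, `plan_song`, and `render` names below are hypothetical and the “audio” is just text labels. It only shows the control flow of fixing the whole structure first and then filling in bar-level detail.

```python
from dataclasses import dataclass


@dataclass
class SectionPlan:
    name: str           # e.g. "intro", "verse", "chorus", "outro"
    bars: int           # length of the section in bars
    instruments: tuple  # instruments active in this section
    dynamics: str       # rough intensity label


def plan_song() -> list:
    """Stage 1: decide the whole structure before any 'audio' exists."""
    return [
        SectionPlan("intro", 4, ("piano",), "soft"),
        SectionPlan("verse", 8, ("piano", "bass"), "medium"),
        SectionPlan("chorus", 8, ("piano", "bass", "drums"), "loud"),
        SectionPlan("outro", 4, ("piano",), "fade-out"),
    ]


def render(plan: list) -> list:
    """Stage 2: fill in bar-level detail, conditioned on the finished plan.

    A real model would emit audio tokens here; this stub emits text labels.
    """
    bars = []
    for section in plan:
        layer = "+".join(section.instruments)
        for i in range(1, section.bars + 1):
            bars.append(f"{section.name}:{i}:{layer}:{section.dynamics}")
    return bars


plan = plan_song()
bars = render(plan)
print(len(bars), "bars")
print(bars[0])
print(bars[-1])
```

Because the plan is an explicit object, it can be inspected or edited before rendering, which is the kind of analyzability and control the article attributes to MusiCoT.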
Another prominent text-to-music model is Suno, which is especially strong for English-language songs. I tested its Chinese-song generation and found Mureka-O1’s output noticeably more natural and expressive. One likely reason is that MusiCoT is designed to handle multilingual prompts and diverse styles. In addition, as far as I could find, Mureka-O1 is currently the only model that supports custom voice-timbre upload, so the generated vocals can follow a preferred singer’s vocal characteristics.
3 Evaluating Mureka-O1
I was particularly intrigued by Mureka-O1’s ability to replicate a specific singer’s timbre while generating original melodies. My evaluation approach: pick a favorite artist’s song, then prompt Mureka-O1 to generate a new track matching its melodic contour and vocal tone. Below is the step-by-step workflow—feel free to replicate it with your own favorite artist.
Step 1: Open your browser and navigate to https://mureka.ai. Click the Create button on the left sidebar, then select Song:
Step 2: Click the Reference + button centered on the page:
In the pop-up window, click Upload audio, then select a song by your preferred artist:
For example, I uploaded Xu Wei’s “That Year”. The interface even supports trimming—but I used the full track unmodified:
Step 3: Enter the following prompt into the text panel:
A gentle, lyrical rock song performed by a male vocalist, arranged with acoustic and electric guitars. Emotionally sincere, with a flowing, memorable melody. Lyrics reflect nostalgic memories of youth and insights gained through personal growth.
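The prompt above follows a simple pattern: style, vocalist, arrangement, mood, and lyrical theme. If you generate many variations, a small helper keeps that pattern consistent. This is a sketch of my own; `build_prompt` and its parameters are hypothetical and have nothing to do with Mureka’s software.

```python
def build_prompt(style, vocalist, instruments, mood, lyric_theme):
    """Assemble a one-paragraph music prompt from its parts."""
    if not instruments:
        raise ValueError("list at least one instrument")
    arrangement = " and ".join(instruments)
    return (
        f"A {style} song performed by a {vocalist}, "
        f"arranged with {arrangement}. "
        f"{mood}. "
        f"Lyrics reflect {lyric_theme}."
    )


prompt = build_prompt(
    style="gentle, lyrical rock",
    vocalist="male vocalist",
    instruments=["acoustic guitar", "electric guitar"],
    mood="Emotionally sincere, with a flowing, memorable melody",
    lyric_theme="nostalgic memories of youth and insights gained through personal growth",
)
print(prompt)
```

Changing one field at a time (for example, only the vocalist) makes it easier to hear which part of the prompt actually affects the result.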
Finally, click Create. After a short wait (about a minute in my tests), two candidate tracks appear:
I’ve uploaded the resulting MP3s—give them a listen! The output sounds convincingly human-sung, and the melody strongly echoes “That Year”:
Audio: “A gentle, lyrical rock song performed by a male vocalist, arranged with acoustic and electric guitars” (Guo Zhen AI, 2 minutes)
This fidelity stems primarily from MusiCoT, which, to my knowledge, is the first music-generation framework to embed a “chain-of-thought” mechanism. By first producing an analyzable structural plan and then rendering audio, it improves musical coherence, arrangement precision, and user controllability.
Traditionally, crafting a professional song required expert composition, studio recording, mixing, and mastering, which could take days or weeks. With Mureka-O1, a convincingly human-sounding track is ready in roughly one minute.
When you try this workflow yourself, avoid jumping straight into large-scale projects. Start with one simple test case to verify that the core workflow is clear and reproducible.
Summary
Today’s music LLMs—like Mureka-O1—can generate structurally sound, emotionally nuanced, melodically engaging full-length songs in ~60 seconds, using only a text prompt—or even a reference audio clip (e.g., your favorite singer’s voice or a target melody).
The breakthrough enabling this is MusiCoT, the “Chain-of-Musical-Thought” mechanism. It lets the model work like a professional composer: first designing the complete musical architecture (key, form, instrumentation, dynamics), then filling in precise melodic and harmonic details. The result is music that feels organic, intentional, and human-crafted.
Whether you’re a content creator, short-video producer, music enthusiast—or simply seeking fresh inspiration—now is the ideal moment to explore AI music generation.
One final note: Mureka offers a robust API, enabling developers, musicians, and game studios to seamlessly integrate its music-generation capabilities into custom applications or platforms.
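As a rough idea of what an integration might look like, the sketch below only assembles a request and does not send anything. The endpoint URL, the field names (`prompt`, `reference_id`), and the bearer-token header are placeholders I made up for illustration; check Mureka’s official API documentation for the real endpoint, parameters, and authentication scheme.

```python
import json

# Hypothetical endpoint and field names, for illustration only.
HYPOTHETICAL_ENDPOINT = "https://api.example.com/v1/song/generate"


def build_request(prompt, api_key, reference_id=None):
    """Assemble (but do not send) a song-generation request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {"prompt": prompt}
    if reference_id is not None:
        # Placeholder for an identifier of a previously uploaded reference track.
        body["reference_id"] = reference_id
    return {
        "url": HYPOTHETICAL_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


request = build_request(
    "A gentle, lyrical rock song performed by a male vocalist",
    api_key="YOUR_API_KEY",
    reference_id="ref-123",
)
print(request["url"])
print(request["body"])
```

Keeping request construction separate from the network call makes it easy to log or review exactly what would be sent before spending any generation credits.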
This article totals 2,989 words and 10 figures. If you found it valuable, please consider subscribing—and giving it a triple tap: Like, Share, and “Read Later” (Watch). Bonus points if you add a ⭐️ star! Thank you for reading—and see you in the next one.