郭震 AI | WeChat official account: 郭震AI

Realtime AI News

A-Evolve-Training: Autonomous Post-Training of a 30B Model Without Human Intervention

Researchers propose A-Evolve-Training, an autonomous system that post-trains a 30B Nemotron model over four rounds spanning several weeks with no human in the loop, reaching a held-out score of 0.86 against 0.87 for the top human submission.


Researchers from NVIDIA and affiliated institutions have published a paper on arXiv titled "A-Evolve-Training: Autonomous Post-Training of a 30B Model," presenting a system that runs the entire post-training loop without any human intervention. Conventionally, post-training a frontier model requires weeks of human work: proposing data and recipe changes, launching runs, reading evaluations, and deciding what to keep. A-Evolve-Training automates this closed loop end-to-end.
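
The closed loop described above (propose a change, launch a run, read the evaluation, decide what to keep) can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: `Recipe`, `run_closed_loop`, `toy_propose`, and `toy_train_and_eval` are hypothetical names, and the toy scoring function stands in for an actual training run and held-out evaluation.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass(frozen=True)
class Recipe:
    """Hypothetical post-training recipe: one data knob, one optimizer knob."""
    data_mix: float        # illustrative fraction of reasoning data
    learning_rate: float


@dataclass
class LoopResult:
    best_recipe: Recipe
    best_score: float
    history: List[float] = field(default_factory=list)


def run_closed_loop(
    initial: Recipe,
    propose: Callable[[Recipe], Recipe],
    train_and_eval: Callable[[Recipe], float],
    rounds: int = 4,
) -> LoopResult:
    """Propose -> run -> evaluate -> keep-or-discard, with no human decision."""
    best_recipe = initial
    best_score = train_and_eval(initial)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose(best_recipe)      # propose a data/recipe change
        score = train_and_eval(candidate)     # launch the run, read the eval
        history.append(score)
        if score > best_score:                # decide what to keep
            best_recipe, best_score = candidate, score
    return LoopResult(best_recipe, best_score, history)


def toy_propose(recipe: Recipe) -> Recipe:
    # Assumption: a fixed nudge to the data mix stands in for a real proposer.
    return replace(recipe, data_mix=min(1.0, recipe.data_mix + 0.1))


def toy_train_and_eval(recipe: Recipe) -> float:
    # Assumption: a toy score peaking at data_mix = 0.5 replaces real training.
    return 1.0 - abs(recipe.data_mix - 0.5)


result = run_closed_loop(
    Recipe(data_mix=0.2, learning_rate=1e-5),
    toy_propose,
    toy_train_and_eval,
    rounds=4,
)
print(result.best_recipe, result.best_score)
```

In this sketch the keep-or-discard rule is a plain greedy comparison on a single score. A real system would also need to guard against overfitting to the evaluation set, which is presumably why the reported result is a held-out score.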

The system was tested on a 30B-parameter Nemotron model across four iterative rounds spanning multiple weeks, with no human involvement. The autonomously produced model reached a held-out score of 0.86 on the public NVIDIA Nemotron-Reasoning Challenge, compared with 0.87 for the top human submission. The gap of just 0.01 points suggests that fully autonomous post-training can come close to the best human-driven results.

The research (arXiv paper 2606.20657) points toward a sharp reduction in the human labor cost of continuous model improvement. If such autonomous methods scale, model iteration could move from human-in-the-loop weekly cycles to continuous automatic optimization, potentially redefining how AI models are maintained and upgraded over time.

Why it matters

A-Evolve-Training demonstrates the possibility of AI autonomously improving AI, which could fundamentally reshape the cost structure and iteration speed of model post-training.

Autonomous AI, Post-Training, NVIDIA, Nemotron

Sources