Realtime AI News
WinDOM: Self-Family Distillation for Small-Model GUI Grounding
New research proposes WinDOM, which combines self-family distillation with reinforcement learning to push GUI grounding performance in small (~2B-parameter) models.
A new paper on arXiv presents research on small GUI-grounding agents. The paper notes that small (~2B-parameter) GUI-grounding agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but face two key challenges: obtaining bounding-box training data without expensive human annotation, and combining supervised fine-tuning with reinforcement learning.
WinDOM addresses both through self-family distillation, with the explicit goal of pushing small-model performance rather than scaling up. The study encompasses 54,425 rounds of training.
The paper appears under arXiv cs.AI, paper ID 2606.25964. As on-device AI deployment grows, achieving higher performance on GUI understanding tasks with small models has significant practical value.
Why it matters
WinDOM offers a low-cost, efficient way to train the GUI interaction capabilities of small on-device AI agents, which could advance their deployment on mobile devices and in accessibility tools.