Guozhen AI: Global AI field notes and model intelligence

WinDOM: Self-Family Distillation for Small-Model GUI Grounding

New research proposes WinDOM, which combines self-family distillation with reinforcement learning to improve GUI grounding in small (~2B-parameter) models.

A new arXiv paper studies small GUI-grounding agents, models that locate the on-screen element a natural-language instruction refers to. The authors note that small (~2B-parameter) agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but face two key challenges: obtaining bounding-box training data without expensive human annotation, and combining supervised fine-tuning with reinforcement learning.

WinDOM addresses both through self-family distillation, with the explicit goal of pushing small-model performance rather than scaling up. The study encompasses 54,425 rounds of training.
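For readers new to the task, the sketch below shows one common way a grounding prediction can be scored against a bounding-box label: a predicted click point counts as correct if it falls inside the target element's box. This is an illustrative assumption, not the paper's method; this summary does not describe how WinDOM scores predictions or shapes its reinforcement-learning reward, and the names `Box`, `grounding_reward`, and `grounding_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical label schema)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_reward(pred_xy: Tuple[float, float], target: Box) -> float:
    """Binary score: 1.0 if the predicted click lands inside the target box, else 0.0.

    Assumption: a simple point-in-box check, which could serve either as an
    evaluation metric or as a sparse reward signal. Not taken from the paper.
    """
    x, y = pred_xy
    return 1.0 if target.contains(x, y) else 0.0


def grounding_accuracy(
    preds: Sequence[Tuple[float, float]], targets: Sequence[Box]
) -> float:
    """Fraction of predicted click points that fall inside their target boxes."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(grounding_reward(p, t) for p, t in zip(preds, targets))
    return hits / len(preds)
```

A binary check like this is cheap to compute and needs only a box per target element, which is why annotation-free sources of bounding boxes matter for training at this scale.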

The paper appears under arXiv cs.AI, paper ID 2606.25964. As on-device AI deployment grows, achieving higher performance on GUI understanding tasks with small models has significant practical value.

Why it matters

WinDOM offers a low-cost, efficient way to train small on-device agents for GUI interaction, which could advance their deployment on mobile devices and in accessibility tools.

GUI Agents, Knowledge Distillation, Small Models

Sources