Realtime AI News
3D HAMSTER: Bridging Robot VLA Planning and Control with 3D Trajectory Guidance
A new paper introduces 3D HAMSTER, a framework that elevates trajectory guidance from 2D to 3D space in hierarchical Vision-Language-Action models for robot manipulation. This approach enables low-level policies operating in 3D metric space to receive richer spatial information, improving generalization to unseen environments.
A new paper posted on arXiv introduces 3D HAMSTER, a framework designed to bridge the gap between high-level planning and low-level control in hierarchical Vision-Language-Action (VLA) models for robot manipulation.
Current state-of-the-art approaches in this paradigm use a Vision-Language Model (VLM) to predict 2D end-effector trajectories as explicit guidance for a downstream policy. However, low-level control policies typically operate in 3D metric space on point clouds, and feeding them 2D guidance that lacks depth creates a dimensionality mismatch that degrades performance.

3D HAMSTER's core innovation is elevating trajectory guidance from 2D to 3D space. By providing depth-aware trajectories, the framework narrows the gap between high-level visual planning and low-level physical control, enabling stronger generalization to objects and environments not seen during training.
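To make the missing-depth problem concrete, here is a minimal Python sketch of standard pinhole back-projection, which turns a 2D pixel waypoint into a 3D point in the camera frame once a depth value is available. It illustrates what a 2D trajectory lacks and is not the paper's method. The function names, the camera intrinsics (fx, fy, cx, cy), and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float]            # (u, v) image coordinates in pixels
Point3D = Tuple[float, float, float]   # (x, y, z) in meters, camera frame


def backproject(pixel: Pixel, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift one pixel to a 3D camera-frame point with the pinhole model.

    Without depth_m this is undetermined: every point along the camera
    ray through (u, v) projects to the same pixel.
    """
    u, v = pixel
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def lift_trajectory(pixels: Sequence[Pixel], depths_m: Sequence[float],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Lift a 2D waypoint sequence to 3D, one depth value per waypoint."""
    if len(pixels) != len(depths_m):
        raise ValueError("need exactly one depth value per waypoint")
    return [backproject(p, d, fx, fy, cx, cy) for p, d in zip(pixels, depths_m)]


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 camera.
    fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

    # The same 2D waypoint at two different depths gives two different
    # 3D targets, which is the ambiguity a depth-free trajectory leaves open.
    print(backproject((420.0, 240.0), 1.0, fx, fy, cx, cy))  # (0.2, 0.0, 1.0)
    print(backproject((420.0, 240.0), 2.0, fx, fy, cx, cy))  # (0.4, 0.0, 2.0)
```

A policy that consumes point clouds needs targets in that same metric space, so a trajectory that already carries depth removes the need to guess which point along each camera ray was intended.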
The research sits within the broader hierarchical VLA paradigm, which decouples high-level task planning from low-level motor control to improve robot manipulation generalization. 3D HAMSTER's contribution to this decoupled architecture is to align the dimensionality of the planner's guidance with that of the controller's input.
This work has practical significance for real-world robot deployment, where robots must handle objects and scenarios never encountered during training. The 2D-to-3D elevation could reduce failures caused by missing depth information in the planning-to-control pipeline.
What to watch next: whether the framework scales to more complex multi-step manipulation tasks and whether it gets adopted into mainstream robot learning platforms or open-source frameworks.
Why it matters
By giving hierarchical VLA models 3D rather than 2D trajectory guidance, this research aligns the planner's output with the space in which the low-level controller operates, which could meaningfully improve how well robot manipulation generalizes.