Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach
Abstract
A text-conditioned diffusion-based model is fine-tuned on gaze-primed human motion sequences to generate realistic reaching motions, achieving 60% prime success and 89% reach success on HD-EPIC when conditioned on the goal object location.
Human motion generation is a challenging task that aims to create realistic motion imitating natural human behaviour. We focus on the well-studied behaviour of priming an object/location for pick up or put down -- that is, the spotting of an object/location from a distance, known as gaze priming, followed by the motion of approaching and reaching the target location. To that end, we curate, for the first time, 23.7K gaze-primed human motion sequences for reaching target object locations from five publicly available datasets, i.e., HD-EPIC, MoGaze, HOT3D, ADT, and GIMO. We pre-train a text-conditioned diffusion-based motion generation model, then fine-tune it conditioned on goal pose or location, on our curated sequences. Importantly, we evaluate the ability of the generated motion to imitate natural human movement through several metrics, including the 'Reach Success' and a newly introduced 'Prime Success' metric. On the largest dataset, HD-EPIC, our model achieves 60% prime success and 89% reach success when conditioned on the goal object location.
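The abstract names the 'Reach Success' and 'Prime Success' metrics without defining them. The sketch below illustrates one plausible reading: a reach succeeds if the wrist comes within a distance threshold of the goal, and a prime succeeds if the gaze direction points at the goal within an angular threshold at some frame. The function names, the thresholds (0.1 m and 15 degrees), and the per-frame input layout are assumptions made for illustration, not the paper's definitions.

```python
import math

def reach_success(wrist_positions, goal, dist_threshold=0.1):
    """Assumed definition: some frame brings the wrist within
    dist_threshold (metres, hypothetical value) of the goal."""
    return any(math.dist(p, goal) <= dist_threshold for p in wrist_positions)

def prime_success(head_positions, gaze_directions, goal, angle_threshold_deg=15.0):
    """Assumed definition: at some frame, the angle between the gaze
    direction and the head-to-goal vector is at most angle_threshold_deg
    (hypothetical value)."""
    for head, gaze in zip(head_positions, gaze_directions):
        to_goal = [g - h for g, h in zip(goal, head)]
        norm = math.hypot(*to_goal) * math.hypot(*gaze)
        if norm == 0.0:
            continue  # degenerate frame: head at the goal or zero gaze vector
        cos_angle = sum(a * b for a, b in zip(to_goal, gaze)) / norm
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        if math.degrees(math.acos(cos_angle)) <= angle_threshold_deg:
            return True
    return False

def success_rate(flags):
    """Fraction of sequences flagged as successful (0.0 for an empty input)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Under these assumptions, a dataset-level figure such as the reported 60% prime success would correspond to `success_rate` applied to the per-sequence `prime_success` flags.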