Ask, Pose, Unite: Scaling Data Acquisition for Close Interactions with Vision Language Models
