receptacle
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.97)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Kingston (0.04)
- (2 more...)
- Overview (0.67)
- Research Report > New Finding (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (3 more...)
- Education (0.67)
- Leisure & Entertainment > Games (0.45)
- North America > United States > California (0.04)
- North America > United States > Michigan (0.04)
- Europe > Sweden > Skåne County > Malmö (0.04)
SupplementaryMaterialfor HandMeThat: Human-RobotCommunication inPhysicalandSocialEnvironments
In Section B, we summarize the statistics of the dataset. A.1 ObjectSpace Recall that HandMeThat uses an object-centric representation for states. Object hierarchy.HandMeThat classifies all categories into 5classes: location, receptacle, food, tool,andthing. Each class (except for"location") iscomposed ofmultiple subclasses, and each subclass contains several object categories. Intotal, there are155 object categories.
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation
Caro, Steven, Smith, Stephen L.
Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.
- North America > United States (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands
Cho, Seongwon, Ahn, Daechul, Shin, Donghyun, Choi, Hyeonbeom, Kim, San, Choi, Jonghyun
Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play complementary roles: the DRM performs strategic planning with structured 3D scene updates and guides what the IRM attends to, while the IRM analyzes video streams to update memory, correct ongoing actions, and trigger replanning when necessary. Through this bidirectional coordination, the modules address the trade off between maintaining awareness and avoiding costly updates, enabling robust adaptation under dynamic conditions. Evaluated in three real world environments with dynamic object placement, BINDER achieves substantially higher success and efficiency than SoTA baselines, demonstrating its effectiveness for real world deployment.