receptacle
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.97)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Kingston (0.04)
- (2 more...)
- Overview (0.67)
- Research Report > New Finding (0.67)
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation
Caro, Steven, Smith, Stephen L.
Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.
- North America > United States (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands
Cho, Seongwon, Ahn, Daechul, Shin, Donghyun, Choi, Hyeonbeom, Kim, San, Choi, Jonghyun
Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play complementary roles: the DRM performs strategic planning with structured 3D scene updates and guides what the IRM attends to, while the IRM analyzes video streams to update memory, correct ongoing actions, and trigger replanning when necessary. Through this bidirectional coordination, the modules address the trade off between maintaining awareness and avoiding costly updates, enabling robust adaptation under dynamic conditions. Evaluated in three real world environments with dynamic object placement, BINDER achieves substantially higher success and efficiency than SoTA baselines, demonstrating its effectiveness for real world deployment.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (3 more...)
- North America > United States > California (0.04)
- North America > United States > Michigan (0.04)
- Europe > Sweden > Skåne County > Malmö (0.04)
Supplementary Material for HandMeThat: Human-Robot Communication in Physical and Social Environments Y anming Wan
In Section A, we provide the detailed information for HandMeThat data generation and its textual interface. In Section B, we summarize the statistics of the dataset. Recall that HandMeThat uses an object-centric representation for states. "Location" consists of all non-movable entities. Each class (except for "location") is composed of multiple subclasses, and each subclass contains In total, there are 155 object categories. Each object category is also associated with several attributes.
NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning
Choi, Wonje, Kim, Jooyoung, Woo, Honguk
We address the challenge of adopting language models (LMs) for embodied tasks in dynamic environments, where online access to large-scale inference engines or symbolic planners is constrained due to latency, connectivity, and resource limitations. To this end, we present NeSyPr, a novel embodied reasoning framework that compiles knowledge via neurosymbolic proceduralization, thereby equipping LM-based agents with structured, adaptive, and timely reasoning capabilities. In NeSyPr, task-specific plans are first explicitly generated by a symbolic tool leveraging its declarative knowledge. These plans are then transformed into composable procedural representations that encode the plans' implicit production rules, enabling the resulting composed procedures to be seamlessly integrated into the LM's inference process. This neurosymbolic proceduralization abstracts and generalizes multi-step symbolic structured path-finding and reasoning into single-step LM inference, akin to human knowledge compilation. It supports efficient test-time inference without relying on external symbolic guidance, making it well suited for deployment in latency-sensitive and resource-constrained physical systems. We evaluate NeSyPr on the embodied benchmarks PDDLGym, VirtualHome, and ALFWorld, demonstrating its efficient reasoning capabilities over large-scale reasoning models and a symbolic planner, while using more compact LMs.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Research Report (0.64)
- Workflow (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (4 more...)