
HandMeThat: Human-Robot Communication in Physical and Social Environments

Neural Information Processing Systems

While previous benchmarks in similar domains have primarily focused on the language grounding of object properties (e.g., "table"), relations (e.g., "on"), and planning (e.g., object search and manipulation) [6, 7], in this paper we highlight the additional challenge of understanding human instructions with ambiguities (i.e., recognizing the subgoal) based on physical states and human actions and goals. Each episode in HandMeThat contains two stages.


HandMeThat: Human-Robot Communication in Physical and Social Environments

Neural Information Processing Systems

We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions towards her internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat, and show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat, suggesting significant room for future work on physical and social human-robot communications and interactions.
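To make the two-stage episode structure concrete, below is a minimal Python sketch of how an agent might interact with a text-based environment of this kind. The environment class and its reset/step/valid-command interface are illustrative assumptions (a Gym-style text wrapper), not the benchmark's actual API.

```python
import random

class RandomTextAgent:
    """Baseline that picks an admissible textual command uniformly at random."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation, instruction, valid_commands):
        return self.rng.choice(valid_commands)


def run_episode(env, agent, max_steps=50):
    # Stage 1: the human acts toward a hidden goal; the robot only observes.
    # Stage 2: the robot receives an ambiguous instruction and must act on it.
    observation, info = env.reset()        # info is assumed to carry the observed trajectory
    instruction = info["instruction"]      # e.g., "hand me that mug"
    total_reward = 0.0
    for _ in range(max_steps):
        command = agent.act(observation, instruction, info["valid_commands"])
        observation, reward, done, info = env.step(command)
        total_reward += reward
        if done:
            break
    return total_reward
```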


Supplementary Material for HandMeThat: Human-Robot Communication in Physical and Social Environments Yanming Wan

Neural Information Processing Systems

In Section A, we provide detailed information on HandMeThat data generation and its textual interface. In Section B, we summarize the statistics of the dataset. Recall that HandMeThat uses an object-centric representation for states. "Location" consists of all non-movable entities. Each class (except for "location") is composed of multiple subclasses, and each subclass contains multiple object categories. In total, there are 155 object categories. Each object category is also associated with several attributes.
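As an illustration of such an object-centric state, here is a small Python sketch. The class names, fields, and example values are assumptions made for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class WorldObject:
    name: str                       # e.g., "mug#3"
    category: str                   # one of the ~155 object categories
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g., {"color": "red"}
    location: Optional[str] = None  # a non-movable entity, e.g., "kitchen_table"

@dataclass
class State:
    locations: List[str]             # all non-movable entities in the scene
    objects: Dict[str, WorldObject]  # keyed by object name

    def objects_at(self, place: str) -> List[WorldObject]:
        return [o for o in self.objects.values() if o.location == place]
```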



Infer Human's Intentions Before Following Natural Language Instructions

Wan, Yanming, Wu, Yue, Wang, Yiping, Mao, Jiayuan, Jaques, Natasha

arXiv.org Artificial Intelligence

For AI agents to be helpful to humans, they should be able to follow natural language instructions to complete everyday cooperative tasks in human environments. However, real human instructions inherently possess ambiguity, because the human speakers assume sufficient prior knowledge about their hidden goals and intentions. Standard language grounding and planning methods fail to address such ambiguities because they do not model human internal goals as additional partially observable factors in the environment. We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks. Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps. We implement a set of Transformer-based models and evaluate them over a challenging benchmark, HandMeThat. We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches. We also compare our implementation with strong baselines, including Chain of Thought prompting on the largest available pre-trained language models, and find that FISER provides better performance on the embodied social reasoning tasks under investigation, reaching the state-of-the-art on HandMeThat.
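The core idea, inferring the human's intention as an explicit intermediate step before planning actions, can be sketched roughly as below. This is a minimal illustration under assumed module names and dimensions, not the paper's actual architecture or code.

```python
import torch
import torch.nn as nn

class IntentionThenPlanSketch(nn.Module):
    """Rough sketch: encode the observed human trajectory plus the ambiguous
    instruction, predict an intention, then plan conditioned on it."""
    def __init__(self, d_model=256, n_intentions=64, n_actions=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.intention_head = nn.Linear(d_model, n_intentions)    # social reasoning step
        self.intention_embed = nn.Embedding(n_intentions, d_model)
        self.plan_head = nn.Linear(d_model, n_actions)            # planning step

    def forward(self, context_tokens):
        # context_tokens: (batch, seq_len, d_model) embeddings of trajectory + instruction
        h = self.encoder(context_tokens).mean(dim=1)
        intention_logits = self.intention_head(h)
        # Hard argmax shown for inference only; training would supervise both heads.
        intention = self.intention_embed(intention_logits.argmax(dim=-1))
        action_logits = self.plan_head(h + intention)
        return intention_logits, action_logits
```

Conditioning the plan head on the predicted intention is only the minimal analogue of the explicit intermediate reasoning described in the abstract; the paper's models may differ substantially.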


HandMeThat: Human-Robot Communication in Physical and Social Environments

Wan, Yanming, Mao, Jiayuan, Tenenbaum, Joshua B.

arXiv.org Artificial Intelligence

We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions towards her internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat, and show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat, suggesting significant room for future work on physical and social human-robot communications and interactions.