Goto

Collaborating Authors

 microwave


Large Language Models as Commonsense Knowledge for Large-Scale Task Planning Anonymous Author(s) Affiliation Address email Appendix 1 A Experimental environments 2 We use the VirtualHome simulator [

Neural Information Processing Systems

A.1 List of objects, containers, surfaces, and rooms in the apartment We list all the objects that are included in our experimental environment. We use the object rearrangement tasks for evaluation. The tasks are randomly sampled from different distributions. Simple: this task is to move one object in the house to the desired location. Novel Simple: this task is to move one object in the house to the desired location.



ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents

Kadu, Ankush, Krishnan, Ashwanth

arXiv.org Artificial Intelligence

Enabling agents to learn from experience and generalize across diverse tasks without task-specific training remains a fundamental challenge in reinforcement learning and decision-making. While recent approaches have explored episodic memory (Reflexion), gradient-based prompt optimization (TextGrad),and hierarchical task decomposition independently, their potential for synergistic integration remains unexplored. We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms: (1) LLM-based hierarchical TODO decomposition for strategic planning, (2) history-aware causal reflection that analyzes recent action patterns to identify failure root causes and enable within-trial learning, and (3) gradient-based optimization for systematic improvement. Unlike prior work relying on few-shot demonstrations, our system achieves true zero-shot generalization through pure LLM semantic reasoning,requiring no task-specific examples, fine-tuning, or hardcoded similarity metrics. Evaluated on ALFWorld benchmark tasks, ReflexGrad demonstrates 67% zero-shot success rate on Trial 0 without any prior task experience or demonstrations, establishing effective performance on first exposure. Through empirical analysis, we identify the architectural mechanisms underlying stable convergence (zero action loops) and effective cross-task transfer (67% to 78% improvement).Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization that approaches few-shot baselines from prior work.


Supplementary Material for HandMeThat: Human-Robot Communication in Physical and Social Environments Y anming Wan

Neural Information Processing Systems

In Section A, we provide the detailed information for HandMeThat data generation and its textual interface. In Section B, we summarize the statistics of the dataset. Recall that HandMeThat uses an object-centric representation for states. "Location" consists of all non-movable entities. Each class (except for "location") is composed of multiple subclasses, and each subclass contains In total, there are 155 object categories. Each object category is also associated with several attributes.


Is microwave cooking nuking all the nutrients?

Popular Science

Is microwave cooking nuking all the nutrients? Micorwaves have been a kitchen staple since the late 1960s, but are they safe for our food? Breakthroughs, discoveries, and DIY tips sent every weekday. Originally used for radar and other technologies, the power of microwaves was first harnessed specifically for heating food in 1947 . By the late 1960s, commercial microwave ovens were small and inexpensive enough to become fixtures of the modern kitchen.




Large Language Models as Commonsense Knowledge for Large-Scale Task Planning Anonymous Author(s) Affiliation Address email Appendix 1 A Experimental environments 2 We use the VirtualHome simulator [

Neural Information Processing Systems

A.1 List of objects, containers, surfaces, and rooms in the apartment We list all the objects that are included in our experimental environment. We use the object rearrangement tasks for evaluation. The tasks are randomly sampled from different distributions. Simple: this task is to move one object in the house to the desired location. Novel Simple: this task is to move one object in the house to the desired location.


SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents

Chen, Ruolin, Sun, Yinqian, Wang, Jihang, Lv, Mingyang, Zhang, Qian, Zeng, Yi

arXiv.org Artificial Intelligence

Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical world exposes them to safety vulnerabilities. In this work, we identify four key reasoning stages where hazards may arise: Task Understanding, Environment Perception, High-Level Plan Generation, and Low-Level Action Generation. We further formalize three orthogonal safety constraint types (Factual, Causal, and Temporal) to systematically characterize potential safety violations. Building on this risk model, we present SafeMindBench, a multimodal benchmark with 5,558 samples spanning four task categories (Instr-Risk, Env-Risk, Order-Fix, Req-Align) across high-risk scenarios such as sabotage, harm, privacy, and illegal behavior. Extensive experiments on SafeMindBench reveal that leading LLMs (e.g., GPT-4o) and widely used embodied agents remain susceptible to safety-critical failures. To address this challenge, we introduce SafeMindAgent, a modular Planner-Executor architecture integrated with three cascaded safety modules, which incorporate safety constraints into the reasoning process. Results show that SafeMindAgent significantly improves safety rate over strong baselines while maintaining comparable task completion. Together, SafeMindBench and SafeMindAgent provide both a rigorous evaluation suite and a practical solution that advance the systematic study and mitigation of safety risks in embodied LLM agents.


Using Natural Language for Human-Robot Collaboration in the Real World

Lindes, Peter, Skiker, Kaoutar

arXiv.org Artificial Intelligence

We have a vision of a day when autonomous robots can collaborate with humans as assistants in performing complex tasks in the physical world. This vision includes that the robots will have the ability to communicate with their human collaborators using language that is natural to the humans. Traditional Interactive Task Learning (ITL) systems have some of this ability, but the language they can understand is very limited. The advent of large language models (LLMs) provides an opportunity to greatly improve the language understanding of robots, yet integrating the language abilities of LLMs with robots that operate in the real physical world is a challenging problem. In this chapter we first review briefly a few commercial robot products that work closely with humans, and discuss how they could be much better collaborators with robust language abilities. We then explore how an AI system with a cognitive agent that controls a physical robot at its core, interacts with both a human and an LLM, and accumulates situational knowledge through its experiences, can be a possible approach to reach that vision. We focus on three specific challenges of having the robot understand natural language, and present a simple proof-of-concept experiment using ChatGPT for each. Finally, we discuss what it will take to turn these simple experiments into an operational system where LLM-assisted language understanding is a part of an integrated robotic assistant that uses language to collaborate with humans.