environment description
Code-Driven Planning in Grid Worlds with Large Language Models
Aravindan, Ashwath Vaithinathan, Tang, Zhisheng, Kejriwal, Mayank
We propose an iterative programmatic planning (IPP) framework for solving grid-based tasks by synthesizing interpretable agent policies expressed in code using large language models (LLMs). Instead of relying on traditional search or reinforcement learning, our approach treats code generation as policy synthesis: the LLM outputs executable programs that map environment states to action sequences. Our proposed architecture incorporates several prompting strategies, including direct code generation, pseudocode-conditioned refinement, and curriculum-based prompting, and adds an iterative refinement mechanism that updates the code based on task-performance feedback. We evaluate our approach using six leading LLMs and two challenging grid-based benchmarks (GRASP and MiniGrid). Our IPP framework improves over direct code generation by amounts ranging from 10% to as much as 10x across five of the six models, and establishes a new state-of-the-art result on GRASP. IPP significantly outperforms direct elicitation of a solution from GPT-o3-mini (by 63% on MiniGrid and by 116% on GRASP), demonstrating the viability of the overall approach. The computational costs of all code generation approaches are similar. While code generation has a higher initial prompting cost than direct solution elicitation ($0.08 per task vs. $0.002 per instance for GPT-o3-mini), the code can be reused for any number of instances, making the amortized cost significantly lower (by 400x for GPT-o3-mini across the complete GRASP benchmark).
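The cost argument above is simple amortization: a one-time generation cost C_gen spread over N reused instances costs C_gen / N per instance, so reusing the code beats direct elicitation once N exceeds C_gen / C_direct, which with the quoted figures is 0.08 / 0.002 = 40 instances. The sketch below assumes one code generation per task and treats running the generated program as free; both are simplifying assumptions, and the helper names are hypothetical.

```python
def amortized_cost_per_instance(generation_cost: float, num_instances: int) -> float:
    """Spread a one-time code-generation cost over every instance the program solves.

    Assumption: executing the generated program adds no further LLM cost.
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    return generation_cost / num_instances


def break_even_instances(generation_cost: float, per_instance_cost: float) -> float:
    """Instance count at which reusing generated code costs as much as direct elicitation."""
    return generation_cost / per_instance_cost


CODE_GEN_COST = 0.08   # USD per task, figure quoted in the abstract
DIRECT_COST = 0.002    # USD per instance, figure quoted in the abstract

print("break-even instances:", break_even_instances(CODE_GEN_COST, DIRECT_COST))
for n in (10, 40, 1000):
    # Ratio > 1 means direct elicitation is more expensive per instance than reused code.
    ratio = DIRECT_COST / amortized_cost_per_instance(CODE_GEN_COST, n)
    print(f"N={n}: direct / amortized = {ratio:.2f}")
```

The ratio grows linearly in N under this model, which is why the saving reported for a whole benchmark can be large even though a single code-generation prompt costs 40 times as much as a single direct query.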
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Cherepanov, Egor, Kachaev, Nikita, Kovalev, Alexey K., Panov, Aleksandr I.
Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a classification framework for memory-intensive RL tasks; (2) we collect MIKASA-Base, a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios; and (3) we develop MIKASA-Robo, a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our contributions establish a unified framework for advancing memory RL research, driving the development of more reliable systems for real-world applications. The code is available at https://sites.google.com/view/memorybenchrobots/.
Environment Descriptions for Usability and Generalisation in Reinforcement Learning
Soemers, Dennis J. N. J., Samothrakis, Spyridon, Driessens, Kurt, Winands, Mark H. M.
The majority of current reinforcement learning (RL) research involves training and deploying agents in environments that are implemented by engineers in general-purpose programming languages and more advanced frameworks such as CUDA or JAX. This makes the application of RL to novel problems of interest inaccessible to small organisations or private individuals with insufficient engineering expertise. This position paper argues that, to enable more widespread adoption of RL, it is important for the research community to shift focus towards methodologies where environments are described in user-friendly domain-specific or natural languages. Aside from improving the usability of RL, such language-based environment descriptions may also provide valuable context and boost the ability of trained agents to generalise to unseen environments within the set of all environments that can be described in any language of choice.
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Gu, Xiangming, Zheng, Xiaosen, Pang, Tianyu, Du, Chao, Liu, Qian, Wang, Ye, Jiang, Jing, Lin, Min
A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jailbreak. The adversary simply jailbreaks a single agent; without any further intervention from the adversary, (almost) all agents then become infected exponentially fast and exhibit harmful behaviors. To validate the feasibility of infectious jailbreak, we simulate multi-agent environments containing up to one million LLaVA-1.5 agents, and employ randomized pair-wise chat as a proof-of-concept instantiation of multi-agent interaction. Our results show that feeding an (infectious) adversarial image into the memory of any randomly chosen agent is sufficient to achieve infectious jailbreak. Finally, we derive a simple principle for determining whether a defense mechanism can provably restrain the spread of infectious jailbreak; how to design a practical defense that meets this principle remains an open question. Our project page is available at https://sail-sg.github.io/Agent-Smith/.
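Why randomized pair-wise chat yields exponential spread can be seen in a toy susceptible-infected model. This is not the paper's setup: there, transmission depends on whether the adversarial image is retrieved from memory and passed along in chat, whereas here that whole mechanism is collapsed into one hypothetical parameter, `transmit_prob`. Each round, agents are shuffled and paired; if exactly one member of a pair is infected, the other becomes infected with probability `transmit_prob`.

```python
import random


def simulate_infectious_spread(num_agents: int, transmit_prob: float,
                               rounds: int, seed: int = 0) -> list[int]:
    """Toy SI model of infection under random pairwise chat.

    Hypothetical simplification: transmit_prob stands in for the whole
    retrieve-and-pass-on mechanism of the real agents.
    Returns the number of infected agents after each round (index 0 = start).
    """
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[rng.randrange(num_agents)] = True  # adversary compromises one random agent
    history = [1]
    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        # Pair consecutive agents in the shuffled order; with an odd count, one agent sits out.
        for a, b in zip(order[0::2], order[1::2]):
            if infected[a] != infected[b] and rng.random() < transmit_prob:
                infected[a] = infected[b] = True
        history.append(sum(infected))
    return history


print(simulate_infectious_spread(num_agents=1024, transmit_prob=1.0, rounds=15))
```

With `transmit_prob=1.0` the infected count roughly doubles per round while most agents are still clean, so saturation takes on the order of log2(N) rounds; lowering `transmit_prob` slows the growth rate but, in this toy model, does not stop it as long as it stays positive.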
Rodrigues da Silva
We propose a formal design framework to automatically synthesize coordination and control schemes for cooperative multi-agent systems by combining top-down mission planning with bottom-up motion planning. The multi-agent system is assigned a global mission, specified as a regular language over all the agents' capabilities, whereas basic motion controllers for each agent are designed with respect to a given environment description. On the one hand, a mission planning layer sits at the top of the proposed framework, decomposing the global mission into local tasks that are consistent with each agent's individual capabilities, and compositionally verifying the joint effort of the agents via an assume-guarantee paradigm. On the other hand, corresponding to these local missions, motion plans for each agent are synthesized by composing basic motion primitives, which are verified safe by differential dynamic logic (dL), through a Satisfiability Modulo Theories (SMT) solver that searches for feasible solutions in the face of constraints due to local task requirements and the environment description. It is shown that the proposed framework can handle changing environments, as the motion primitives are reactive in nature, making the motion planning adaptive to local environmental changes. Furthermore, online mission reconfiguration can be triggered by the motion planning layer whenever no feasible solution can be found by the SMT solver. The effectiveness of the overall design framework is demonstrated by an automated warehouse case study.
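To make "a mission specified as a regular language over the agents' capabilities" concrete, the sketch below encodes a hypothetical warehouse mission, one or more complete `move pick move drop` cycles, as a deterministic finite automaton and checks whether a sequence of capability events satisfies it. It also shows natural projection, a standard tool for restricting a word to one agent's capability set; it is used here only as an illustration and is not necessarily the decomposition or assume-guarantee procedure the framework uses. The alphabet, states, and function names are all assumptions.

```python
# Hypothetical mission "(move pick move drop)+" over a hypothetical capability alphabet.
MISSION_DFA = {
    "start": "q0",
    "accepting": {"q4"},
    "delta": {
        ("q0", "move"): "q1",
        ("q1", "pick"): "q2",
        ("q2", "move"): "q3",
        ("q3", "drop"): "q4",
        ("q4", "move"): "q1",  # start another delivery cycle
    },
}


def accepts(dfa: dict, word: list[str]) -> bool:
    """Return True if the event sequence belongs to the mission language."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"].get((state, symbol))
        if state is None:  # missing transition acts as an implicit rejecting sink
            return False
    return state in dfa["accepting"]


def project(word: list[str], local_alphabet: set[str]) -> list[str]:
    """Natural projection: erase every event outside one agent's capability set."""
    return [s for s in word if s in local_alphabet]


plan = ["move", "pick", "move", "drop"]
print(accepts(MISSION_DFA, plan))          # one full cycle satisfies the mission
print(project(plan, {"pick", "drop"}))     # events a hypothetical manipulator agent handles
```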
6-Layer Model for a Structured Description and Categorization of Urban Traffic and Environment
Scholtes, Maike, Westhofen, Lukas, Turner, Lara Ruth, Lotto, Katrin, Schuldes, Michael, Weber, Hendrik, Wagener, Nicolas, Neurohr, Christian, Bollmann, Martin, Körtke, Franziska, Hiller, Johannes, Hoss, Michael, Bock, Julian, Eckstein, Lutz
Verification and validation of automated driving functions pose major challenges. Currently, scenario-based approaches are being investigated in research and industry, aiming to reduce testing effort by specifying safety-relevant scenarios. To define those scenarios and operate in a complex real-world design domain, a structured description of the environment is needed. Within the PEGASUS research project, the 6-Layer Model (6LM) was introduced for the description of highway scenarios. This paper refines the 6LM and extends it to urban traffic and environment. As defined in PEGASUS, the 6LM provides the possibility to categorize the environment and therefore functions as a structured basis for subsequent scenario description. The model enables a structured description and categorization of the general environment, without incorporating any knowledge or anticipating any functions of actors. Beyond that, the 6LM has a variety of other applications, which are elaborated in this paper. The 6LM includes a description of the road network and traffic guidance objects, roadside structures, temporary modifications of the former, dynamic objects, environmental conditions and digital information. The work at hand specifies each layer by categorizing its items. Guidelines are formulated and explanatory examples are given to standardize the application of the model for an objective environment description. In contrast to previous publications, the model and its design are described in far more detail. Finally, the holistic description of the 6LM presented here includes remarks on possible future work when expanding the concept to machine perception aspects.
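The six layers listed above form a fixed categorization, so a scene description can be organized as a mapping from items to layers. The sketch numbers the layers in the order the abstract lists them; the item-to-layer assignments (for example, a construction site as a temporary modification) are illustrative assumptions, not the paper's guidelines, and `SixLayer` and `describe_scene` are hypothetical names.

```python
from enum import IntEnum


class SixLayer(IntEnum):
    """The six 6LM layers, numbered in the order the abstract lists them."""
    ROAD_NETWORK_AND_TRAFFIC_GUIDANCE = 1
    ROADSIDE_STRUCTURES = 2
    TEMPORARY_MODIFICATIONS = 3  # temporary modifications of layers 1 and 2
    DYNAMIC_OBJECTS = 4
    ENVIRONMENTAL_CONDITIONS = 5
    DIGITAL_INFORMATION = 6


# Illustrative (hypothetical) item-to-layer assignments.
EXAMPLE_ITEMS = {
    "lane marking": SixLayer.ROAD_NETWORK_AND_TRAFFIC_GUIDANCE,
    "building": SixLayer.ROADSIDE_STRUCTURES,
    "construction site": SixLayer.TEMPORARY_MODIFICATIONS,
    "pedestrian": SixLayer.DYNAMIC_OBJECTS,
    "rain": SixLayer.ENVIRONMENTAL_CONDITIONS,
    "V2X message": SixLayer.DIGITAL_INFORMATION,
}


def describe_scene(items: list[str]) -> dict[str, list[str]]:
    """Group scene items by layer, returning only non-empty layers in layer order."""
    by_layer = {layer: [] for layer in SixLayer}
    for item in items:
        by_layer[EXAMPLE_ITEMS[item]].append(item)
    return {layer.name: names for layer, names in by_layer.items() if names}


print(describe_scene(["rain", "pedestrian", "lane marking"]))
```

Keeping the layers as an ordered enumeration mirrors the model's intent that every environment item falls into exactly one layer, independent of any actor's functions.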