

Supplementary Materials A Organization of Supplementary Materials

Neural Information Processing Systems

The supplementary materials consist of five main sections. In Appendix B, we give a detailed overview of the related literature. In Appendix C, we give the proofs of Theorem 1 and Proposition 1 (the proofs for Section 3). In Appendix D, we provide further details about the implementation and training procedure for PerSim and the RL methods we benchmark against. In Appendix E, we detail the setup used to run our experiments.


Cooperative Multi-agent Approach for Automated Computer Game Testing

Shirzadeh-hajimahmood, Samira, Prasetya, I. S. W. B., Dastani, Mehdi, Dignum, Frank

arXiv.org Artificial Intelligence

Automated testing of computer games is a challenging problem, especially when lengthy scenarios have to be tested. Automating such a scenario boils down to finding the right sequence of interactions given an abstract description of the scenario. Recent works have shown that an agent-based approach works well for the purpose, e.g. due to agents' reactivity, hence enabling a test agent to immediately react to game events and changing state. Many games nowadays are multi-player. This opens up an interesting possibility to deploy multiple cooperative test agents to test such a game, for example to speed up the execution of multiple testing tasks. This paper offers a cooperative multi-agent testing approach and a study of its performance based on a case study on a 3D game called Lab Recruits.


Large Language Models as Test Case Generators: Performance Evaluation and Enhancement

Li, Kefan, Yuan, Yuan

arXiv.org Artificial Intelligence

Code generation with Large Language Models (LLMs) has been extensively studied and achieved remarkable progress. As a complementary aspect to code generation, test case generation is of crucial importance in ensuring the quality and reliability of code. However, using LLMs as test case generators has been much less explored. Current research along this line primarily focuses on enhancing code generation with assistance from test cases generated by LLMs, while the performance of LLMs in test case generation alone has not been comprehensively examined. To bridge this gap, we conduct extensive experiments to study how well LLMs can generate high-quality test cases. We find that as the problem difficulty increases, state-of-the-art LLMs struggle to generate correct test cases, largely due to their inherent limitations in computation and reasoning. To mitigate this issue, we further propose a multi-agent framework called TestChain that decouples the generation of test inputs and test outputs. Notably, TestChain uses a ReAct format conversation chain for LLMs to interact with a Python interpreter in order to provide more accurate test outputs. Our results indicate that TestChain outperforms the baseline by a large margin. Particularly, in terms of the accuracy of test cases, TestChain using GPT-4 as the backbone achieves a 13.84% improvement over the baseline on the LeetCode-hard dataset.
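The core decoupling idea in the abstract, deriving expected outputs by actually executing code rather than asking the LLM to predict them, can be sketched as follows. This is a minimal illustrative stand-in, not TestChain's implementation; the function names and the toy problem are assumptions.

```python
# Hypothetical sketch of the decoupling idea: generate test inputs first,
# then derive expected outputs by executing a reference solution in a
# Python interpreter instead of having the LLM predict them.

def derive_expected_outputs(reference_fn, test_inputs):
    """Compute expected outputs by execution rather than prediction."""
    return [reference_fn(*args) for args in test_inputs]

# Toy problem standing in for an LLM-generated reference solution:
# "return the sum of a list".
def reference_solution(xs):
    return sum(xs)

test_inputs = [([1, 2, 3],), ([],), ([-5, 5],)]
expected = derive_expected_outputs(reference_solution, test_inputs)
print(expected)  # [6, 0, 0]
```

Executing the reference solution sidesteps the computation and reasoning limitations the abstract identifies, since the interpreter, not the model, produces the expected outputs.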


Flow-Based Synthesis of Reactive Tests for Discrete Decision-Making Systems with Temporal Logic Specifications

Graebener, Josefine B., Badithela, Apurva S., Goktas, Denizalp, Ubellacker, Wyatt, Mazumdar, Eric V., Ames, Aaron D., Murray, Richard M.

arXiv.org Artificial Intelligence

Designing tests to evaluate if a given autonomous system satisfies complex specifications is challenging due to the complexity of these systems. This work proposes a flow-based approach for reactive test synthesis from temporal logic specifications, enabling the synthesis of test environments consisting of static and reactive obstacles and dynamic test agents. The temporal logic specifications describe desired test behavior, including system requirements as well as a test objective that is not revealed to the system. The synthesized test strategy places restrictions on system actions in reaction to the system state. The tests are minimally restrictive and accomplish the test objective while ensuring realizability of the system's objective without aiding it (semi-cooperative setting). Automata theory and flow networks are leveraged to formulate a mixed-integer linear program (MILP) to synthesize the test strategy. For a dynamic test agent, the agent strategy is synthesized for a GR(1) specification constructed from the solution of the MILP. If the specification is unrealizable by the dynamics of the test agent, a counterexample-guided approach is used to resolve the MILP until a strategy is found. This flow-based, reactive test synthesis is conducted offline and is agnostic to the system controller. Finally, the resulting test strategy is demonstrated in simulation and experimentally on a pair of quadrupedal robots for a variety of specifications.


Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

Lan, Li-Cheng, Zhang, Huan, Hsieh, Cho-Jui

arXiv.org Artificial Intelligence

In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on the out-of-distribution "controllable" states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of trajectories. For example, a self-driving system should be able to take over the control from humans in the middle of driving and must continue to drive the car safely. To practically evaluate this type of generalization, we start the test agent from the middle of other independently well-trained stranger agents' trajectories. With extensive experimental evaluation, we show the prevalence of generalization failure on controllable states from stranger agents. For example, in the Humanoid environment, we observed that a well-trained Proximal Policy Optimization (PPO) agent, with only a 3.9% failure rate during regular testing, failed on 81.6% of the states generated by well-trained stranger PPO agents. To improve "relay generalization," we propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment to the agent's old states according to the Q function during training. After applying STA to the Soft Actor Critic (SAC) training procedure, we reduced the failure rate of SAC under relay-evaluation by more than three times in most settings, without impacting agent performance or increasing the needed number of environment interactions. Our code is available at https://github.com/lan-lc/STA. Generalization is critical for deploying reinforcement learning (RL) agents into real-world applications. A well-trained RL agent that can achieve high rewards under restricted settings may not be able to handle the enormous state space and complex environment variations in the real world. There are many different aspects regarding the generalization of RL agents.
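The reset mechanism described above, returning the environment to one of the agent's own past states selected via the Q function, can be sketched in a few lines. This is a hypothetical illustration under the assumption that STA prefers states the current Q estimate rates poorly (hard states); the function names and the identity Q function are stand-ins, not the paper's API.

```python
import random

# Hypothetical sketch of the Q-guided reset in Self-Trajectory Augmentation:
# sample candidate past states from the agent's own trajectory buffer and
# reset training to the one the current Q estimate rates lowest.
# (Assumption: "according to the Q function" means preferring low-Q states.)

def pick_reset_state(state_buffer, q_value, k=5):
    """Sample k candidate past states; return the one with the lowest Q."""
    candidates = random.sample(state_buffer, min(k, len(state_buffer)))
    return min(candidates, key=q_value)

# Toy usage with scalar "states" and a stand-in Q function.
buffer = [0.1, 0.9, 0.5, 0.3, 0.7]
q = lambda s: s  # pretend the Q-value equals the state itself
print(pick_reset_state(buffer, q, k=len(buffer)))  # 0.1
```

Restarting episodes from such states exposes the agent to exactly the visited-but-poorly-handled regions that relay-evaluation probes, without requiring extra environment interactions beyond the resets.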


MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning

Banerjee, Arundhati, Phade, Soham, Ermon, Stefano, Zheng, Stephan

arXiv.org Artificial Intelligence

We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes. This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people. Moreover, the principal should be few-shot adaptable and minimize the number of interventions, because interventions are often costly. We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions. We validate this approach step-by-step. First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity. We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies. Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both 0-shot and K=1-shot settings with partial agent information.
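The basic principal-agent setup the abstract builds on, intervening on a bandit learner's rewards to steer its behavior, can be illustrated with a toy example. This is a minimal sketch of the setting only, not MERMAIDE's trained principal; the fixed bonus scheme, the epsilon-greedy agent, and all names here are illustrative assumptions.

```python
import random

# Toy sketch of reward intervention on a bandit learner: a principal adds a
# bonus to the reward of a desired arm, and an epsilon-greedy agent learns
# from the modified rewards. (Illustrative stand-in for the paper's setting;
# MERMAIDE instead meta-learns an adaptive intervention policy.)

def run_bandit(true_rewards, desired_arm, bonus, steps=200, eps=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_rewards)
    counts, values = [0] * n, [0.0] * n
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n)          # explore
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit
        # Principal's intervention: bonus on the desired arm only.
        r = true_rewards[arm] + (bonus if arm == desired_arm else 0.0)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return max(range(n), key=lambda a: values[a])  # arm the agent prefers

# Arm 1 pays more by default; a large enough bonus steers the agent to arm 0.
print(run_bandit([0.2, 0.8], desired_arm=0, bonus=0.0))
print(run_bandit([0.2, 0.8], desired_arm=0, bonus=1.0))
```

A fixed bonus like this is costly at every step; the paper's point is that a meta-learned principal can achieve the steering with far fewer, cheaper interventions on previously unseen learners.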