rule-based agent
LLMs as Agentic Cooperative Players in Multiplayer UNO
Matinez, Yago Romano, Roberts, Jesse
Third, the prompt provided the current game state data: number of players, last played card, hand contents, next player, recent moves, and legal actions. Finally, the LLM was asked to choose the best action according to the specified prompting method. The game state information was extracted from RLCard and reformatted for readability. While RLCard encodes cards using shorthand (e.g., "r-5" for red 5), we expanded these into full descriptions to improve the model's comprehension. An example of the complete prompt format is shown in Figure 3. To drive the model's action selection, we applied two prompting strategies inspired by Moore et al. [17]: cloze prompting and counterfactual prompting. These methods determine how the model interprets the prompt and evaluates its legal actions during gameplay. Cloze Prompting: In this method, legal actions were labeled with sequential letters (A, B, C, etc.), and the LLM was instructed to choose the letter corresponding to the best move. Only one token was allowed in the output, and the highest-probability token from the set of allowable actions was selected as the action.
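The card expansion and cloze selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the color table, the helper names (`describe_card`, `label_actions`, `cloze_select`), and the shape of `token_logprobs` (a plain dict from token string to log-probability, as a model API might return for the single output token) are all assumptions.

```python
import math
import string

# Assumed color prefixes for the card shorthand; only "r-5" -> "red 5" is given in the text.
COLORS = {"r": "red", "g": "green", "b": "blue", "y": "yellow"}


def describe_card(code: str) -> str:
    """Expand a shorthand such as 'r-5' into a readable description such as 'red 5'."""
    if "-" not in code:
        return code  # e.g. a bare action name; returned unchanged
    color, _, trait = code.partition("-")
    return f"{COLORS.get(color, color)} {trait.replace('_', ' ')}"


def label_actions(actions):
    """Label legal actions with sequential letters A, B, C, ...

    zip() stops at 26 letters, so this sketch silently ignores any further actions.
    """
    return dict(zip(string.ascii_uppercase, actions))


def cloze_select(labeled, token_logprobs):
    """Pick the action whose letter has the highest log-probability.

    Tokens that are not labels of a legal action are ignored, so the result
    is always one of the allowable actions.
    """
    best = max(labeled, key=lambda letter: token_logprobs.get(letter, -math.inf))
    return labeled[best]


labeled = label_actions(["r-5", "g-5", "draw"])
# Hypothetical log-probabilities for the single output token.
logprobs = {"A": -2.0, "B": -0.5, "C": -3.0, "Z": -0.1}
choice = cloze_select(labeled, logprobs)
print(choice, "->", describe_card(choice))
```

Restricting the argmax to the label set is what keeps an unrelated high-probability token (here "Z") from being chosen.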
Imitation Learning for Intra-Day Power Grid Operation through Topology Actions
de Jong, Matthijs, Viebahn, Jan, Shapovalova, Yuliya
Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions have encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actions. In particular, we consider two rule-based expert agents: a greedy agent and an N-1 agent. While the latter is more computationally expensive since it takes N-1 safety considerations into account, it exhibits a much higher operational performance. We train a fully-connected neural network (FCNN) on expert state-action pairs and evaluate it in two ways. First, we find that classification accuracy is limited despite extensive hyperparameter tuning, due to class imbalance and class overlap. Second, as a power system agent, the FCNN performs only slightly worse than expert agents. Furthermore, hybrid agents, which incorporate minimal additional simulations, match expert agents' performance with significantly lower computational cost. Consequently, imitation learning shows promise for developing fast, high-performing power grid agents, motivating its further exploration in future L2RPN studies.
Strategy Game-Playing with Size-Constrained State Abstraction
Xu, Linjie, Perez-Liebana, Diego, Dockhorn, Alexander
Playing strategy games is a challenging problem for artificial intelligence (AI). One of the major challenges is the large search space due to a diverse set of game components. In recent works, state abstraction has been applied to search-based game AI and has brought significant performance improvements. State abstraction techniques rely on reducing the search space, e.g., by aggregating similar states. However, the application of these abstractions is hindered because the quality of an abstraction is difficult to evaluate. Previous works hence abandon the abstraction in the middle of the search to not bias the search to a local optimum. This mechanism introduces a hyper-parameter to decide the time to abandon the current state abstraction. In this work, we propose a size-constrained state abstraction (SCSA), an approach that limits the maximum number of nodes being grouped together. We found that with SCSA, the abstraction is not required to be abandoned. Our empirical results on three strategy games show that the SCSA agent outperforms the previous methods and yields robust performance over different games. The code is open-sourced at https://github.com/GAIGResearch/Stratega.
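The idea of capping how many nodes may be grouped together can be illustrated with a small sketch. This is not the SCSA algorithm from the paper, whose details the abstract does not give: the function name `size_constrained_groups`, the `key` function standing in for a state-similarity test, and the first-come grouping order are all assumptions.

```python
def size_constrained_groups(nodes, key, max_size):
    """Group nodes that share an abstraction key, with at most max_size nodes per group.

    When a group for a key is full, a new group is opened for that key, so a
    single abstract node never aggregates more than max_size ground nodes.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    open_group = {}  # key -> the most recent (possibly non-full) group for that key
    groups = []
    for node in nodes:
        k = key(node)
        group = open_group.get(k)
        if group is None or len(group) >= max_size:
            group = []
            groups.append(group)
            open_group[k] = group
        group.append(node)
    return groups


# Toy example: "similar" states are integers with the same parity.
print(size_constrained_groups([1, 2, 3, 4, 5, 6, 7], key=lambda n: n % 2, max_size=2))
```

With `max_size=1` every node stays in its own group, which corresponds to searching without any abstraction.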
Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach
Huynh, Nhat-Minh, Cao, Hoang-Giang, Wu, I-Chen
Pommerman is a multi-agent environment that has received considerable attention from researchers in recent years. This environment is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents. Pommerman presents significant challenges for model-free reinforcement learning due to delayed action effects, sparse rewards, and false positives, where opponent players can lose due to their own mistakes. This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play. We also tackle two challenging problems when deploying the multi-agent training system for competitive games: sparse rewards and a suitable matchmaking mechanism. Specifically, we propose an adaptive annealing factor based on agents' performance to adjust the dense exploration reward during training dynamically. Additionally, we implement a matchmaking mechanism utilizing the Elo rating system to pair agents effectively. Our experimental results demonstrate that our trained agent can outperform top learning agents without requiring communication among allied agents.
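For readers unfamiliar with Elo-based matchmaking, the standard Elo update and one simple pairing rule can be sketched as below. The expected-score and update formulas are the standard Elo ones; the K-factor of 32, the nearest-rating pairing rule, and the function names are assumptions, since the abstract does not state how the authors configure or use the ratings.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, and 0.0 for a loss.
    The K-factor of 32 is an assumed default, not taken from the paper.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def closest_opponent(ratings, agent):
    """Assumed pairing rule: choose the other agent with the nearest rating."""
    others = (name for name in ratings if name != agent)
    return min(others, key=lambda name: abs(ratings[name] - ratings[agent]))


ratings = {"a": 1500.0, "b": 1520.0, "c": 1700.0}
opponent = closest_opponent(ratings, "a")
ratings["a"], ratings[opponent] = update_elo(ratings["a"], ratings[opponent], score_a=1.0)
print(opponent, ratings)
```

Pairing agents of similar rating keeps matches informative: a lopsided game changes either rating very little and yields a weak training signal.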
Middleware-based multi-agent development environment for building and testing distributed intelligent systems
Aguayo-Canela, Francisco José, Alaiz-Moretón, Héctor, García-Ordás, María Teresa, Benítez-Andrades, José Alberto, Benavides, Carmen, Novais, Paulo, García-Rodríguez, Isaías
The spread of the Internet of Things (IoT) is demanding new, powerful architectures for handling the huge amounts of data produced by the IoT devices. In many scenarios, existing isolated solutions applied to IoT devices use a set of rules to detect, report and mitigate malware activities or threats. This paper describes a development environment that allows the programming and debugging of such rule-based multi-agent solutions. The solution consists of the integration of a rule engine into the agent, the use of a specialized, wrapping agent class with a graphical user interface for programming and testing purposes, and a mechanism for the incremental composition of behaviors. Finally, a set of examples and a comparative study were carried out to test the suitability and validity of the approach. The JADE multi-agent middleware has been used for the practical implementation of the approach.
Enriched multi-agent middleware for building rule-based distributed security solutions for IoT environments
Aguayo-Canela, Francisco José, Alaiz-Moretón, Héctor, García-Ordás, María Teresa, Benítez-Andrades, José Alberto, Benavides, Carmen, García-Rodríguez, Isaías
The increasing number of connected devices and the complexity of Internet of Things (IoT) ecosystems are demanding new architectures for managing and securing these networked environments. Intrusion Detection Systems (IDS) are security solutions that help to detect and mitigate the threats that IoT systems face, but there is a need for new IDS strategies and architectures. This paper describes a development environment that allows the programming and debugging of distributed, rule-based multi-agent IDS solutions. The proposed solution consists of the integration of a rule engine into the agent, the use of a specialized, wrapping agent class with a graphical user interface for programming and debugging purposes, and a mechanism for the incremental composition of behaviors. A comparative study and an example IDS are used to test and show the suitability and validity of the approach. The JADE multi-agent middleware has been used for the practical implementations.
Sense, Imagine, Act: Multimodal Perception Improves Model-Based Reinforcement Learning for Head-to-Head Autonomous Racing
Shrestha, Elena, Reddy, Chetan, Wan, Hanxi, Zhuang, Yulun, Vasudevan, Ram
Model-based reinforcement learning (MBRL) techniques have recently yielded promising results for real-world autonomous racing using high-dimensional observations. MBRL agents, such as Dreamer, solve long-horizon tasks by building a world model and planning actions by latent imagination. This approach involves explicitly learning a model of the system dynamics and using it to learn the optimal policy for continuous control over multiple timesteps. As a result, MBRL agents may converge to sub-optimal policies if the world model is inaccurate. To improve state estimation for autonomous racing, this paper proposes a self-supervised sensor fusion technique that combines egocentric LiDAR and RGB camera observations collected from the F1TENTH Gym. The zero-shot performance of MBRL agents is empirically evaluated on unseen tracks and against a dynamic obstacle. This paper illustrates that multimodal perception improves robustness of the world model without requiring additional training data. The resulting multimodal Dreamer agent safely avoided collisions and won the most races compared to other tested baselines in zero-shot head-to-head autonomous racing.
Managing power grids through topology actions: A comparative study between advanced rule-based and reinforcement learning agents
Lehna, Malte, Viebahn, Jan, Scholz, Christoph, Marot, Antoine, Tomforde, Sven
The operation of electricity grids has become increasingly complex due to the current upheaval and the increase in renewable energy production. As a consequence, active grid management is reaching its limits with conventional approaches. In the context of the Learning to Run a Power Network challenge, it has been shown that Reinforcement Learning (RL) is an efficient and reliable approach with considerable potential for automatic grid operation. In this article, we analyse the submitted agent from Binbinchen and provide novel strategies to improve the agent, both for the RL and the rule-based approach. The main improvement is an N-1 strategy, where we consider topology actions that keep the grid stable, even if one line is disconnected. Moreover, we propose a topology reversion to the original grid, which proved to be beneficial. The improvements are tested against reference approaches on the challenge test sets and are able to increase the performance of the rule-based agent by 27%. In a direct comparison between the rule-based and RL agents, we find similar performance. However, the RL agent has a clear computational advantage. We also analyse the behaviour in an exemplary case in more detail to provide additional insights. Here, we observe that through the N-1 strategy, the actions of the agents become more diversified.
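The N-1 idea, keeping only topology actions that remain stable when any single line is disconnected, can be sketched as a filter over candidate actions. This is an illustration under stated assumptions, not the agent from the paper: `max_loading_after` is a hypothetical callable standing in for a grid simulator, and the loading limit of 1.0 (100% of line capacity) is an assumed stability criterion.

```python
def n1_safe_actions(actions, lines, max_loading_after, limit=1.0):
    """Return the actions whose worst-case line loading stays within limit.

    max_loading_after(action, outage) is a hypothetical simulator hook that
    returns the highest relative line loading after applying `action` with
    line `outage` disconnected (outage=None means no line is disconnected).
    """
    safe = []
    for action in actions:
        # Check the base case plus every single-line outage (the N-1 cases).
        worst = max(max_loading_after(action, outage) for outage in [None, *lines])
        if worst <= limit:
            safe.append(action)
    return safe


# Toy lookup table standing in for simulation results.
loadings = {
    ("keep", None): 0.80, ("keep", "l1"): 1.20, ("keep", "l2"): 0.90,
    ("split_bus", None): 0.85, ("split_bus", "l1"): 0.95, ("split_bus", "l2"): 0.99,
}
safe = n1_safe_actions(["keep", "split_bus"], ["l1", "l2"], lambda a, o: loadings[(a, o)])
print(safe)
```

Each candidate costs one simulation per line plus one for the base case, which is consistent with the abstract's remark that the rule-based approach is computationally heavier than the RL agent.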