agent learn
#IJCAI2025 distinguished paper: Combining MORL with restraining bolts to learn normative behaviour
Image provided by the authors – generated using Gemini. For many of us, artificial intelligence (AI) has become part of everyday life, and the rate at which we assign previously human roles to AI systems shows no signs of slowing down. AI systems are the crucial ingredients of many technologies -- e.g., self-driving cars, smart urban planning, digital assistants -- across a growing number of domains. At the core of many of these technologies are autonomous agents -- systems designed to act on behalf of humans and make decisions without direct supervision. In order to act effectively in the real world, these agents must be capable of carrying out a wide range of tasks despite possibly unpredictable environmental conditions, which often requires some form of machine learning (ML) for achieving adaptive behaviour.
Learning to flock in open space by avoiding collisions and staying together
Brambati, Martino, Celani, Antonio, Gherardi, Marco, Ginelli, Francesco
The synchronized flight of bird flocks, exemplified by starling murmurations, is perhaps the most striking example of collective behavior in natural systems, and has fascinated scholars for a long time [1]. Evolutionary biologists, for instance, have long debated the advantages of living in groups [2], which should offer increased protection from predation by diluting the individual risk and possibly confusing the attackers by the sheer size of the assembly. Flocking behavior involves a high degree of order in the individual directions of motion [3], and has been reproduced by minimal models of self-propelling particles (SPPs), such as Craig Reynolds' Boids [4] or the celebrated Vicsek model [5], which has long captivated the attention of statistical physicists and played a pivotal role in the birth of the active matter research field. The essential ingredient of these models is the tendency of individual particles to align their direction of motion with those of their local neighbours. This is enough to promote long-range order in systems with finite density, such as toy models with periodic boundary conditions (even in two spatial dimensions, due to the non-equilibrium nature of self-propelled particles). In open systems, made up of a finite number of individuals in an open, infinite space, pure alignment interactions are, however, not enough to maintain group cohesion.
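The alignment rule described in this abstract can be illustrated with a minimal Vicsek-style update in a periodic square box. This is only a sketch under stated assumptions, not the authors' model: the function names (`vicsek_step`, `polar_order`), the parameter names, and the uniform angular noise are illustrative choices.

```python
import numpy as np

def vicsek_step(pos, theta, box, radius, speed, noise, rng):
    """One update of a minimal Vicsek-style model (illustrative sketch)."""
    # Pairwise displacements with the minimum-image convention,
    # which implements the periodic boundary conditions.
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)
    # Boolean neighbour matrix; each particle counts as its own neighbour.
    neighbours = ((delta ** 2).sum(axis=-1) <= radius ** 2).astype(float)

    # Mean heading of the neighbours, from the summed unit vectors.
    sin_sum = neighbours @ np.sin(theta)
    cos_sum = neighbours @ np.cos(theta)
    new_theta = np.arctan2(sin_sum, cos_sum)

    # Angular noise drawn uniformly from [-noise/2, noise/2] (an assumption).
    new_theta = new_theta + noise * (rng.random(theta.shape) - 0.5)

    # Move every particle at constant speed along its new heading.
    velocity = speed * np.stack([np.cos(new_theta), np.sin(new_theta)], axis=1)
    new_pos = (pos + velocity) % box
    return new_pos, new_theta

def polar_order(theta):
    """Polar order parameter: 1 for perfect alignment, near 0 for disorder."""
    return float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))
```

Iterating `vicsek_step` and tracking `polar_order` shows how alignment alone builds orientational order in a periodic box. Dropping the modulo wrap and the minimum-image correction gives the open-space setting, where nothing in this rule holds the group together.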
Vision based driving agent for race car simulation environments
Bári, Gergely, Palkovics, László
In recent years, autonomous driving has become a popular field of study. As control at the tire grip limit is essential during emergency situations, algorithms developed for racecars are useful for road cars too. This paper examines the use of Deep Reinforcement Learning (DRL) to solve the problem of "grip limit driving" in a simulated environment. The Proximal Policy Optimization (PPO) method is used to train an agent to control the steering wheel and pedals of the vehicle, using only visual inputs to achieve professional human lap times. The paper outlines the formulation of the task of time-optimal driving on a race track as a deep reinforcement learning problem, and explains the chosen observations, actions, and reward functions. The results demonstrate human-like learning and driving behavior that utilizes maximum tire grip potential.
Deep Learning Agents Trained For Avoidance Behave Like Hawks And Doves
Reddi, Aryaman, Vinnicombe, Glenn
We present heuristically optimal strategies expressed by deep learning agents playing a simple avoidance game. We analyse the learning and behaviour of two agents within a symmetrical grid world that must cross paths to reach a target destination without crashing into each other or straying off of the grid world in the wrong direction. The agent policy is determined by one neural network that is employed in both agents. Our findings indicate that the fully trained network exhibits behaviour similar to that of the game Hawks and Doves, in that one agent employs an aggressive strategy to reach the target while the other learns how to avoid the aggressive agent.
Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry
Navarro, A. L. García, Koneva, Nataliia, Sánchez-Macián, Alfonso, Hernández, José Alberto, de Dios, Óscar González, Rivas-Moscoso, J. M.
This article provides a methodology and open-source implementation of Reinforcement Learning algorithms for finding optimal routes in a packet-optical network scenario. The algorithm uses measurements provided by the physical layer (pre-FEC bit error rate and propagation delay) and the link layer (link load) to configure a set of latency-based rewards and penalties based on such measurements. Then, the algorithm executes Q-learning based on this set of rewards for finding the optimal routing strategies. It is further shown that the algorithm dynamically adapts to changing network conditions by re-calculating optimal policies upon either link load changes or link degradation as measured by pre-FEC BER.
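The approach summarized above can be sketched as tabular Q-learning over next-hop choices. This is not the article's open-source implementation: the reward shape in `link_reward`, the thresholds, and every function and parameter name below are hypothetical and chosen only for illustration.

```python
import random

def link_reward(delay_ms, load, ber, ber_limit=1e-3, penalty_ms=100.0):
    """Hypothetical latency-based reward for traversing one link."""
    # Assumption: delay grows as the link load approaches 1.
    cost = delay_ms / max(1e-6, 1.0 - load)
    # Assumption: a fixed penalty when pre-FEC BER exceeds a threshold.
    if ber > ber_limit:
        cost += penalty_ms
    return -cost

def q_learning_routes(links, dst, episodes=2000, alpha=0.5, gamma=1.0,
                      epsilon=0.2, seed=0):
    """Learn Q[node][next_hop] for reaching dst (state = current node)."""
    rng = random.Random(seed)
    nodes = sorted(links)
    q = {n: {m: 0.0 for m in links[n]} for n in nodes}
    for _ in range(episodes):
        node = rng.choice([n for n in nodes if n != dst])
        for _ in range(4 * len(nodes)):  # hop limit per episode
            # Epsilon-greedy choice of the next hop.
            if rng.random() < epsilon:
                nxt = rng.choice(list(q[node]))
            else:
                nxt = max(q[node], key=q[node].get)
            reward = link_reward(**links[node][nxt])
            future = 0.0 if nxt == dst else max(q[nxt].values())
            # Standard Q-learning update.
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
            if node == dst:
                break
    return q

def greedy_path(q, src, dst):
    """Follow the highest-valued next hop from src until dst is reached."""
    path = [src]
    while path[-1] != dst and len(path) <= len(q):
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path
```

Here `links` is assumed to map each node to its neighbours, with a `delay_ms`, `load` and `ber` entry per link. Re-running `q_learning_routes` after those measurements change mirrors the re-calculation of policies on load changes or link degradation mentioned in the abstract.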
Exploring the Benefits of Teams in Multiagent Learning
Radke, David, Larson, Kate, Brecht, Tim
For problems requiring cooperation, many multiagent systems implement solutions among either individual agents or across an entire population towards a common goal. Multiagent teams are primarily studied when in conflict; however, organizational psychology (OP) highlights the benefits of teams among human populations for learning how to coordinate and cooperate. In this paper, we propose a new model of multiagent teams for reinforcement learning (RL) agents inspired by OP and early work on teams in artificial intelligence. We validate our model using complex social dilemmas that are popular in recent multiagent RL and find that agents divided into teams develop cooperative pro-social policies despite incentives to not cooperate. Furthermore, agents are better able to coordinate and learn emergent roles within their teams and achieve higher rewards compared to when the interests of all agents are aligned.
Towards a Better Understanding of Learning with Multiagent Teams
Radke, David, Larson, Kate, Brecht, Tim, Tilbury, Kyle
While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.
A far-sighted approach to machine learning G.R. Jenkin & Associates
The players can cooperate to achieve an objective, and compete against other players with conflicting interests. Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to anticipate future behaviors of other agents when they are all learning simultaneously. Because of the complexity of this problem, current approaches tend to be myopic; the agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run. Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective.
Trajkovski
In this paper we explain how IETAL agents learn their environment and how they build an intrinsic, internal representation of it, which they then use to form expectations when on a quest to satisfy their active drives. As environments change (with or without other agents present in them), the agents learn new associations and "forget" irrelevant, "old" ones. We discuss the concept of the emotional context of associations, and show a gallery of simulations of behaviors in small multiagent societies.
IA : Deep Reinforcement learning. A mimicry of Human evolution?
DRL is an AI technique that aims to take appropriate actions to maximise reward in a certain situation (game/simulation/reality). Before explaining further, it is necessary to give some definitions:
- Agent: The "player" of the game, the entity that takes actions. It follows a strategy (called a policy) to evolve in the environment, and its ultimate goal is to maximize its reward. The environment is said to be in a state s at a given time.
- Policy: The strategy that drives the Agent's actions; it is represented by a neural network (NN). The policy can change as the Agent learns from its experiences.
- Reward: A metric that measures the performance of the Agent's actions within the environment.
Now let's take an example to illustrate the mechanisms of DRL: the famous card game of Poker Texas Hold'em (PTH). In PTH, the agents are the players and the environment is the set of rules of PTH (blinds, number of cards, minimum bet, playing order…).