Agents
Emergent bartering behaviour in multi-agent reinforcement learning
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviours respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods.
Research Papers based on Multi Agent Systems
This brief aims to sensitize the reader to EGT based issues, results and prospects, which are accruing in importance for the modeling of minds with machines and the engineering of prosocial behaviours in dynamical MAS, with impact on our understanding of the emergence and stability of collective behaviours.
Distributed Control for a Robotic Swarm to Pass through a Curve Virtual Tube
Quan, Quan, Gao, Yan, Bai, Chenggang
Robotic swarm systems are now becoming increasingly attractive for many challenging applications. The main task for any robot is to reach the destination while keeping a safe separation from other robots and obstacles. In many scenarios, robots need to move within a narrow corridor, through a window or a doorframe. In order to guide all robots to move in a cluttered environment, a curve virtual tube with no obstacle inside is carefully designed in this paper. There is no obstacle inside the tube, namely the area inside the tube can be seen as a safety zone. Then, a distributed swarm controller is proposed with three elaborate control terms: a line approaching term, a robot avoidance term and a tube keeping term. Formal analysis and proofs are made to show that the curve virtual tube passing problem can be solved in a finite time. For the convenience in practical use, a modified controller with an approximate control performance is put forward. Finally, the effectiveness of the proposed method is validated by numerical simulations and real experiments. To show the advantages of the proposed method, the comparison between our method and the control barrier function method is also presented in terms of calculation speed.
A Few Queries Go a Long Way: Information-Distortion Tradeoffs in Matching
Amanatidis, Georgios (University of Essex) | Birmpas, Georgios (Sapienza University of Rome) | Filos-Ratsikas, Aris (University of Liverpool) | Voudouris, Alexandros A. (University of Essex)
We consider the One-Sided Matching problem, where n agents have preferences over n items, and these preferences are induced by underlying cardinal valuation functions. The goal is to match every agent to a single item so as to maximize the social welfare. Most of the related literature, however, assumes that the values of the agents are not a priori known, and only access to the ordinal preferences of the agents over the items is provided. Consequently, this incomplete information leads to loss of efficiency, which is measured by the notion of distortion. In this paper, we further assume that the agents can answer a small number of queries, allowing us partial access to their values. We study the interplay between elicited cardinal information (measured by the number of queries per agent) and distortion for One-Sided Matching, as well as a wide range of well-studied related problems. Qualitatively, our results show that with a limited number of queries, it is possible to obtain significant improvements over the classic setting, where only access to ordinal information is given.
Artificial Intelligence to Support UAVs in Healthcare
Unmanned aerial vehicles (UAVs), or simply drones, are used in a plethora of civil applications due to their ease of deployment, low maintenance cost, high mobility, and ability to hover. A main advantage of drones is that, in contrast to other vehicles, they are not restricted to traveling over a road network and thus, can swiftly move over disperse locations. Such vehicles are utilized for many applications such as the real-time monitoring of road traffic, civil infrastructure inspection, wireless coverage, delivery of goods, security and surveillance, precision agriculture, and healthcare. Regarding the latter, drones can be utilized in natural disaster relief, as search and rescue units, as transfer units, and to support telemedicine. For drones to be efficient in such applications, their scheduled and coordinated flying is crucial. Moreover, given that drones typically use an electric motor and store the required energy in batteries, their scheduled charging is crucial to maximizing their availability.Controlling drones demands efficient algorithms that can solve problems that involve a large number of heterogeneous entities (e.g., drones’ owners), each one having its own goals, needs, and incentives (e.g., amount of goods to transport), while they operate in highly dynamic environments (e.g., variable number of drones) and having to deal with a number of uncertainties (e.g., future requests, emergency situations). In this context, artificial intelligence (AI) techniq...
Proactive Dynamic Distributed Constraint Optimization Problems
Hoang, Khoi D. | Fioretto, Ferdinando (Syracuse University) | Hou, Ping (Uber Advanced Technologies Group) | Yeoh, William (Washington University in St. Louis) | Yokoo, Makoto (Kyushu University) | Zivan, Roie (Ben-Gurion University of the Negev)
The Distributed Constraint Optimization Problem (DCOP) formulation is a powerful tool for modeling multi-agent coordination problems. To solve DCOPs in a dynamic environment, Dynamic DCOPs (D-DCOPs) have been proposed to model the inherent dynamism present in many coordination problems. D-DCOPs solve a sequence of static problems by reacting to changes in the environment as the agents observe them. Such reactive approaches ignore knowledge about future changes of the problem. To overcome this limitation, we introduce Proactive Dynamic DCOPs (PD-DCOPs), a novel formalism to model D-DCOPs in the presence of exogenous uncertainty. In contrast to reactive approaches, PD-DCOPs are able to explicitly model possible changes of the problem and take such information into account when solving the dynamically changing problem in a proactive manner. The additional expressivity of this formalism allows it to model a wider variety of distributed optimization problems. Our work presents both theoretical and practical contributions that advance current dynamic DCOP models: (i) We introduce Proactive Dynamic DCOPs (PD-DCOPs), which explicitly model how the DCOP will change over time; (ii) We develop exact and heuristic algorithms to solve PD-DCOPs in a proactive manner; (iii) We provide theoretical results about the complexity of this new class of DCOPs; and (iv) We empirically evaluate both proactive and reactive algorithms to determine the trade-offs between the two classes. The final contribution is important as our results are the first that identify the characteristics of the problems that the two classes of algorithms excel in.
Fair Division of Indivisible Goods for a Class of Concave Valuations
Chaudhury, Bhaskar Ray (MPI for Informatics, Saarland Informatics Campus) | Cheung, Yun Kuen (Royal Holloway University of London) | Garg, Jugal (University of Illinois at Urbana-Champaign) | Garg, Naveen (IIT Delhi) | Hoefer, Martin (Goethe-Universität Frankfurt am Main) | Mehlhorn, Kurt (MPI for Informatics, Saarland Informatics Campus)
We study the fair and efficient allocation of a set of indivisible goods among agents, where each good has several copies, and each agent has an additively separable concave valuation function with a threshold. These valuations capture the property of diminishing marginal returns, and they are more general than the well-studied case of additive valuations. We present a polynomial-time algorithm that approximates the optimal Nash social welfare (NSW) up to a factor of e1/e ≈ 1.445. This matches with the state-of-the-art approximation factor for additive valuations. The computed allocation also satisfies the popular fairness guarantee of envy-freeness up to one good (EF1) up to a factor of 2 + ε. For instances without thresholds, it is also approximately Pareto-optimal. For instances satisfying a large market property, we show an improved approximation factor. Lastly, we show that the upper bounds on the optimal NSW introduced in Cole and Gkatzelis (2018) and Barman et al. (2018) have the same value.
Avoiding Negative Side Effects of Autonomous Systems in the Open World
Saisubramanian, Sandhya (Oregon State University) | Kamar, Ece (Microsoft Research) | Zilberstein, Shlomo (University of Massachusetts Amherst)
Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with a NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation. The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings.
Massive Twinning to Enhance Emergent Intelligence
Yuan, Siyu, Han, Bin, Krummacker, Dennis, Schotten, Hans D.
As a complement to conventional AI solutions, emergent intelligence (EI) exhibits competitiveness in 6G IIoT scenario for its various outstanding features including robustness, protection to privacy, and scalability. However, despite the low computational complexity, EI is challenged by its high demand of data traffic in massive deployment. We propose to leverage massive twinning, which 6G is envisaged to support, to reduce the data traffic in EI and therewith enhance its performance.