Agents
Information Preferences of Individual Agents in Linear-Quadratic-Gaussian Network Games
We consider linear-quadratic-Gaussian (LQG) network games in which agents have quadratic payoffs that depend on their individual and neighbors' actions, and an unknown payoff-relevant state. An information designer determines the fidelity of information revealed to the agents about the payoff state to maximize the social welfare. Prior results show that full information disclosure is optimal under certain assumptions on the payoffs, i.e., it is beneficial for the average individual. In this paper, we provide conditions based on the strength of the dependence of payoffs on neighbors' actions, i.e., competition, under which a rational agent is expected to benefit, i.e., receive higher payoffs, from full information disclosure. We find that all agents benefit from information disclosure for the star network structure when the game is symmetric and submodular or supermodular. We also identify that the central agent benefits more than a peripheral agent from full information disclosure unless the competition is strong and the number of peripheral agents is small enough. Despite the fact that all agents expect to benefit from information disclosure ex-ante, a central agent can be worse-off from information disclosure in many realizations of the payoff state under strong competition, indicating that a risk-averse central agent can prefer uninformative signals ex-ante.
Onto4MAT: A Swarm Shepherding Ontology for Generalised Multi-Agent Teaming
Hepworth, Adam J., Baxter, Daniel P., Abbass, Hussein A.
Research in multi-agent teaming has increased substantially over recent years, with knowledge-based systems to support teaming processes typically focused on delivering functional (communicative) solutions for a team to act meaningfully in response to direction. Enabling humans to effectively interact and team with a swarm of autonomous cognitive agents is an open research challenge in Human-Swarm Teaming research, partially due to the focus on developing the enabling architectures to support these systems. Typically, bi-directional transparency and shared semantic understanding between agents has not prioritised a designed mechanism in Human-Swarm Teaming, potentially limiting how a human and a swarm team can share understanding and information (data through concepts and contexts) to achieve a goal. To address this, we provide a formal knowledge representation design that enables the swarm Artificial Intelligence to reason about its environment and system, ultimately achieving a shared goal. We propose the Ontology for Generalised Multi-Agent Teaming, Onto4MAT, to enable more effective teaming between humans and teams through the biologically-inspired approach of shepherding.
Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects
Wang, Xihuai, Zhang, Zhicheng, Zhang, Weinan
Significant advances have recently been achieved in Multi-Agent Reinforcement Learning (MARL), which tackles sequential decision-making problems involving multiple participants. However, MARL requires a tremendous number of samples for effective training. Model-based methods, on the other hand, have been shown to achieve provable advantages in sample efficiency, yet attempts to apply them to MARL have only recently begun. This paper presents a review of the existing research on model-based MARL, including theoretical analyses, algorithms, and applications, and analyzes the advantages and potential of model-based MARL. Specifically, we provide a detailed taxonomy of the algorithms and point out the pros and cons of each algorithm according to the challenges inherent to multi-agent scenarios. We also outline promising directions for future development of this field.
Storytelling AI to improve wellbeing of people with dementia
An artificial intelligence (AI) companion for people with dementia is being developed in research involving the University of Strathclyde. The technology will aid memory recollection, boost confidence and combat depression in people living with Alzheimer's Disease and other types of dementia. Memory loss in people with Alzheimer's Disease occurs in reverse chronological order, with pockets of long-term memory remaining accessible even as the disease progresses. While most current rehabilitative care methods focus on physical aids and repetitive reminding techniques, the new project, named AMPER (Agent-based Memory Prosthesis to Encourage Reminiscing) will take an AI-driven, user-centred approach and will focus on personalised storytelling to help bring a patient's memories back to the surface. The research team is led at Heriot-Watt University and the National Robotarium, a partnership between Heriot-Watt University and the University of Edinburgh.
Strategic Maneuver and Disruption with Reinforcement Learning Approaches for Multi-Agent Coordination
Asher, Derrik E., Basak, Anjon, Fernandez, Rolando, Sharma, Piyush K., Zaroukian, Erin G., Hsu, Christopher D., Dorothy, Michael R., Mahre, Thomas, Galindo, Gerardo, Frerichs, Luke, Rogers, John, Fossaceca, John
Reinforcement learning (RL) approaches can illuminate emergent behaviors that facilitate coordination across teams of agents as part of a multi-agent system (MAS), which can provide windows of opportunity in various military tasks. Technologically advancing adversaries pose substantial risks to a friendly nation's interests and resources. Superior resources alone are not enough to defeat adversaries in modern complex environments because adversaries create standoff in multiple domains against predictable military doctrine-based maneuvers. Therefore, as part of a defense strategy, friendly forces must use strategic maneuvers and disruption to gain superiority in complex multi-faceted domains such as multi-domain operations (MDO). One promising avenue for implementing strategic maneuver and disruption to gain superiority over adversaries is through coordination of MAS in future military operations. In this paper, we present overviews of prominent works in the RL domain with their strengths and weaknesses for overcoming the challenges associated with performing autonomous strategic maneuver and disruption in military contexts.
Approximating Perfect Recall when Model Checking Strategic Abilities: Theory and Applications
Belardinelli, Francesco (Imperial College London, United Kingdom and Laboratoire IBISC, Université d'Evry, France) | Lomuscio, Alessio (Imperial College London, United Kingdom) | Malvone, Vadim | Yu, Emily (Johannes Kepler University Linz, Austria)
The model checking problem for multi-agent systems against specifications in the alternating-time temporal logic ATL, hence ATL∗, under perfect recall and imperfect information is known to be undecidable. To tackle this problem, in this paper we investigate a notion of bounded recall under incomplete information. We present a novel three-valued semantics for ATL∗ in this setting and analyse the corresponding model checking problem. We show that the three-valued semantics here introduced is an approximation of the classic two-valued semantics, then give a sound, albeit partial, algorithm for model checking two-valued perfect recall via its approximation as three-valued bounded recall. Finally, we extend MCMAS, an open-source model checker for ATL and other agent specifications, to incorporate bounded recall; we illustrate its use and present experimental results.
Natural Language Communication with a Teachable Agent
Love, Rachel, Law, Edith, Cohen, Philip R., Kulić, Dana
Conversational teachable agents offer a promising platform to support learning, both in the classroom and in remote settings. In this context, the agent takes the role of the novice, while the student takes on the role of teacher. This framing is significant for its ability to elicit the Protégé effect in the student-teacher, a pedagogical phenomenon known to increase engagement in the teaching task, and also improve cognitive outcomes. In prior work, teachable agents often take a passive role in the learning interaction, and there are few studies in which the agent and student engage in natural language dialogue during the teaching task. This work investigates the effect of teaching modality when interacting with a virtual agent, via the web-based teaching platform, the Curiosity Notebook. A method of teaching the agent by selecting sentences from source material is compared to a method of paraphrasing the source material and typing text input to teach. A user study has been conducted to measure the effect of teaching modality on the learning outcomes and engagement of the participants. The results indicate that teaching via paraphrasing and text input has a positive effect on learning outcomes for the material covered, and also on aspects of affective engagement. Furthermore, increased paraphrasing effort, as measured by the similarity between the source material and the material the teacher conveyed to the robot, improves learning outcomes for participants.
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
Mguni, David Henry, Jafferjee, Taher, Wang, Jianhong, Slumbers, Oliver, Perez-Nieves, Nicolas, Tong, Feifei, Yang, Li, Zhu, Jiangcheng, Yang, Yaodong, Wang, Jun
Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agent systems, coordinated exploration and behaviour are critical for agents to jointly achieve optimal outcomes. In this paper, we introduce a new general framework for improving coordination and performance of multi-agent reinforcement learners (MARL). Our framework, named the Learnable Intrinsic-Reward Generation Selection algorithm (LIGS), introduces an adaptive learner, Generator, that observes the agents and learns to construct intrinsic rewards online that coordinate the agents' joint exploration and joint behaviour. Using a novel combination of MARL and switching controls, LIGS determines the best states at which to learn to add intrinsic rewards, which leads to a highly efficient learning process. LIGS can subdivide complex tasks, making them easier to solve, and enables systems of MARL agents to quickly solve environments with sparse rewards. LIGS can seamlessly adopt existing MARL algorithms, and our theory shows that it ensures convergence to policies that deliver higher system performance. We demonstrate its superior performance in challenging tasks in Foraging and StarCraft II.
Electric-field-coupled oscillators for collective electrochemical perception in underwater robotics
This work explores the application of nonlinear oscillators coupled by electric field in water for collective tasks in underwater robotics. Such coupled oscillators operate in clear and colloidal (mud, bottom silt) water and represent a collective electrochemical sensor that is sensitive to global environmental parameters, geometry of common electric field and spatial dynamics of autonomous underwater vehicles (AUVs). Implemented in hardware and software, this approach can be used to create global awareness in the group of robots, which possess limited sensing and communication capabilities. Using oscillators from different AUVs enables extending the range limitations related to electric dipole of a single AUV. Applications of this technique are demonstrated for detecting the number of AUVs, distances between them, perception of dielectric objects, synchronization of behavior and discrimination between 'collective self' and 'collective non-self' through an 'electrical mirror'. These approaches have been implemented in several research projects with AUVs in fresh and salt water.
The Multi-Agent Pickup and Delivery Problem: MAPF, MARL and Its Warehouse Applications
Lau, Tim Tsz-Kit, Sengupta, Biswa
We study two state-of-the-art solutions to the multi-agent pickup and delivery (MAPD) problem based on different principles: multi-agent path-finding (MAPF) and multi-agent reinforcement learning (MARL). Specifically, a recent MAPF algorithm called conflict-based search (CBS) and a current MARL algorithm called shared experience actor-critic (SEAC) are studied. While the performance of these algorithms is measured using quite different metrics in their separate lines of work, we aim to benchmark these two methods comprehensively in a simulated warehouse automation environment.