Agents
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Albrecht, Stefano V., Ramamoorthy, S.
The key to effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While several methods exist for constructing a behavioural hypothesis, there is currently no universal theory that would allow an agent to assess whether a hypothesis is correct. In this work, we present a novel algorithm that decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns the statistic's distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational cost.
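To make the idea of testing a behavioural hypothesis concrete, the sketch below shows a generic Monte Carlo hypothesis test. It is not the authors' algorithm. It uses a single hypothetical metric (the average log-likelihood of the observed actions) and estimates the statistic's null distribution by simulating the hypothesised model offline, whereas the paper combines multiple metrics and learns the distribution during the interaction. The names `log_likelihood_score`, `monte_carlo_p_value` and the discrete-state `Model` type are assumptions for illustration.

```python
import math
import random
from typing import Callable, Sequence

# Hypothetical hypothesised model: maps a discrete state to a probability
# distribution over the other agent's actions.
Model = Callable[[int], Sequence[float]]


def log_likelihood_score(model: Model, states: Sequence[int], actions: Sequence[int]) -> float:
    """Average log-probability that the hypothesis assigns to the observed actions."""
    return sum(math.log(model(s)[a]) for s, a in zip(states, actions)) / len(actions)


def monte_carlo_p_value(model: Model, states: Sequence[int], actions: Sequence[int],
                        n_samples: int = 1000, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value: how often does data generated by the
    hypothesis itself look at least as unlikely as the observed data?"""
    rng = random.Random(seed)
    observed = log_likelihood_score(model, states, actions)
    extreme = 0
    for _ in range(n_samples):
        # Simulate the other agent under the hypothesis, on the same states.
        simulated = [rng.choices(range(len(model(s))), weights=model(s))[0] for s in states]
        if log_likelihood_score(model, states, simulated) <= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_samples + 1)
```

For a hypothesis that the other agent picks action 0 with probability 0.9 in every state, fifty observed choices of action 1 give a p-value near 1/(n_samples + 1), so the hypothesis would be rejected at usual significance levels, while a history with roughly 10% of action 1 gives a large p-value.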
Analysis of the Synergy between Modularity and Autonomy in an Artificial Intelligence Based Fleet Competition
Li, Xingyu, Mitra, Mainak, Epureanu, Bogdan I.
We provide a novel approach for evaluating the benefits and burdens of vehicle modularity in fleets/units through the analysis of a game-theoretical model of the competition between autonomous vehicle fleets in an attacker-defender game. We present an approach to obtain heuristic operational strategies by fitting a decision tree to high-fidelity simulation results of an intelligent agent-based model. A multi-stage game-theoretical model is also created for decision making that considers military resources and the impacts of past decisions. Nash equilibria of the operational strategy are identified, and their characteristics are explored. The benefits of fleet modularity are also analyzed by comparing the results of the decision-making process under diverse operational situations.
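As a minimal illustration of what a Nash equilibrium of an attacker-defender game means, the sketch below enumerates the pure-strategy equilibria of a single-stage two-player game given as payoff matrices. This is a simplification, assumed for illustration: the paper's model is multi-stage and derived from simulation and decision trees, and this check ignores mixed-strategy equilibria. The function name and the example payoffs are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(attacker: Matrix, defender: Matrix) -> List[Tuple[int, int]]:
    """Return all (attacker_action, defender_action) pairs where neither
    player can gain by unilaterally switching to another pure action."""
    n_rows, n_cols = len(attacker), len(attacker[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Best payoff the attacker (row player) can get against defender column j.
        attacker_best = max(attacker[k][j] for k in range(n_rows))
        # Best payoff the defender (column player) can get against attacker row i.
        defender_best = max(defender[i][k] for k in range(n_cols))
        if attacker[i][j] == attacker_best and defender[i][j] == defender_best:
            equilibria.append((i, j))
    return equilibria
```

With hypothetical payoffs `attacker = [[3, 0], [2, 1]]` and `defender = [[0, 1], [1, 2]]`, the only pure equilibrium is `(1, 1)`: each side's second strategy is a best response to the other's.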
Voting-Based Multi-Agent Reinforcement Learning
Xu, Yue, Deng, Zengde, Wang, Mengdi, Xu, Wenjun, So, Anthony Man-Cho, Cui, Shuguang
The recent success of single-agent reinforcement learning (RL) encourages the exploration of multi-agent reinforcement learning (MARL), which is more challenging due to the interactions among agents. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged return. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a distributed primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which distributed learning achieves the same sub-linear convergence rate as centralized learning. In other words, distributed decision making does not slow down convergence to the optimal global consensus. We also verify the convergence of the proposed algorithm with numerical simulations and conduct case studies on practical multi-agent systems.
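For orientation, one standard linear-programming form of policy optimization, for an average-reward MDP, optimizes over a stationary state-action occupancy measure $\mu$. The sketch below assumes $N$ agents whose group decision is the joint action $a$, a transition kernel $P$, and per-agent rewards $r_i$; this is the textbook formulation, and the paper's exact LP (for example, discounted rather than average reward) may differ. A primal-dual method works on the Lagrangian of this problem.

```latex
\begin{aligned}
\max_{\mu \ge 0}\quad & \sum_{s,a} \mu(s,a)\,\bar r(s,a),
  \qquad \bar r(s,a) = \frac{1}{N}\sum_{i=1}^{N} r_i(s,a) \\
\text{s.t.}\quad & \sum_{a} \mu(s',a) = \sum_{s,a} P(s' \mid s,a)\,\mu(s,a) \quad \forall s', \\
& \sum_{s,a} \mu(s,a) = 1,
\qquad \pi(a \mid s) = \frac{\mu(s,a)}{\sum_{a'} \mu(s,a')}.
\end{aligned}
```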
Adaptive Music Composition for Games
Hutchings, Patrick, McCormack, Jon
The generation of music that adapts dynamically to content and actions has an important role in building more immersive, memorable and emotive game experiences. To date, the development of adaptive music systems for video games has been limited both by the nature of the algorithms used for real-time music generation and by the limited modelling of player action, game-world context and emotion in current games. We propose that these issues must be addressed in tandem for the quality and flexibility of adaptive game music to improve significantly. Cognitive models of knowledge organisation and emotional affect are integrated with multi-modal, multi-agent composition techniques to produce a novel Adaptive Music System (AMS). The system is integrated into two stylistically distinct games. In both games, players reported higher overall immersion, and a stronger correlation between the music and game-world concepts, with the AMS than with the original soundtracks.
Artificial Intelligence: A Child's Play
We discuss the objectives of any endeavor in creating artificial intelligence, AI, and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. This suggests that our attempts at AI could have been misguided; what we actually need to strive for can be termed artificial curiosity, AC, and intelligence happens as a consequence of those efforts. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. We discuss what these essential doctrines might be and why their establishment is required to form (possibly growing) connections between a knowledge store that has been built up and the new pieces of information that curiosity will bring back. As more findings are acquired and more bonds are formed, we need a way to periodically reduce the amount of data; in this sense, it is important to capture the critical characteristics of what has been accumulated or to produce a summary of what has been gathered. We start with the intuition for this line of reasoning and formalize it with a series of models (and iterative improvements) that will be necessary to make the incubation of intelligence a reality. Our discussion provides conceptual modifications to the Turing Test and to Searle's Chinese room argument. We discuss the future implications for society as AI becomes an integral part of life.
Learning to Interactively Learn and Assist
Woodward, Mark, Finn, Chelsea, Hausman, Karol
When deploying autonomous agents in the real world, we need effective ways of communicating objectives to them. Traditional skill learning has revolved around reinforcement and imitation learning, each with rigid constraints on the format of information exchanged between the human and the agent. While scalar rewards carry little information, demonstrations require significant effort to provide and may carry more information than is necessary. Furthermore, rewards and demonstrations are often defined and collected before training begins, when the human is most uncertain about what information would help the agent. In contrast, when humans communicate objectives with each other, they make use of a large vocabulary of informative behaviors, including non-verbal communication, and often communicate throughout learning, responding to observed behavior. In this way, humans communicate intent with minimal effort. In this paper, we propose such interactive learning as an alternative to reward-driven or demonstration-driven learning. To accomplish this, we introduce a multi-agent training framework that enables an agent to learn from another agent who knows the current task. Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering. Most importantly, we find that our approach produces an agent that can learn interactively from a human user without explicit demonstrations or a reward function, and that, cooperating with a human, achieves significantly better performance than a human performing the task alone.
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
Shang, Wenling, Trott, Alex, Zheng, Stephan, Xiong, Caiming, Socher, Richard
In many real-world scenarios, an autonomous agent encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, provided with the information from the world graph, a high-level Manager quickly finds solutions to new tasks and communicates subgoals to a low-level Worker in terms of pivotal states. The Worker can then also leverage the graph to easily traverse to the pivotal states of interest, even across long distances, and explore non-locally. We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages in performance and efficiency from the proposed framework over baselines that lack world graph knowledge.
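A minimal sketch of how a world graph can support long-distance traversal: given pivotal states as nodes and feasible traversals as edges, a breadth-first search returns the shortest chain of pivotal states between two nodes, which a Worker could follow as intermediate subgoals. In the paper the graph is learned and the Manager is a learned policy; the hand-written graph, the string node names and the function `plan_through_pivots` are hypothetical stand-ins.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical world graph: each pivotal state maps to the pivotal states
# reachable from it by a feasible traversal (the graph's edges).
WorldGraph = Dict[str, List[str]]


def plan_through_pivots(graph: WorldGraph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of pivotal states
    (fewest traversals) from start to goal; None if goal is unreachable."""
    parents: Dict[str, Optional[str]] = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Walk parent pointers back to the start to recover the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                frontier.append(neighbour)
    return None
```

For the toy graph `{"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}`, planning from `"A"` to `"E"` yields `["A", "B", "D", "E"]`, and the reverse direction yields `None` because edges are directed.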
Training an Interactive Helper
Woodward, Mark, Finn, Chelsea, Hausman, Karol
Developing agents that can quickly adapt their behavior to new tasks remains a challenge. Meta-learning has been applied to this problem, but previous methods require either specifying a reward function, which can be tedious, or providing demonstrations, which can be inefficient. In this paper, we investigate if, and how, a "helper" agent can be trained to interactively adapt its behavior to maximize the reward of another agent, which we call the "prime" agent, without observing that agent's reward or receiving explicit demonstrations. To this end, we propose to meta-learn a helper agent along with a prime agent that, during training, observes the reward function and serves as a surrogate for a human prime. We introduce a distribution of multi-agent cooperative foraging tasks, in which only the prime agent knows the objects that should be collected. We demonstrate that, from the emergent physical communication, the trained helper rapidly infers and collects the correct objects.
Collaboration of AI Agents via Cooperative Multi-Agent Deep Reinforcement Learning
Balachandar, Niranjan, Dieter, Justin, Ramachandran, Govardana Sachithanandam
There are many AI tasks involving multiple interacting agents, where the agents should learn to cooperate and collaborate to perform the task effectively. Here we develop and evaluate various multi-agent protocols to train agents to collaborate with teammates in grid soccer. We train our multi-agent methods and evaluate them against a team operating with a smart hand-coded policy. As a baseline, we train agents concurrently and independently, with no communication. Our collaborative protocols were parameter sharing, coordinated learning with communication, and counterfactual policy gradients. Against the hand-coded team, the teams trained with parameter sharing and with coordinated learning performed best, scoring in 89.5% and 94.5% of episodes, respectively. Against the parameter-sharing team, the coordinated-learning team, with adversarial training, scored in 75% of episodes, indicating that it is the most adaptable of our methods. The insights gained from our work can be applied to other domains where multi-agent collaboration could be beneficial.
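To illustrate the parameter-sharing protocol named above, the sketch below gives every teammate the same policy weights and appends a one-hot agent ID to each observation so teammates can still act differently. The linear-softmax policy, the one-hot ID and the REINFORCE-style update are assumptions chosen for brevity; they are not the authors' network or training procedure, and `SharedPolicy` is a hypothetical name.

```python
import math
import random
from typing import List


class SharedPolicy:
    """One set of linear-softmax weights used by every teammate (parameter
    sharing). A one-hot agent ID is appended to each observation."""

    def __init__(self, obs_dim: int, n_agents: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        in_dim = obs_dim + n_agents
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_actions)]

    def _features(self, obs: List[float], agent_id: int) -> List[float]:
        one_hot = [1.0 if k == agent_id else 0.0 for k in range(self.n_agents)]
        return obs + one_hot

    def action_probs(self, obs: List[float], agent_id: int) -> List[float]:
        x = self._features(obs, agent_id)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def reinforce_update(self, obs: List[float], agent_id: int, action: int,
                         advantage: float, lr: float = 0.1) -> None:
        """Policy-gradient step; experience from ANY agent updates the shared weights."""
        x = self._features(obs, agent_id)
        probs = self.action_probs(obs, agent_id)
        for a, row in enumerate(self.weights):
            # d log pi(action) / d logit_a = 1[a == action] - pi(a)
            grad_logit = (1.0 if a == action else 0.0) - probs[a]
            for k in range(len(row)):
                row[k] += lr * advantage * grad_logit * x[k]
```

Because the observation weights are shared, rewarding agent 0 for an action also raises the probability that agent 1 takes that action in the same situation; this pooling of experience is the main appeal of parameter sharing, while the ID features leave room for specialised roles.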
FVA: Modeling Perceived Friendliness of Virtual Agents Using Movement Characteristics
Randhavane, Tanmay, Bera, Aniket, Kapsaskis, Kyra, Gray, Kurt, Manocha, Dinesh
We present a new approach for improving the friendliness and warmth of a virtual agent in an AR environment by generating appropriate movement characteristics. Our algorithm is based on a novel data-driven friendliness model that is computed using a user study and psychological characteristics. We use our model to control the movements corresponding to the gaits, gestures, and gazing of friendly virtual agents (FVAs) as they interact with the user's avatar and other agents in the environment. We have integrated FVAs into an AR environment using a Microsoft HoloLens. Our algorithm can generate plausible movements at interactive rates to increase social presence. We also investigate user perception in an AR setting and observe that an FVA yields a statistically significant improvement in perceived friendliness and social presence compared to an agent without friendliness modeling. We observe an increase of 5.71% in the mean responses to a friendliness measure and an improvement of 4.03% in the mean responses to a social presence measure.