Agents
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Zhu, Guangxiang, Wang, Jianhao, Ren, Zhizhou, Zhang, Chongjie
Object-based approaches for learning action-conditioned dynamics has demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties for common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization over novel environments for learning environment models. We also demonstrate that learned dynamics models enable efficient planning in unseen environments, comparable to true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.
Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning
Debard, Quentin, Dibangoye, Jilles Steeve, Canu, Stรฉphane, Wolf, Christian
Using touch devices to navigate in virtual 3D environments such as computer assisted design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed and handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol allowing to map a 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast amount of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting and translating to 3D operations the 2D finger trajectories from the first agent. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by first performing state representation learning, prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational auto encoder (VAE).
A Solution for Dynamic Spectrum Management in Mission-Critical UAV Networks
Shamsoshoara, Alireza, Khaledi, Mehrdad, Afghah, Fatemeh, Razi, Abolfazl, Ashdown, Jonathan, Turck, Kurt
In this paper, we study the problem of spectrum scarcity in a network of unmanned aerial vehicles (UAVs) during mission-critical applications such as disaster monitoring and public safety missions, where the pre-allocated spectrum is not sufficient to offer a high data transmission rate for real-time video-streaming. In such scenarios, the UAV network can lease part of the spectrum of a terrestrial licensed network in exchange for providing relaying service. In order to optimize the performance of the UAV network and prolong its lifetime, some of the UAVs will function as a relay for the primary network while the rest of the UAVs carry out their sensing tasks. Here, we propose a team reinforcement learning algorithm performed by the UAV's controller unit to determine the optimum allocation of sensing and relaying tasks among the UAVs as well as their relocation strategy at each time. We analyze the convergence of our algorithm and present simulation results to evaluate the system throughput in different scenarios.
Distributed Bandit Learning: How Much Communication is Needed to Achieve (Near) Optimal Regret
Wang, Yuanhao, Hu, Jiachen, Chen, Xiaoyu, Wang, Liwei
We study the communication complexity of distributed multi-armed bandits (MAB) and distributed linear bandits for regret minimization. We propose communication protocols that achieve near-optimal regret bounds and result in optimal speed-up under mild conditions. We measure the communication cost of protocols by the total number of communicated numbers. For multi-armed bandits, we give two protocols that require little communication cost, one is independent of the time horizon $ T $ and the other is independent of the number of arms $ K $. In particular, for a distributed $K$-armed bandit with $M$ agents, our protocols achieve near-optimal regret $O(\sqrt{MKT\log T})$ with $O\left(M\log T\right)$ and $O\left(MK\log M\right)$ communication cost respectively. We also propose two protocols for $d$-dimensional distributed linear bandits that achieve near-optimal regret with $O(M^{1.5}d^3)$ and $O\left((Md+d\log\log d)\log T\right)$ communication cost respectively. The communication cost can be independent of $T$, or almost linear in $d$.
Interaction-aware Decision Making with Adaptive Strategies under Merging Scenarios
Hu, Yeping, Nakhaei, Alireza, Tomizuka, Masayoshi, Fujimura, Kikuo
In order to drive safely and efficiently under merging scenarios, autonomous vehicles should be aware of their surroundings and make decisions by interacting with other road participants. Moreover, different strategies should be made when the autonomous vehicle is interacting with drivers having different level of cooperativeness. Whether the vehicle is on the merge-lane or main-lane will also influence the driving maneuvers since drivers will behave differently when they have the right-of-way than otherwise. Many traditional methods have been proposed to solve decision making problems under merging scenarios. However, these works either are incapable of modeling complicated interactions or require implementing hand-designed rules which cannot properly handle the uncertainties in real-world scenarios. In this paper, we proposed an interaction-aware decision making with adaptive strategies (IDAS) approach that can let the autonomous vehicle negotiate the road with other drivers by leveraging their cooperativeness under merging scenarios. A single policy is learned under the multi-agent reinforcement learning (MARL) setting via the curriculum learning strategy, which enables the agent to automatically infer other drivers' various behaviors and make decisions strategically. A masking mechanism is also proposed to prevent the agent from exploring states that violate common sense of human judgment and increase the learning efficiency. An exemplar merging scenario was used to implement and examine the proposed method.
Two Body Problem: Collaborative Visual Task Completion
Jain, Unnat, Weihs, Luca, Kolve, Eric, Rastegari, Mohammad, Lazebnik, Svetlana, Farhadi, Ali, Schwing, Alexander, Kembhavi, Aniruddha
Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is communication that can be either explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. Refer to our project page for more details: https://prior.allenai.org/projects/two-body-problem
Is Two Better than One? Effects of Multiple Agents on User Persuasion
Kantharaju, Reshmashree B., De Franco, Dominic, Pease, Alison, Pelachaud, Catherine
Virtual humans need to be persuasive in order to promote behaviour change in human users. While several studies have focused on understanding the numerous aspects that influence the degree of persuasion, most of them are limited to dyadic interactions. In this paper, we present an evaluation study focused on understanding the effects of multiple agents on user's persuasion. Along with gender and status (authoritative & peer), we also look at type of focus employed by the agent i.e., user-directed where the agent aims to persuade by addressing the user directly and vicarious where the agent aims to persuade the user, who is an observer, indirectly by engaging another agent in the discussion. Participants were randomly assigned to one of the 12 conditions and presented with a persuasive message by one or several virtual agents. A questionnaire was used to measure perceived interpersonal attitude, credibility and persuasion. Results indicate that credibility positively affects persuasion. In general, multiple agent setting, irrespective of the focus, was more persuasive than single agent setting. Although, participants favored user-directed setting and reported it to be persuasive and had an increased level of trust in the agents, the actual change in persuasion score reflects that vicarious setting was the most effective in inducing behaviour change. In addition to this, the study also revealed that authoritative agents were the most persuasive.
Learning Trajectory Prediction with Continuous Inverse Optimal Control via Langevin Sampling of Energy-Based Models
Xu, Yifei, Zhao, Tianyang, Baker, Chris, Zhao, Yibiao, Wu, Ying Nian
Autonomous driving is a challenging multiagent domain which requires optimizing complex, mixed cooperative-competitive interactions. Learning to predict contingent distributions over other vehicles' trajectories simplifies the problem, allowing approximate solutions by trajectory optimization with dynamic constraints. We take a model-based approach to prediction, in order to make use of structured prior knowledge of vehicle kinematics, and the assumption that other drivers plan trajectories to minimize an unknown cost function. We introduce a novel inverse optimal control (IOC) algorithm to learn other vehicles' cost functions in an energy-based generative model. Langevin Sampling, a Monte Carlo based sampling algorithm, is used to directly sample the control sequence. Our algorithm provides greater flexibility than standard IOC methods, and can learn higher-level, non-Markovian cost functions defined over entire trajectories. We extend weighted feature-based cost functions with neural networks to obtain NN-augmented cost functions, which combine the advantages of both model-based and model-free learning. Results show that model-based IOC can achieve state-of-the-art vehicle trajectory prediction accuracy, and naturally take scene information into account.
Artificial Intelligence I: Basics and Games in Java
This course is about the fundamental concepts of artificial intelligence. This topic is getting very hot nowadays because these learning algorithms can be used in several fields from software engineering to investment banking. Learning algorithms can recognize patterns which can help detecting cancer for example. We may construct algorithms that can have a very good guess about stock price movement in the market. In the first chapter we are going to talk about the basic graph algorithms.
Distributed stochastic optimization with gradient tracking over strongly-connected networks
Xin, Ran, Sahu, Anit Kumar, Khan, Usman A., Kar, Soummya
In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation. The $\mathcal{S}$-$\mathcal{AB}$ algorithm employs row- and column-stochastic weights simultaneously to ensure both consensus and optimality. Since doubly-stochastic weights are not used, $\mathcal{S}$-$\mathcal{AB}$ is applicable to arbitrary strongly-connected graphs. We show that under a sufficiently small constant step-size, $\mathcal{S}$-$\mathcal{AB}$ converges linearly (in expected mean-square sense) to a neighborhood of the global minimizer. We present numerical simulations based on real-world data sets to illustrate the theoretical results.