Agents
An Option and Agent Selection Policy with Logarithmic Regret for Multi Agent Multi Armed Bandit Problems on Random Graphs
Pankayaraj, Pathmanathan, Maithripala, D. H. S.
Existing studies of the Multi Agent Multi Armed Bandit (MAMAB) problem, with the exception of a very few, consider the case where the agents observe their neighbors according to a static network graph. They also mostly rely on a running consensus for the estimation of the option rewards. Two of the exceptions consider a problem where agents observe instantaneous rewards and actions of their neighbors through an iid ER graph process based communication strategy. In this paper we propose a UCB based option allocation rule that guarantees logarithmic regret even if the graph depends on the history of choices made by the agents. The paper also proposes a novel communication strategy that significantly outperforms the iid ER graph based communication strategy. In both the ER graph and the dependent graph strategy, the regret is shown to depend on the connectivity of the graph in a particularly interesting way where there exists an optimal connectivity of the graph that is less than the full connectivity of the graph.
Challenges of Human-Aware AI Systems
From its inception, AI has had a rather ambivalent relationship to humans---swinging between their augmentation and replacement. Now, as AI technologies enter our everyday lives at an ever increasing pace, there is a greater need for AI systems to work synergistically with humans. To do this effectively, AI systems must pay more attention to aspects of intelligence that helped humans work with each other---including social intelligence. I will discuss the research challenges in designing such human-aware AI systems, including modeling the mental states of humans in the loop, recognizing their desires and intentions, providing proactive support, exhibiting explicable behavior, giving cogent explanations on demand, and engendering trust. I will survey the progress made so far on these challenges, and highlight some promising directions. I will also touch on the additional ethical quandaries that such systems pose. I will end by arguing that the quest for human-aware AI systems broadens the scope of AI enterprise, necessitates and facilitates true inter-disciplinary collaborations, and can go a long way towards increasing public acceptance of AI technologies.
Multiagent Rollout Algorithms and Reinforcement Learning
We consider finite and infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. We introduce an algorithm, whereby at every stage, each agent's decision is made by executing a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of local computation required at every stage by each agent is independent of the number of agents, while the amount of global computation (over all agents) grows linearly with the number of agents. By contrast, with the standard rollout algorithm, the amount of global computation grows exponentially with the number of agents. Despite the drastic reduction in required computation, we show that our algorithm has the fundamental cost improvement property of rollout: an improved performance relative to the base policy. We also explore related reinforcement learning and approximate policy iteration algorithms, and we discuss how this cost improvement property is affected when we attempt to improve further the method's computational efficiency through parallelization of the agents' computations.
Optimal Clustering from Noisy Binary Feedback
Ariu, Kaito, Ok, Jungseul, Proutiere, Alexandre, Yun, Se-Young
We study the problem of recovering clusters from binary user feedback. Items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a random answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon K-means whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare numerically the performance of our algorithms with or without adaptive selection strategy, and illustrate the gain achieved by being adaptive. Our inference problems are motivated by the problem of solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent CAPTCHA systems, users clicks (binary answers) can be used to efficiently label images, by optimally finding the best questions to present.
Visual Hide and Seek
Chen, Boyuan, Song, Shuran, Lipson, Hod, Vondrick, Carl
V ISUAL HIDE AND SEEK Boyuan Chen Columbia University Shuran Song Columbia University Hod Lipson Columbia University Carl V ondrick Columbia University A BSTRACT We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator. We place a variety of obstacles in the environment for the prey to hide behind, and we only give the agents partial observations of their environment using an egocentric perspective. Although we train the model to play this game from scratch, experiments and visualizations suggest that the agent learns to predict its own visibility in the environment. Furthermore, we quantitatively analyze how agent weaknesses, such as slower speed, effect the learned policy. Our results suggest that, although agent weaknesses make the learning problem more challenging, they also cause more useful features to be learned. Our project website is available at: http://www.cs.columbia.edu/ We designed this game to mimic the typical dynamics between predator and prey. For example, we place a variety of obstacles inside the environment, which create occlusions that the agent can leverage to hide behind. We also only give the agents access to the first-person perspective of their three-dimensional environment. Consequently, this task is a substantial challenge for reinforcement learning because the state is both visual (pixel input) and partially observable (due to occlusions).
Federated Transfer Reinforcement Learning for Autonomous Driving
Liang, Xinle, Liu, Yang, Chen, Tianjian, Liu, Ming, Yang, Qiang
Xinle Liang 1, Y ang Liu 1, Tianjian Chen 1, Ming Liu 2 and Qiang Y ang 1 Abstract -- Reinforcement learning (RL) is widely used in autonomous driving tasks and training RL models typically involves in a multi-step process: pre-training RL models on simulators, uploading the pre-trained model to real-life robots, and fine-tuning the weight parameters on robot vehicles. This sequential process is extremely time-consuming and more importantly, knowledge from the fine-tuned model stays local and can not be reused or leveraged collaboratively. T o tackle this problem, we present an online federated RL transfer process for real-time knowledge extraction where all the participant agents make corresponding actions with the knowledge learned by others, even when they are acting in very different environments. T o validate the effectiveness of the proposed approach, we constructed a real-life collision avoidance system with Microsoft Airsim simulator and NVIDIA JetsonTX2 car agents, which cooperatively learn from scratch to avoid collisions in indoor environment with obstacle objects. We demonstrate that with the proposed framework, the simulator car agents can transfer knowledge to the RC cars in real-time, with 27% increase in the average distance with obstacles and 42% decrease in the collision counts. I. INTRODUCTION Recent Reinforcement Learning (RL) researches in autonomous robots have achieved significant performance improvement by employing distributed architecture for decentralized agents [1], [2], which is termed as Distributed Reinforcement Learning (DRL). However, most existing DRL frameworks consider only synchronous learning with a constant environment.
Investorideas.com Newswire - AI Stock News: GBT (OTCPINK: GTCH) Implementing New Approach within its Intelligent Agent
Newswire) GBT Technologies Inc. (OTCPINK: GTCH) ("GBT", or the "Company"), a company specializing in the development of Internet of Things (IoT) and Artificial Intelligence (AI) enabled networking and tracking technologies, including its GopherInsight wireless mesh network technology platform for both mobile and fixed solutions, announced that it is now implementing a new approach within its intelligent agent, recurrent relational reasoning (RRN). The new set of algorithms enables GBT's AI system to explicitly consider relations between objects (Static, moving), or abstract ideas. The RRN methodology will be implemented within Avant! AI within the next months, enabling it with logic analysis boost to handle vast information and data interpretation complexity. One of the key reasons for implementing this new method is to achieve outstanding image-based reasoning tasks for Avant!
On the Utility of Learning about Humans for Human-AI Coordination
Carroll, Micah, Shah, Rohin, Ho, Mark K., Griffiths, Thomas L., Seshia, Sanjit A., Abbeel, Pieter, Dragan, Anca
While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.
Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations
We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations together with a catalog of some activities. We cover eight patterns for the link of ML to the simulations or systems plus three algorithmic areas: particle dynamics, agent-based models and partial differential equations. The patterns are further divided into three action areas: Improving simulation with Configurations and Integration of Data, Learn Structure, Theory and Model for Simulation, and Learn to make Surrogates.
Deep Crowd-Flow Prediction in Built Environments
Sohn, Samuel S., Moon, Seonghyeon, Zhou, Honglu, Yoon, Sejong, Pavlovic, Vladimir, Kapadia, Mubbasir
Predicting the behavior of crowds in complex environments is a key requirement in a multitude of application areas, including crowd and disaster management, architectural design, and urban planning. Given a crowd's immediate state, current approaches simulate crowd movement to arrive at a future state. However, most applications require the ability to predict hundreds of possible simulation outcomes (e.g., under different environment and crowd situations) at real-time rates, for which these approaches are prohibitively expensive. In this paper, we propose an approach to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments. Central to our approach is a novel CAGE representation consisting of Capacity, Agent, Goal, and Environment-oriented information, which efficiently encodes and decodes crowd scenarios into compact, fixed-size representations that are environmentally lossless. We present a framework to facilitate the accurate and efficient prediction of crowd flow in never-before-seen crowd scenarios. We conduct a series of experiments to evaluate the efficacy of our approach and showcase positive results.