 Agents


Social Learning in Multi Agent Multi Armed Bandits

arXiv.org Machine Learning

In this paper, we introduce a distributed version of the classical stochastic Multi-Armed Bandit (MAB) problem. Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of a $K$-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other \emph{only} through a pairwise asynchronous gossip-based protocol that exchanges a limited number of bits. In our model, agents at each point decide (i) which arm to play, (ii) whether to communicate, and if so (iii) what to communicate and with whom. Agents in our model are decentralized; that is, their actions depend only on their own observed history. We develop a novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random. The per-agent regret scaling achieved by our algorithm is $O \left( \frac{\lceil\frac{K}{n}\rceil+\log(n)}{\Delta} \log(T) + \frac{\log^3(n) \log \log(n)}{\Delta^2} \right)$. Furthermore, any agent in our algorithm communicates only a total of $\Theta(\log(T))$ times over a time interval of $T$. We compare our results to two benchmarks: one with no communication among agents and one with complete interaction. We show, both theoretically and empirically, that our algorithm achieves a significant reduction both in per-agent regret compared to the case where agents do not collaborate and in communication complexity compared to the full-interaction setting, which requires $T$ communication attempts by an agent over $T$ arm pulls.
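
The communication pattern described in this abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions and not the paper's algorithm: the round-robin initial arm split, the use of UCB1, the synchronous rounds, the power-of-two communication schedule, and the names `ucb_arm` and `simulate` are all hypothetical choices made for illustration. It only shows how agents might exchange arm-ids (never samples) with a uniformly random peer about $\log_2(T)$ times.

```python
import math
import random


def ucb_arm(arms, pulls, sums, t):
    """Pick an arm by UCB1 among `arms`; unplayed arms are tried first."""
    for a in sorted(arms):
        if pulls.get(a, 0) == 0:
            return a
    return max(
        arms,
        key=lambda a: sums[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a]),
    )


def simulate(means, n_agents, horizon, seed=0):
    """Toy synchronous run; returns (average regret per agent, messages sent per agent).

    Requires n_agents >= 2 so that every agent has a distinct peer to contact.
    """
    rng = random.Random(seed)
    k = len(means)
    per_agent = math.ceil(k / n_agents)
    # Assumption: agent i starts with ceil(K/n) arms assigned round-robin.
    active = [
        {(i + m * n_agents) % k for m in range(per_agent)} for i in range(n_agents)
    ]
    pulls = [dict() for _ in range(n_agents)]
    sums = [dict() for _ in range(n_agents)]
    messages = [0] * n_agents
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        for i in range(n_agents):
            a = ucb_arm(active[i], pulls[i], sums[i], t)
            reward = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
            pulls[i][a] = pulls[i].get(a, 0) + 1
            sums[i][a] = sums[i].get(a, 0.0) + reward
            regret += best - means[a]
        # Assumption: agents communicate only when t is a power of two.
        if t & (t - 1) == 0:
            for i in range(n_agents):
                peer = rng.randrange(n_agents - 1)
                if peer >= i:
                    peer += 1  # uniform over the other agents, never self
                favourite = max(active[i], key=lambda a: pulls[i].get(a, 0))
                active[peer].add(favourite)  # only an arm-id is transmitted
                messages[i] += 1
    return regret / n_agents, messages
```

Calling `simulate([0.9, 0.5, 0.4, 0.3, 0.2, 0.1], n_agents=3, horizon=1000)` has each agent send one message per power of two up to the horizon, so the message count grows logarithmically in $T$ rather than linearly.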


Defensive Escort Teams via Multi-Agent Deep Reinforcement Learning

arXiv.org Machine Learning

Coordinated defensive escorts can aid a navigating payload by positioning themselves to keep the payload safe from obstacles. In this paper, we present a novel, end-to-end solution for coordinating an escort team for protecting high-value payloads. Our solution employs deep reinforcement learning (RL) to train a team of escorts to maintain payload safety while navigating alongside the payload. This is done in a distributed fashion, relying only on limited-range positional information of other escorts, the payload, and the obstacles. When compared to a state-of-the-art algorithm for obstacle avoidance, our solution with a single escort increases navigation success by up to 31%. Additionally, escort teams increase success rate by up to 75% over escorts in static formations. We also show that this learned solution generalizes to several adaptations of the scenario, including a changing number of escorts in the team, changing obstacle density, and changes in payload conformation. Successful navigation in crowded scenarios often requires assuming a nonzero collision probability between the agent and stochastic obstacles [1]. This required assumption of risk is potentially frightening given the value of cargo that modern autonomous agents will be transporting, e.g., human life.


Toward a Computational Theory of Evidence-Based Reasoning for Instructable Cognitive Agents

arXiv.org Artificial Intelligence

Evidence-based reasoning is at the core of many problem-solving and decision-making tasks in a wide variety of domains. Generalizing from the research and development of cognitive agents in several such domains, this paper presents progress toward a computational theory for the development of instructable cognitive agents for evidence-based reasoning tasks. The paper also illustrates the application of this theory to the development of four prototype cognitive agents in domains that are critical to the government and the public sector. Two agents function as cognitive assistants, one in intelligence analysis, and the other in science education. The other two agents operate autonomously, one in cybersecurity and the other in intelligence, surveillance, and reconnaissance. The paper concludes with the directions of future research on the proposed computational theory.


ICART 2020 12th International Conference on Agents and Artificial Intelligence (Valletta, Malta - February 22-24, 2020) - ResearchAndMarkets.com

#artificialintelligence

The "ICART 2020 12th International Conference on Agents and Artificial Intelligence" conference has been added to ResearchAndMarkets.com's offering. The purpose of the International Conference on Agents and Artificial Intelligence is to bring together researchers, engineers, and practitioners interested in the theory and applications in the areas of Agents and Artificial Intelligence. Two simultaneous related tracks will be held, covering both applications and current research work. One track focuses on Agents, Multi-Agent Systems and Software Platforms, Distributed Problem Solving and Distributed AI in general. The other track focuses mainly on Artificial Intelligence, Knowledge Representation, Planning, Learning, Scheduling, Perception Reactive AI Systems, and Evolutionary Computing and other topics related to Intelligent Systems and Computational Intelligence.


Distributed Attack-Robust Submodular Maximization for Multi-Robot Planning

arXiv.org Artificial Intelligence

We aim to guard swarm-robotics applications against denial-of-service (DoS) failures/attacks that result in withdrawals of robots. We focus on applications requiring the selection of actions for each robot, among a set of available ones, e.g., which trajectory to follow. Such applications are central in large-scale robotic/control applications, e.g., multi-robot motion planning for target tracking. But the current attack-robust algorithms are centralized, and scale quadratically with the problem size (e.g., number of robots). Thus, in this paper, we propose a general-purpose distributed algorithm towards robust optimization at scale, with local communications only. We name it distributed robust maximization (DRM). DRM proposes a divide-and-conquer approach that distributively partitions the problem among K cliques of robots. The cliques optimize in parallel, independently of each other. That way, DRM also offers significant computational speed-ups, reducing the running time to as little as 1/K^2 that of its centralized counterparts. K depends on the robots' communication range, which is given as input to DRM. DRM also achieves a close-to-optimal performance, equal to the guaranteed performance of its centralized counterparts. We demonstrate DRM's performance in both Gazebo and MATLAB simulations, in scenarios of active target tracking with swarms of robots. We observe that DRM achieves significant computational speed-ups (it is 3 to 4 orders of magnitude faster) and yet nearly matches the tracking performance of its centralized counterparts.
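
The 1/K^2 figure follows from the quadratic scaling mentioned in this abstract. The sketch below works through that arithmetic under simplifying assumptions that are not from the source: the cliques are taken to be equal in size, the cost model is exactly quadratic, and communication overhead is ignored. The function name `relative_running_time` is hypothetical.

```python
def relative_running_time(n_robots, k_cliques):
    """Ratio of per-clique cost to centralized cost under a quadratic cost model."""
    centralized_cost = n_robots ** 2
    # Cliques run in parallel, so wall-clock cost is that of a single clique
    # (assumed here to hold n_robots / k_cliques robots).
    clique_cost = (n_robots / k_cliques) ** 2
    return clique_cost / centralized_cost
```

For instance, `relative_running_time(100, 5)` evaluates (100/5)^2 / 100^2 = 400 / 10000, which is 1/25, i.e. 1/K^2 with K = 5.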


Leverage AI to Create Autonomous Policies that Adapt without Human Intervention

#artificialintelligence

Policies are the foundation for any successful organization. Policies are the rules, or laws, of an organization. Heck, one could argue that an organization's culture is better defined by its policies than it is by the character of its leadership team. Unfortunately, the management, creation and execution of policies haven't changed much since the days of "time-and-motion studies". In many cases, policies are nothing more than a static list of what-if rules that govern what workers are to do in well-defined situations.


AI agent becomes top employee at Manpower France

#artificialintelligence

The French division of Manpower, one of the largest recruitment and staffing companies in the world, has deployed an AI agent developed by Sidetrade to help optimize credit management in its financial department. The software, called Aimie, has passed the probation with flying colors – after nine months of testing, effectiveness of recovery actions grew 12 percent – and is now handling thousands of accounts. "We started Aimie off with two customer portfolios for a period of two months," said Laurent Bueno, credit director of Manpower France. "Encouraged by the results, we ramped up our use of Aimie. Within four months, Aimie was managing nearly 60 percent of single-site customers, which represents over 5,000 accounts, and nearly 10,000 follow-up actions per month."


Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

arXiv.org Artificial Intelligence

We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a bias function $V$ of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when $V\equiv0$, but our scheme works best when $V$ is a known reasonably good approximation to the optimal cost function $J^*$. When $V$ is equal to the cost function $J_{\mu}$ of some known policy $\mu$ and there is only one aggregate state, our scheme is equivalent to the rollout algorithm based on $\mu$ (i.e., the result of a single policy improvement starting with the policy $\mu$). When $V=J_{\mu}$ and there are multiple aggregate states, our aggregation approach can be used as a more powerful form of improvement of $\mu$. Thus, when combined with an approximate policy evaluation scheme, our approach can form the basis for a new and enhanced form of approximate policy iteration. When $V$ is a generic bias function, our scheme is equivalent to approximation in value space with lookahead function equal to $V$ plus a local correction within each aggregate state. The local correction levels are obtained by solving a low-dimensional aggregate DP problem, yielding an arbitrarily close approximation to $J^*$, when the number of aggregate states is sufficiently large. Except for the bias function, the aggregate DP problem is similar to the one of the classical aggregation framework, and its algorithmic solution by simulation or other methods is nearly identical to one for classical aggregation, assuming values of $V$ are available when needed.
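
The "lookahead function equal to $V$ plus a local correction" statement can be written out as follows. This is a sketch in assumed notation, not the paper's own formulation: the discount factor $\alpha$, the transition probabilities $p_{ij}(u)$, the stage cost $g(i,u,j)$, the aggregate-state map $y(j)$, and the correction values $r_y$ are symbols introduced here for illustration.

```latex
% Assumed notation (not from the abstract): states i, j; controls u \in U(i);
% transition probabilities p_{ij}(u); stage cost g(i,u,j); discount factor \alpha;
% y(j) = aggregate state containing j; r_y = local correction for aggregate state y.
\tilde{J}(j) = V(j) + r_{y(j)}, \qquad
\tilde{\mu}(i) \in \arg\min_{u \in U(i)} \sum_{j} p_{ij}(u)
  \bigl( g(i,u,j) + \alpha \tilde{J}(j) \bigr).
```

Under these assumptions, a single aggregate state means $r_{y(j)}$ is one constant $r$, which adds the same amount $\alpha r$ to every control's lookahead value and so leaves the minimizing control unchanged. With $V = J_{\mu}$ the minimization is then the one-step policy improvement on $\mu$, consistent with the rollout equivalence stated in the abstract.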


The Emergence of Cooperative and Competitive AI Agents - KDnuggets

#artificialintelligence

Collaboration and competition are two of the key pillars in the evolution of human societies and essential to our evolution as a species. Billions of people inhabit our planet grouped in millions of communities, each with their own beliefs about politics, economics, religion, social justice, or sports. While those beliefs make each of us unique, they haven't prevented us from coming together to achieve amazing things. Those group efforts are typically guided by the cooperative and competitive dynamics between their members, who constitute the foundation of collective intelligence. From this perspective, every area of human knowledge can be traced back to a collaborative and/or competitive dynamic in a specific community.


Towards Deployment of Robust AI Agents for Human-Machine Partnerships

arXiv.org Artificial Intelligence

We study the problem of designing AI agents that can robustly cooperate with people in human-machine partnerships. Our work is inspired by real-life scenarios in which an AI agent, e.g., a virtual assistant, has to cooperate with new users after its deployment. We model this problem via a parametric MDP framework where the parameters correspond to a user's type and characterize her behavior. In the test phase, the AI agent has to interact with a user of unknown type. Our approach to designing a robust AI agent relies on observing the user's actions to make inferences about the user's type and adapting its policy to facilitate efficient cooperation. We show that without being adaptive, an AI agent can end up performing arbitrarily badly in the test phase. We develop two algorithms for computing policies that automatically adapt to the user in the test phase. We demonstrate the effectiveness of our approach in solving a two-agent collaborative task.
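
The idea of inferring a user's type from observed actions can be sketched with a plain Bayesian update over a finite set of candidate types. The abstract does not specify the paper's two algorithms, so this is only a generic illustration under assumptions: the finite type set, the per-type action likelihoods, and the name `update_type_posterior` are hypothetical.

```python
def update_type_posterior(prior, likelihoods, action):
    """Bayes update of a belief over user types after observing one user action.

    prior:        dict mapping type -> probability
    likelihoods:  dict mapping type -> dict of action -> P(action | type)
    """
    unnormalized = {
        user_type: prior[user_type] * likelihoods[user_type].get(action, 0.0)
        for user_type in prior
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        # The action is impossible under every type; keep the prior unchanged.
        return dict(prior)
    return {user_type: p / total for user_type, p in unnormalized.items()}
```

With a uniform prior over two types whose likelihoods for an observed action are 0.2 and 0.8, the posterior becomes 0.2 and 0.8. An adaptive agent could then choose the policy suited to the more probable type and keep updating as more actions arrive.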