AITopics

2002.0391

Country:

North America > United States > Ohio > Portage County > Kent (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.49)
Government (0.48)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)

Francos, Roee M., Bruckstein, Alfred M.

Search for Smart Evaders with Sweeping Agents

arXiv.org Artificial IntelligenceFeb-10-2020

Suppose that in a given planar circular region, there are some smart mobile evaders and we would like to find them using sweeping agents. We assume that the sweeping agents are in a line formation whose total length is 2r. We propose procedures for designing a sweeping process that ensures the successful completion of the task, thereby deriving conditions on the sweeping velocity of the linear formation and its path. Successful completion of the task means that evaders with a given limit on their velocity cannot escape the sweeping agents. A simpler task for the sweeping formation is the confinement of the evaders to their initial domain. The feasibility of completing these tasks depends on geometric and dynamic constraints that impose a lower bound on the velocity that the sweeper line formation must have. This critical velocity is derived to ensure the satisfaction of the confinement task. Increasing the velocity above the lower bound enables the agents to complete the search task as well. We present results on the total search time as a function of the sweeping velocity of the formation given the initial conditions on the size of the search region and the maximal velocity of the evaders.

agent, critical velocity, evader region, (14 more...)

doi: 10.1017/S0263574721000291

1905.04006

Country:

North America > United States > New York (0.04)
Atlantic Ocean > North Atlantic Ocean > English Channel (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

arXiv.org Machine LearningFeb-10-2020

Q-Learning for Mean-Field Controls

Gu, Haotian, Guo, Xin, Wei, Xiaoli, Xu, Renyuan

Multi-agent reinforcement learning (MARL) has been applied to many challenging problems including two-team computer games, autonomous drivings, and real-time biddings. Despite the empirical success, there is a conspicuous absence of theoretical study of different MARL algorithms: this is mainly due to the curse of dimensionality caused by the exponential growth of the joint state-action space as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to $N$-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study the collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate strong performances of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.

algorithm, assumption 3, theorem 3, (13 more...)

2002.04131

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Information Technology (0.87)
Transportation > Ground > Road (0.87)
Leisure & Entertainment > Games (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

arXiv.org Artificial IntelligenceFeb-7-2020

Student/Teacher Advising through Reward Augmentation

Reid, Cameron

Transfer learning is an important new subfield of multiagent reinforcement learning that aims to help an agent learn about a problem by using knowledge that it has gained solving another problem, or by using knowledge that is communicated to it by an agent who already knows the problem. This is useful when one wishes to change the architecture or learning algorithm of an agent (so that the new knowledge need not be built "from scratch"), when new agents are frequently introduced to the environment with no knowledge, or when an agent must adapt to similar but different problems. Great progress has been made in the agent-to-agent case using the Teacher/Student framework proposed by (Torrey and Taylor 2013). However, that approach requires that learning from a teacher be treated differently from learning in every other reinforcement learning context. In this paper, I propose a method which allows the teacher/student framework to be applied in a way that fits directly and naturally into the more general reinforcement learning framework by integrating the teacher feedback into the reward signal received by the learning agent. I show that this approach can significantly improve the rate of learning for an agent playing a one-player stochastic game; I give examples of potential pitfalls of the approach; and I propose further areas of research building on this framework.

agent, optimal action, reinforcement, (15 more...)

2002.02938

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Parker-Holder, Jack, Nguyen, Vu, Roberts, Stephen

One-Shot Bayes Opt with Probabilistic Population Based Training

arXiv.org Machine LearningFeb-6-2020

Selecting optimal hyperparameters is a key challenge in machine learning. An exciting recent result showed it is possible to learn high-performing hyperparameter schedules on the fly in a single training run through methods inspired by Evolutionary Algorithms. These approaches have been shown to increase performance across a wide variety of machine learning tasks, ranging from supervised (SL) to reinforcement learning (RL). However, since they remain primarily evolutionary, they act in a greedy fashion, thus require a combination of vast computational resources and carefully selected meta-parameters to effectively explore the hyperparameter space. To address these shortcomings we look to Bayesian Optimization (BO), where a Gaussian Process surrogate model is combined with an acquisition function to produce a principled mechanism to trade off exploration vs exploitation. Our approach, which we call Probabilistic Population-Based Training ($\mathrm{P2BT}$), is able to transfer sample efficiency of BO to the online setting, making it possible to achieve these traits in a single training run. We show that $\mathrm{P2BT}$ is able to achieve high performance with only a small population size, making it useful for all researchers regardless of their computational resources.

hyperparameter, optimization, p2bt, (15 more...)

2002.02518

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceFeb-6-2020

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

Xue, Zeyue, Luo, Shuang, Wu, Chao, Zhou, Pan, Bian, Kaigui, Du, Wei

Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning. However, for traditional peer-to-peer methods such as action advising, they have encountered difficulties in how to efficiently expressed knowledge and advice. As a result, we propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation. But it is still challenging to transfer Q-function directly since it is unstable and not bounded. To address this issue confronted with existing works, we adopt Categorical Deep Q-Network. We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge among multiple distributed agents. Our proposed framework, namely Learning and Teaching Categorical Reinforcement (LTCR), shows promising performance on stabilizing and accelerating learning progress with improved team-wide reward in four typical experimental environments.

agent, distillation, model distillation, (14 more...)

2002.02202

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Ahmadi, Mohamadreza, Viswanathan, Arun A., Ingham, Michel D., Tan, Kymie, Ames, Aaron D.

Partially Observable Games for Secure Autonomy

arXiv.org Artificial IntelligenceFeb-5-2020

Technology development efforts in autonomy and cyber-defense have been evolving independently of each other, over the past decade. In this paper, we report our ongoing effort to integrate these two presently distinct areas into a single framework. To this end, we propose the two-player partially observable stochastic game formalism to capture both high-level autonomous mission planning under uncertainty and adversarial decision making subject to imperfect information. We show that synthesizing sub-optimal strategies for such games is possible under finite-memory assumptions for both the autonomous decision maker and the cyber-adversary. We then describe an experimental testbed to evaluate the efficacy of the proposed framework.

artificial intelligence, machine learning, posg, (18 more...)

2002.01969

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Lin, Tianyi, Jin, Chi, Jordan, Michael. I.

Near-Optimal Algorithms for Minimax Optimization

arXiv.org Machine LearningFeb-5-2020

Current stateof-the-art first-order algorithms find an approximate Nash equilibrium using Õ(κ x κ y) [Tseng, 1995] or Õ(min{κ x κy, κ x κ y }) [Alkousa et al., 2019] gradient evaluations, where κ x and κ y are the condition numbers for the strong-convexity and strong-concavity assumptions. A gap remains between these results and the best existing lower bound Ω( κ x κ y) due to Zhang et al. [2019]. This paper presents the first algorithm with Õ( κ x κ y) gradient complexity, matching the lower bound up to logarithmic factors. Our new algorithm is designed based on an accelerated proximal point method and an accelerated solver for minimax proximal steps. It can be easily extended to the settings of strongly-convex-concave, convex-concave, nonconvex-strongly-concave, and nonconvexconcave functions. This paper also presents algorithms that match or outperform all existing methods in these settings in terms of gradient complexity, up to logarithmic factors.

algorithm, complexity, gradient evaluation, (14 more...)

2002.02417

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

arXiv.org Machine LearningFeb-5-2020

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

Zhao, Rui, Tresp, Volker, Xu, Wei

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal. In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals, which is beneficial for a wide range of tasks. Motivated by this observation, we propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states. This objective encourages the agent to take control of its environment. Subsequently, we derive a surrogate objective of the proposed reward function, which can be optimized efficiently. Lastly, we evaluate the developed framework in different robotic manipulation and navigation tasks and demonstrate the efficacy of our approach. A video showing experimental results is available at \url{https://youtu.be/CT4CKMWBYz0}.

agent, equation, mutual information-based state-control, (11 more...)

2002.01963

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Ehsan, Upol, Riedl, Mark O.

Human-centered Explainable AI: Towards a Reflective Sociotechnical Approach

arXiv.org Artificial IntelligenceFeb-5-2020

Explanations--a form of post-hoc interpretability--play an instrumental role in making systems accessible as AI continues to proliferate complex and sensitive sociotechnical systems. In this paper, we introduce Human-centered Explainable AI (HCXAI) as an approach that puts the human at the center of technology design. It develops a holistic understanding of "who" the human is by considering the interplay of values, interpersonal dynamics, and the socially situated nature of AI systems. In particular, we advocate for a reflective sociotechnical approach. We illustrate HCXAI through a case study of an explanation system for nontechnical end-users that shows how technical advancements and the understanding of human factors co-evolve. Building on the case study, we lay out open research questions pertaining to further refining our understanding of "who" the human is and extending beyond 1-to-1 human-computer interactions. Finally, we propose that a reflective HCXAI paradigm--mediated through the perspective of Critical Technical Practice and supplemented with strategies from HCI, such as value-sensitive design and participatory design--not only helps us understand our intellectual blind spots, but it can also open up new design and research spaces.

case study, explanation, rationale, (15 more...)

2002.01092

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)