 Agents


Scalable Model-based Policy Optimization for Decentralized Networked Systems

arXiv.org Artificial Intelligence

Reinforcement learning algorithms require a large number of samples, which often limits their real-world application even on simple tasks. This challenge is more pronounced in multi-agent tasks, where each step of operation is more costly because it requires communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control through model-based learning. We consider networked systems in which agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions by communication, and the policies are then trained on the model rollouts. To alleviate the bias of model-generated data, we restrict model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems: connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
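
The idea of decentralized, myopic model rollouts can be sketched as follows. This is a minimal illustration, not DMPO itself. The linear local model, the ring communication graph, and the names LocalLinearModel, ring_neighbours, and myopic_rollout are all hypothetical; DMPO's models are learned. Each agent predicts its own next state from its current state, its action, and its neighbours' states. Because all agents update together, each step's predictions are exactly what neighbours "broadcast" for the following step. The horizon is deliberately short to limit compounding model error.

```python
import numpy as np
from typing import Callable, Dict, List, Sequence


class LocalLinearModel:
    """Hypothetical per-agent model: s_i' = a*s_i + b*mean(neighbour states) + c*action_i."""

    def __init__(self, a: float = 0.9, b: float = 0.05, c: float = 0.1):
        self.a, self.b, self.c = a, b, c

    def predict(self, s_i: float, s_nbrs: np.ndarray, action: float) -> float:
        return self.a * s_i + self.b * float(np.mean(s_nbrs)) + self.c * action


def ring_neighbours(n: int) -> Dict[int, List[int]]:
    """Hypothetical topology: each agent communicates only with its two adjacent agents."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def myopic_rollout(models: Sequence[LocalLinearModel],
                   policies: Sequence[Callable[[float], float]],
                   nbrs: Dict[int, List[int]],
                   real_states: Sequence[float],
                   horizon: int) -> List[np.ndarray]:
    """Short model rollout branched from real states (horizon kept small on purpose)."""
    s = np.asarray(real_states, dtype=float)
    trajectory = [s]
    for _ in range(horizon):
        actions = [policies[i](s[i]) for i in range(len(s))]
        # Agent i only sees its neighbours' current (broadcast) states, never the global state.
        s = np.array([models[i].predict(s[i], s[nbrs[i]], actions[i]) for i in range(len(s))])
        trajectory.append(s)
    return trajectory
```

The synthetic transitions in the returned trajectory would then feed the policy update, in place of additional real environment steps.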


A Technique to Create Weaker Abstract Board Game Agents via Reinforcement Learning

arXiv.org Artificial Intelligence

Board games, with the exception of solo games, need at least one other player. Because of this, we create Artificial Intelligence (AI) agents to play against us when an opponent is missing. These AI agents are created in a number of ways, but one challenge is that an agent's ability can far exceed ours. In this work, we describe how to create weaker AI agents that play board games. We use Tic-Tac-Toe, Nine Men's Morris, and Mancala, and our technique uses a Reinforcement Learning model in which an agent learns these games with the Q-learning algorithm. We show how these agents can learn to play the board games perfectly, and we then describe our approach to making weaker versions of these agents. Finally, we provide a methodology to compare AI agents.
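
The tabular Q-learning update mentioned above can be sketched as shown below, together with one simple way to weaken a trained agent. The abstract does not say how the authors weaken their agents. The `weakened_action` rule is a hypothetical illustration, not their technique: it plays the greedy move with probability `skill` and a random legal move otherwise. The function names are assumptions, and the Q-table is keyed by hashable (state, action) pairs.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, state, action, reward: float, next_state,
             next_actions: Sequence, alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One tabular Q-learning step; a terminal next_state passes an empty next_actions."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


def weakened_action(Q: QTable, state, actions: Sequence, skill: float, rng=random):
    """Hypothetical weakening: greedy with probability `skill`, else a uniformly random legal move."""
    if rng.random() < skill:
        return max(actions, key=lambda a: Q[(state, a)])
    return rng.choice(list(actions))
```

With `skill=1.0` the agent plays its learned (possibly perfect) policy. Lowering `skill` gives a tunable family of weaker opponents without retraining.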


Artificial Intelligence, Critical Systems, and the Control Problem - HS Today

#artificialintelligence

Artificial Intelligence (AI) is transforming our way of life, from new forms of social organization and scientific discovery to defense and intelligence. This explosive progress is especially apparent in the subfield of machine learning (ML), where AI systems learn autonomously by identifying patterns in large volumes of data.[1] Indeed, over the last five years, the fields of AI and ML have witnessed stunning advancements in computer vision (e.g., object recognition), speech recognition, and scientific discovery.[2] Experts are increasingly voicing concerns over AI risk from misuse by state and non-state actors, principally in the areas of cybersecurity and disinformation propagation. However, issues of control (for example, how well advanced AI decision-making aligns with human goals) are not as prominent in the discussion of risk and could ultimately prove as dangerous as, or more dangerous than, threats from nefarious actors.


Are we measuring trust correctly in explainability, interpretability, and transparency research?

arXiv.org Artificial Intelligence

This paper argues that we are not measuring trust adequately in explainability, interpretability, and transparency research. Most studies ask participants to complete a trust scale rating their trust in a model that has been explained or interpreted. If trust increases, we consider this a positive result. However, there are two issues with this. First, we usually have no way of knowing whether participants should trust the model; trust should surely decrease if a model is of poor quality. Second, these scales measure perceived trust rather than demonstrated trust. This paper showcases three methods that do a good job of measuring perceived and demonstrated trust. It is intended to be a starting point for discussion on this topic, rather than the final say. The author invites critique and discussion.


Correct-by-Construction Runtime Enforcement in AI -- A Survey

arXiv.org Artificial Intelligence

Runtime enforcement refers to the theories, techniques, and tools for enforcing, at runtime, correct system behavior with respect to a formal specification. In this paper, we are interested in techniques for constructing runtime enforcers for the concrete application domain of enforcing safety in AI. We discuss how safety is traditionally handled in the field of AI and how stronger formal guarantees on the safety of a self-learning agent can be given by integrating a runtime enforcer. We survey a selection of work on such enforcers, distinguishing between approaches for discrete and continuous action spaces. The purpose of this paper is to foster a better understanding of the advantages and limitations of different enforcement techniques, focusing on the specific challenges that arise from their application in AI. Finally, we present some open challenges and avenues for future work.
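
For discrete action spaces, the basic shape of a runtime enforcer (often called a shield) can be sketched as below. This is a generic, minimal illustration, not any specific enforcer from the survey. The safety check `is_safe` stands in for a check against a formal specification. The corridor example, `CLIFF`, and the fixed fallback order are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def shielded_action(state: S, proposed: A, actions: Sequence[A],
                    is_safe: Callable[[S, A], bool]) -> A:
    """Pass the learner's action through if it satisfies the spec; otherwise substitute a safe one."""
    if is_safe(state, proposed):
        return proposed
    for a in actions:  # fallback: first safe action in a fixed preference order
        if is_safe(state, a):
            return a
    raise RuntimeError("specification cannot be enforced: no safe action in this state")


# Hypothetical usage: a 1-D corridor where stepping onto position CLIFF is unsafe.
CLIFF = 0


def corridor_safe(pos: int, move: int) -> bool:
    return pos + move != CLIFF
```

Because the enforcer sits between the agent and the environment, the learner can explore freely while every executed action still satisfies the specification. This separation is what allows formal guarantees for a self-learning agent.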


A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space

arXiv.org Artificial Intelligence

Research extending deep reinforcement learning (DRL) to the multi-agent setting has solved many complicated problems and achieved great success. However, almost all of these studies focus only on discrete or continuous action spaces, and few works have applied multi-agent deep reinforcement learning to real-world problems, which mostly have a hybrid action space. Therefore, in this paper, we propose two algorithms to fill this gap: deep multi-agent hybrid soft actor-critic (MAHSAC) and multi-agent hybrid deep deterministic policy gradients (MAHDDPG). These two algorithms follow the centralized training and decentralized execution (CTDE) paradigm and can handle hybrid action space problems. Our experiments are run in the multi-agent particle environment, a simple multi-agent particle world with some basic simulated physics. The experimental results show that these algorithms perform well.
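
A hybrid (parameterized) action combines a discrete choice with continuous parameters for that choice. The sketch below shows one common way to decode an actor's outputs into such an action. It assumes the actor emits one logit per discrete action and one raw parameter vector per discrete action. It is an illustrative assumption, not the MAHSAC or MAHDDPG architecture, and `decode_hybrid_action` is a hypothetical name.

```python
import numpy as np
from typing import Optional, Tuple


def decode_hybrid_action(logits, raw_params, low, high,
                         rng: Optional[np.random.Generator] = None) -> Tuple[int, np.ndarray]:
    """Map actor outputs to (discrete_choice, continuous_params) within per-action bounds."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    # Greedy choice for decentralized execution; sample when an rng is given (exploration).
    k = int(np.argmax(probs)) if rng is None else int(rng.choice(len(probs), p=probs))
    raw = np.asarray(raw_params, dtype=float)[k]
    low_k = np.asarray(low, dtype=float)[k]
    high_k = np.asarray(high, dtype=float)[k]
    # Squash to [-1, 1] with tanh, then rescale into the bounds of the chosen discrete action.
    params = low_k + (np.tanh(raw) + 1.0) * 0.5 * (high_k - low_k)
    return k, params
```

Only the parameters of the selected discrete action are executed. Critics in a CTDE setup would typically receive the full discrete-plus-continuous output of every agent during centralized training.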


Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search

arXiv.org Artificial Intelligence

To track a target in a video, current visual trackers usually adopt greedy search to localize the target object in each frame; that is, the candidate region with the maximum response score is selected as the tracking result for each frame. However, we found that this may not be an optimal choice, especially in challenging tracking scenarios such as heavy occlusion and fast motion. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using the beam search algorithm. Specifically, we formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims at picking out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent that performs the decision-making and determines what actions should be taken to update related information. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validate the effectiveness of the proposed algorithm.
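
The contrast between greedy per-frame selection and beam search over trajectories can be sketched as follows. This is a plain score-based beam search. BeamTracking uses RL agents to make the per-trajectory decisions, so this is a simplified stand-in. Candidates are hypothetical (position, response score) pairs, and the `motion_penalty` term, which discourages large jumps between frames, is an assumption added for illustration.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (position, response score)


def greedy_track(frames: Sequence[Sequence[Candidate]]) -> List[float]:
    """Pick the maximum-response candidate independently in every frame."""
    return [max(cands, key=lambda c: c[1])[0] for cands in frames]


def beam_track(frames: Sequence[Sequence[Candidate]], beam_width: int,
               motion_penalty: float) -> Tuple[List[float], float]:
    """Keep the `beam_width` best trajectories; return the one with the highest accumulated score."""
    beams: List[Tuple[List[float], float]] = [([], 0.0)]
    for cands in frames:
        expanded = []
        for path, score in beams:
            for pos, resp in cands:
                jump = abs(pos - path[-1]) if path else 0.0
                expanded.append((path + [pos], score + resp - motion_penalty * jump))
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# A distractor at position 10 briefly outscores the true target at position 0.
frames = [[(0, 1.0)], [(0, 0.8), (10, 0.9)], [(0, 0.9), (10, 0.1)]]
```

On this input, greedy search jumps to the distractor in the second frame. Beam search keeps both hypotheses alive and recovers the consistent trajectory [0, 0, 0].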


Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

arXiv.org Artificial Intelligence

Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.
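
The PSRO loop with a no-regret meta-solver can be sketched on a toy problem, under strong simplifying assumptions. The game here is a hypothetical one-shot mean-field game in which every agent picks one of several actions and receives r(a, mu) = u[a] - crowd_cost * mu[a], a crowd-aversion payoff. Policies are pure actions. The meta-solver is Hedge (multiplicative weights) over the current policy set, standing in for the paper's mean-field no-adversarial-regret learners, and it returns the time-averaged population distribution. Each outer iteration adds a best response to that distribution. All function names are hypothetical, and the paper's games are richer than this static example.

```python
import numpy as np


def reward(mu: np.ndarray, u: np.ndarray, crowd_cost: float) -> np.ndarray:
    """Hypothetical crowd-aversion payoff r(a, mu) = u[a] - crowd_cost * mu[a] for every action a."""
    return u - crowd_cost * mu


def hedge_meta_solver(policies, u, crowd_cost, steps=2000, eta=0.5):
    """Hedge over the restricted policy set; returns the time-averaged induced distribution."""
    logits = np.zeros(len(policies))
    avg_mu = np.zeros(len(u))
    for _ in range(steps):
        w = np.exp(logits - logits.max())
        w /= w.sum()
        mu = np.zeros(len(u))
        for weight, a in zip(w, policies):
            mu[a] += weight  # population distribution induced by the current mixture
        avg_mu += mu / steps
        logits += eta * reward(mu, u, crowd_cost)[policies]  # payoff of each policy in the set
    return avg_mu


def exploitability(mu, u, crowd_cost) -> float:
    """Best-response gain over the population's average payoff (zero at a Nash equilibrium)."""
    u = np.asarray(u, dtype=float)
    r = reward(np.asarray(mu, dtype=float), u, crowd_cost)
    return float(r.max() - mu @ r)


def mean_field_psro(u, crowd_cost, max_iters=10):
    u = np.asarray(u, dtype=float)
    policies = [0]  # start from an arbitrary single policy
    mu = np.zeros(len(u))
    mu[0] = 1.0
    for _ in range(max_iters):
        mu = hedge_meta_solver(policies, u, crowd_cost)
        br = int(np.argmax(reward(mu, u, crowd_cost)))  # best response to the population
        if br in policies:
            break  # no new policy to add: the restricted equilibrium is a full equilibrium
        policies.append(br)
    return mu, policies
```

For u = [1.0, 0.5, 0.0] with crowd_cost = 1, the equilibrium equalizes the payoffs of the used actions at mu = [0.75, 0.25, 0]. The loop stops once the best response is already in the policy set.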


Partition-Tolerant and Byzantine-Tolerant Decision-Making for Distributed Robotic Systems with IOTA and ROS 2

arXiv.org Artificial Intelligence

With the increasing ubiquity of autonomous robotic solutions, interest in their connectivity and in cooperation within multi-robot systems is rising. Two aspects under current research are robot security and secure multi-robot collaboration that is robust to Byzantine agents. Blockchain and other distributed ledger technologies (DLTs) have been proposed to address the challenges in both domains. Nonetheless, key challenges remain, including scalability and deployment within real-world networks. This paper presents an approach to integrating IOTA and ROS 2 for more scalable DLT-based robotic systems that also tolerate network partitions after deployment. To the best of our knowledge, this is the first implementation of IOTA smart contracts for robotic systems and the first integrated design with ROS 2; in contrast, the vast majority of the literature relies on Ethereum. We present a general IOTA+ROS 2 architecture leading to partition-tolerant decision-making processes that also inherit Byzantine tolerance properties from the embedded blockchain structures. We demonstrate the effectiveness of the proposed framework in a cooperative mapping application in a system with intermittent network connectivity. We show both superior performance with respect to Ethereum in the presence of network partitions and a low impact in terms of computational resource utilization. These results open the path for wider integration of blockchain solutions in distributed robotic systems with less stringent connectivity and computational requirements.
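
For background on the "Byzantine tolerance" property, here is a sketch of the classic quorum rule from Byzantine fault-tolerant agreement. To tolerate f faulty robots, at least n = 3f + 1 participants are needed, and a decision requires at least 2f + 1 matching votes. This is a general BFT fact, not a description of IOTA's consensus mechanism or of the paper's smart contracts. The function name `byzantine_decision` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def byzantine_decision(votes: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Return the value backed by a 2f+1 quorum, or None if no value reaches it."""
    n = len(votes)
    if n < 3 * f + 1:
        raise ValueError(f"need at least {3 * f + 1} votes to tolerate {f} Byzantine robots, got {n}")
    value, count = Counter(votes).most_common(1)[0]
    # Any two 2f+1 quorums out of 3f+1 voters overlap in at least one honest robot.
    return value if count >= 2 * f + 1 else None
```

When a partition leaves fewer than 3f + 1 reachable robots, this rule cannot decide at all. Reconciling decisions made on either side of a partition is the harder problem that partition tolerance has to address.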


Effective Integration of Weighted Cost-to-go and Conflict Heuristic within Suboptimal CBS

arXiv.org Artificial Intelligence

Conflict-Based Search (CBS) is a popular multi-agent path finding (MAPF) solver that employs a low-level single-agent planner and a high-level constraint tree to resolve conflicts. The vast majority of modern MAPF solvers focus on improving CBS by reducing the size of this tree through various strategies, and few methods modify the low-level planner. Typically, low-level planners in existing CBS methods use an unweighted cost-to-go heuristic, with suboptimal CBS methods also using a conflict heuristic to help the high-level search. In this paper, we show that, contrary to prevailing CBS beliefs, a weighted cost-to-go heuristic can be used effectively alongside the conflict heuristic in two possible variants. In particular, one of these variants can obtain large speedups, 2-100x, across several scenarios and suboptimal CBS methods. Importantly, we discover that performance is related not to the weighted cost-to-go heuristic itself but rather to the ability of the relative conflict heuristic weight to effectively balance low-level and high-level work, implying that existing suboptimal CBS work misses this subtlety. Additionally, to the best of our knowledge, we show the first theoretical relation between prioritized planning and bounded suboptimal CBS and demonstrate that our methods are their natural generalization.
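
A low-level planner that mixes a weighted cost-to-go heuristic with a conflict term can be sketched as below. The abstract does not spell out the paper's two variants. The single priority used here, g + w_h * h + w_c * conflicts, is one hypothetical way to combine the terms; bounded-suboptimal methods such as ECBS instead use focal lists. The planner is space-time A* on a 4-connected grid and counts only vertex conflicts with other agents' paths. Agents that have finished are assumed to wait at their last cell. `low_level_plan` and its parameters are illustrative names.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def low_level_plan(grid: Sequence[Sequence[int]], start: Cell, goal: Cell,
                   other_paths: Sequence[List[Cell]], w_h: float = 1.0,
                   w_c: float = 1.0, max_t: int = 50) -> Optional[List[Cell]]:
    """Space-time A* with priority g + w_h*h + w_c*conflicts (hypothetical weighting)."""
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> int:  # Manhattan cost-to-go
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    def conflicts_at(c: Cell, t: int) -> int:  # vertex conflicts with the other agents' paths
        return sum(1 for p in other_paths if p[min(t, len(p) - 1)] == c)

    heap = [(w_h * h(start), 0, start, 0)]  # (priority, conflicts, cell, time); g equals time
    best = {(start, 0): 0}
    parent = {(start, 0): None}
    while heap:
        _, conf, cell, t = heapq.heappop(heap)
        if conf > best[(cell, t)]:
            continue  # stale heap entry
        if cell == goal:
            path, state = [], (cell, t)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if t >= max_t:
            continue
        for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):  # wait or move
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue
            n_conf = conf + conflicts_at(nxt, t + 1)
            key = (nxt, t + 1)
            if n_conf < best.get(key, float("inf")):
                best[key] = n_conf
                parent[key] = (cell, t)
                heapq.heappush(heap, (t + 1 + w_h * h(nxt) + w_c * n_conf, n_conf, nxt, t + 1))
    return None
```

With w_c = 0 the planner ignores other agents and leaves conflicts for the high-level tree to resolve. With a large w_c it pays extra path cost to avoid them. Tuning the relative weight is how this kind of planner shifts work between the low and high levels.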