AITopics

2110.01517

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre:

Research Report (1.00)
Workflow (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

arXiv.org Artificial IntelligenceOct-1-2021

Divergence-Regularized Multi-Agent Actor-Critic

Su, Kefan, Lu, Zongqing

Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective and makes the converged policy deviate from the optimal policy of the original Markov Decision Process. Though divergence regularization has been proposed to settle this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Mathematically, we derive the update rule of DMAC which is naturally off-policy, guarantees a monotonic policy improvement and is not biased by the regularization. DMAC is a flexible framework and can be combined with many existing MARL algorithms. We evaluate DMAC in a didactic stochastic game and StarCraft Multi-Agent Challenge and empirically show that DMAC substantially improves the performance of existing MARL algorithms.

dmac, international conference, regularization, (14 more...)

2110.00304

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Emergence of Theory of Mind Collaboration in Multiagent Systems

Yuan, Luyao, Fu, Zipeng, Zhou, Linqi, Yang, Kexin, Zhu, Song-Chun

Currently, in the study of multiagent systems, the intentions of agents are usually ignored. Nonetheless, as pointed out by Theory of Mind (ToM), people regularly reason about other's mental states, including beliefs, goals, and intentions, to obtain performance advantage in competition, cooperation or coalition. However, due to its intrinsic recursion and intractable modeling of distribution over belief, integrating ToM in multiagent planning and decision making is still a challenge. In this paper, we incorporate ToM in multiagent partially observable Markov decision process (POMDP) and propose an adaptive training algorithm to develop effective collaboration between agents with ToM. We evaluate our algorithms with two games, where our algorithm surpasses all previous decentralized execution algorithms without modeling ToM.

agent, algorithm, ingredient, (15 more...)

2110.00121

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines

Hu, Jueming, Xu, Zhe, Wang, Weichang, Qu, Guannan, Pang, Yutian, Liu, Yongming

In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently, based on the information available to the agents. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of Q-function on other agents decreases exponentially as the distance between them increases. Furthermore, the complexity of DGRM is related to the local information size of the largest $\kappa$-hop neighborhood, and DGRM can find an $O(\rho^{\kappa+1})$-approximation of a stationary point of the objective function. To further improve efficiency, we also propose the deep DGRM algorithm, using deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated by two case studies, UAV package delivery and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.

agent, algorithm, reward machine, (13 more...)

2110.00096

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > Arizona (0.04)
Europe > United Kingdom (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.55)
Health & Medicine > Therapeutic Area > Immunology (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Fickinger, Arnaud, Hu, Hengyuan, Amos, Brandon, Russell, Stuart, Brown, Noam

Lookahead search has been a critical component of recent AI successes, such as in the games of chess, go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work we replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning, and show that this approach outperforms state-of-the-art search algorithms in benchmark settings. In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi, and show the generality of our algorithm by also showing that it outperforms tabular search in the Atari game Ms. Pacman.

algorithm, blueprint, rl search, (14 more...)

2109.15316

Country:

North America > United States > New York > New York County > New York City (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Chess (0.68)
Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Trumpp, Raphael, Bayerlein, Harald, Gesbert, David

Modeling Interactions of Autonomous Vehicles and Pedestrians with Deep Multi-Agent Reinforcement Learning for Collision Avoidance

Reliable pedestrian crash avoidance mitigation (PCAM) systems are crucial components of safe autonomous vehicles (AVs). The sequential nature of the vehicle-pedestrian interaction, i.e., where immediate decisions of one agent directly influence the following decisions of the other agent, is an often neglected but important aspect. In this work, we model the corresponding interaction sequence as a Markov decision process (MDP) that is solved by deep reinforcement learning (DRL) algorithms to define the PCAM system's policy. The simulated driving scenario is based on an AV acting as a DRL agent driving along an urban street, facing a pedestrian at an unmarked crosswalk who tries to cross. Since modeling realistic crossing behavior of the pedestrian is challenging, we introduce two levels of intelligent pedestrian behavior: While the baseline model follows a predefined strategy, our advanced model captures continuous learning and the inherent uncertainty in human behavior by defining the pedestrian as a second DRL agent, i.e., we introduce a deep multi-agent reinforcement learning (DMARL) problem. The presented PCAM system with different levels of intelligent pedestrian behavior is benchmarked according to the agents' collision rate and the resulting traffic flow efficiency. In this analysis, our focus lies on evaluating the influence of observation noise on the decision making of the agents. The results show that the AV is able to completely mitigate collisions under the majority of the investigated conditions and that the DRL-based pedestrian model indeed learns a more human-like crossing behavior.

agent, pcam system, pedestrian, (12 more...)

2109.15266

Country:

North America > United States > California (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Ground > Road (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

#artificialintelligenceSep-29-2021, 20:27:48 GMT

Unsupervised Machine Learning Hidden Markov Models in Python

Created by Lazy Programmer Inc. Understand and enumerate the various applications of Markov Models and Hidden Markov Models Understand how Markov Models work Write a Markov Model in code Apply Markov Models to any sequence of data Understand the mathematics behind Markov chains Apply Markov models to language Apply Markov models to website analytics Understand how Google's PageRank works Understand Hidden Markov Models Write a Hidden Markov Model in Code Write a Hidden Markov Model using Theano Understand how gradient descent, which is normally used in deep learning, can be used for HMMs Learn how to create state of the art neural networks for deep learning with Facebook's PyTorch Deep Learning library!

hidden markov model, machine learning hidden markov model, markov model, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Köhs, Lukas, Alt, Bastian, Koeppl, Heinz

Variational Inference for Continuous-Time Switching Dynamical Systems

arXiv.org Machine LearningSep-29-2021

Switching dynamical systems provide a powerful, interpretable modeling framework for inference in time-series data in, e.g., the natural sciences or engineering applications. Since many areas, such as biology or discrete-event systems, are naturally described in continuous time, we present a model based on an Markov jump process modulating a subordinated diffusion process. We provide the exact evolution equations for the prior and posterior marginal densities, the direct solutions of which are however computationally intractable. Therefore, we develop a new continuous-time variational inference algorithm, combining a Gaussian process approximation on the diffusion level with posterior inference for Markov jump processes. By minimizing the path-wise Kullback-Leibler divergence we obtain (i) Bayesian latent state estimates for arbitrary points on the real axis and (ii) point estimates of unknown system parameters, utilizing variational expectation maximization. We extensively evaluate our algorithm under the model assumption and for real-world examples.

differential equation, equation, hybrid process, (15 more...)

arXiv.org Machine Learning

2109.14492

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California (0.04)
Africa > Comoros > Grande Comore > Moroni (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Energy (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Sadeghi, Jonathan, Rogers, Blaine, Gunn, James, Saunders, Thomas, Samangooei, Sina, Dokania, Puneet Kumar, Redford, John

A Step Towards Efficient Evaluation of Complex Perception Tasks in Simulation

arXiv.org Artificial IntelligenceSep-28-2021

There has been increasing interest in characterising the error behaviour of systems which contain deep learning models before deploying them into any safety-critical scenario. However, characterising such behaviour usually requires large-scale testing of the model that can be extremely computationally expensive for complex real-world tasks. For example, tasks involving compute intensive object detectors as one of their components. In this work, we propose an approach that enables efficient large-scale testing using simplified low-fidelity simulators and without the computational cost of executing expensive deep learning models. Our approach relies on designing an efficient surrogate model corresponding to the compute intensive components of the task under test. We demonstrate the efficacy of our methodology by evaluating the performance of an autonomous driving task in the Carla simulator with reduced computational expense by training efficient surrogate models for PIXOR and CenterPoint LiDAR detectors, whilst demonstrating that the accuracy of the simulation is maintained.

detector, simulation, surrogate model, (16 more...)

2110.02739

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Industry:

Automobiles & Trucks (0.89)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi

Model-Free Reinforcement Learning for Optimal Control of MarkovDecision Processes Under Signal Temporal Logic Specifications

arXiv.org Artificial IntelligenceSep-27-2021

We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process while guaranteeing a desired lower bound on the probability of satisfying a signal temporal logic (STL) specification. We propose a method to effectively augment the MDP state space to capture the required state history and express the STL objective as a reachability objective. The planning problem can then be formulated as a finite-horizon constrained Markov decision process (CMDP). For a general finite horizon CMDP problem with unknown transition probability, we develop a reinforcement learning scheme that can leverage any model-free RL algorithm to provide an approximately optimal policy out of the general space of non-stationary randomized policies. We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.

algorithm, optimal policy, probability, (15 more...)

2109.13377

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)