AITopics | Undirected Networks

Towards Multi-Agent Reinforcement Learning using Quantum Boltzmann Machines

Müller, Tobias, Roch, Christoph, Schmid, Kyrill, Altmann, Philipp

arXiv.org Artificial Intelligence, Sep-22-2021

Reinforcement learning has driven impressive advances in machine learning. Simultaneously, quantum-enhanced machine learning algorithms using quantum annealing are undergoing intensive development. Recently, a multi-agent reinforcement learning (MARL) architecture combining both paradigms has been proposed. This novel algorithm, which utilizes Quantum Boltzmann Machines (QBMs) for Q-value approximation, has outperformed regular deep reinforcement learning in terms of the time steps needed to converge. However, this algorithm was restricted to single-agent and small 2x2 multi-agent grid domains. In this work, we propose an extension to the original concept in order to solve more challenging problems. Similar to classic DQNs, we add an experience replay buffer and use different networks for approximating the target and policy values. The experimental results show that learning becomes more stable and that agents are able to find optimal policies in grid domains with higher complexity. Additionally, we assess how parameter sharing influences the agents' behavior in multi-agent domains. Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size and therefore by the size of the input and Boltzmann machine.
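
The two DQN-style additions mentioned in the abstract (an experience replay buffer and separate target and policy value estimates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses plain Python dictionaries as Q-tables instead of Quantum Boltzmann Machines, and the names `ReplayBuffer`, `td_target`, `train_step` and `sync_target` are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling weakens the correlation between consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, next_state, done, target_q, gamma=0.99):
    """Bootstrap target computed from the frozen target table (assumed tabular)."""
    if done:
        return reward
    return reward + gamma * max(target_q[next_state].values())


def train_step(policy_q, target_q, batch, lr=0.1, gamma=0.99):
    """Move the policy Q-values toward targets computed with target_q."""
    for state, action, reward, next_state, done in batch:
        target = td_target(reward, next_state, done, target_q, gamma)
        policy_q[state][action] += lr * (target - policy_q[state][action])


def sync_target(policy_q, target_q):
    """Copy the policy values into the target table, e.g. every few steps."""
    for state, actions in policy_q.items():
        target_q[state] = dict(actions)
```

In a training loop one would typically push each transition, call `train_step` on a sampled batch once the buffer is large enough, and call `sync_target` at a fixed interval; the interval and batch size are tuning choices the abstract does not specify.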

agent, architecture, reinforcement, (14 more...)

2109.109

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Promising Solution (0.68)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

A Reinforcement Learning Benchmark for Autonomous Driving in Intersection Scenarios

Liu, Yuqi, Zhang, Qichao, Zhao, Dongbin

arXiv.org Artificial Intelligence, Sep-22-2021

In recent years, control in urban intersection scenarios has become an emerging research topic. In such scenarios, the autonomous vehicle confronts complicated situations, since it must interact with social vehicles in a timely manner while obeying the traffic rules. Generally, the autonomous vehicle is supposed to avoid collisions while pursuing better efficiency. Existing work fails to provide a framework that emphasizes the integrity of the scenarios while also making it possible to deploy and test reinforcement learning (RL) methods. Specifically, we propose a benchmark for training and testing RL-based autonomous driving agents in complex intersection scenarios, which is called RL-CIS. A set of baselines consisting of various algorithms is then deployed. The test benchmark and baselines are intended to provide a fair and comprehensive training and testing platform for the study of RL for autonomous driving in the intersection scenario, advancing the progress of RL-based methods for intersection autonomous driving control. The code of our proposed framework can be found at https://github.com/liuyuqi123/ComplexUrbanScenarios.

scenario, social vehicle, vehicle, (16 more...)

2109.10557

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Deep Learning: Recurrent Neural Networks in Python

#artificialintelligence, Sep-21-2021, 01:42:51 GMT

The Recurrent Neural Network (RNN) has been used to obtain state-of-the-art results in sequence modeling. This includes time series analysis, forecasting and natural language processing (NLP). Learn why RNNs beat old-school machine learning algorithms like Hidden Markov Models. The course covers the basics of machine learning and neurons (just a review to get you warmed up!), neural networks for classification and regression (also just a review to get you warmed up!), and how to predict stock prices and stock returns with LSTMs in Tensorflow 2 (hint: it's not what you think!). All of the materials required for this course can be downloaded and installed for FREE.

deep learning, recurrent neural network, tensorflow 2, (6 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Distributed Mission Planning of Complex Tasks for Heterogeneous Multi-Robot Teams

Ferreira, Barbara Arbanas, Petrović, Tamara, Bogdan, Stjepan

arXiv.org Artificial Intelligence, Sep-21-2021

In this paper, we propose a distributed multi-stage optimization method for planning complex missions for heterogeneous multi-robot teams. This class of problems involves tasks that can be executed in different ways and are associated with cross-schedule dependencies that constrain the schedules of the different robots in the system. The proposed approach involves a multi-objective heuristic search of the mission, represented as a hierarchical tree that defines the mission goal. This procedure outputs several favorable ways to fulfill the mission, which directly feed into the next stage of the method. We propose a distributed metaheuristic based on evolutionary computation to allocate tasks and generate schedules for the set of chosen decompositions. The method is evaluated in a simulation setup of an automated greenhouse use case, where we demonstrate the method's ability to adapt the planning strategy depending on the available robots and the given optimization criteria.

procedure, robot, task allocation, (16 more...)

2109.10106

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Croatia > Zagreb County > Zagreb (0.04)
North America > United States (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games (0.61)
Government > Military (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Generalization in Text-based Games via Hierarchical Reinforcement Learning

Xu, Yunqiu, Fang, Meng, Chen, Ling, Du, Yali, Zhang, Chengqi

arXiv.org Artificial Intelligence, Sep-21-2021

Deep reinforcement learning provides a promising approach for text-based games in studying natural language communication between humans and artificial agents. However, generalization remains a major challenge, as the agents depend critically on the complexity and variety of training tasks. In this paper, we address this problem by introducing a hierarchical framework built upon a knowledge graph (KG)-based RL agent. At the high level, a meta-policy is executed to decompose the whole game into a set of subtasks specified by textual goals and to select one of them based on the KG. Then a sub-policy at the low level is executed to conduct goal-conditioned reinforcement learning. We carry out experiments on games with various difficulty levels and show that the proposed method enjoys favorable generalizability.

bathroom, cookbook, onion, (15 more...)

2109.09968

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes

Gopalakrishnan, Sriram, Verma, Mudit, Kambhampati, Subbarao

arXiv.org Artificial Intelligence, Sep-20-2021

When humans are given a policy to execute, there can be policy execution errors and deviations in execution if there is uncertainty in identifying a state. An algorithm that computes a policy for a human to execute therefore ought to consider these effects in its computations. An optimal MDP policy that is poorly executed (because of a human agent) may be much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent that would act in a setting modeled by a Markov Decision Process. We present a framework to model the likelihood of policy execution errors and the likelihood of non-policy actions like inaction (delays) due to state uncertainty. This is followed by a hill climbing algorithm to search for good policies that account for these errors. We then use the best policy found by hill climbing with a branch and bound algorithm to find the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that verify whether our assumptions on policy execution by humans under state-aliasing are reasonable.

likelihood, non-policy action, probability, (17 more...)

2109.07436

Country: North America > United States > Arizona > Maricopa County > Tempe (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)

Optimal Path Planning of Autonomous Marine Vehicles in Stochastic Dynamic Ocean Flows using a GPU-Accelerated Algorithm

Chowdhury, Rohit, Subramani, Deepak

arXiv.org Artificial Intelligence, Sep-20-2021

Autonomous marine vehicles play an essential role in many ocean science and engineering applications. Planning time- and energy-optimal paths for these vehicles to navigate in stochastic dynamic ocean environments is essential to reduce operational costs. In some missions, they must also harvest solar, wind, or wave energy (modeled as a stochastic scalar field) and move in optimal paths that minimize net energy consumption. Markov Decision Processes (MDPs) provide a natural framework for sequential decision-making for robotic agents in such environments. However, building a realistic model and solving the modeled MDP become computationally expensive in large-scale real-time applications, warranting the need for parallel algorithms and efficient implementation. In the present work, we introduce an efficient end-to-end GPU-accelerated algorithm that (i) builds the MDP model (computing transition probabilities and expected one-step rewards); and (ii) solves the MDP to compute an optimal policy. We develop methodical and algorithmic solutions to overcome the limited global memory of GPUs by (i) using a dynamic reduced-order representation of the ocean flows, (ii) leveraging the sparse nature of the state transition probability matrix, (iii) introducing a neighbouring sub-grid concept and (iv) proving that it is sufficient to use only the stochastic scalar field's mean to compute the expected one-step rewards for missions involving energy harvesting from the environment, thereby saving memory and reducing the computational effort. We demonstrate the algorithm on a simulated stochastic dynamic environment and highlight that it builds the MDP model and computes the optimal policy 600-1000x faster than conventional CPU implementations, making it suitable for real-time use.
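
The "solve the MDP" step, together with the idea of exploiting sparse transitions, can be illustrated with a small CPU-only value iteration sketch. This is not the paper's GPU algorithm: the solver choice (value iteration), the function name `value_iteration`, and the dictionary-based sparse layout are assumptions made purely for illustration.

```python
def value_iteration(transitions, rewards, gamma=0.95, tol=1e-8, max_iters=10_000):
    """Solve a small MDP stored sparsely.

    transitions[s][a] is a list of (next_state, probability) pairs that holds
    only the non-zero entries; rewards[s][a] is the expected one-step reward.
    States with no actions are treated as terminal (value 0).
    """

    def q_value(s, a, values):
        return rewards[s][a] + gamma * sum(p * values[s2] for s2, p in transitions[s][a])

    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Toy example: state 2 is the goal; every move costs one unit.
toy_transitions = {
    0: {"fast": [(2, 0.8), (0, 0.2)], "slow": [(1, 1.0)]},
    1: {"go": [(2, 1.0)]},
    2: {},
}
toy_rewards = {0: {"fast": -1.0, "slow": -1.0}, 1: {"go": -1.0}, 2: {}}
toy_values, toy_policy = value_iteration(toy_transitions, toy_rewards, gamma=1.0)
```

Storing only the non-zero transition entries keeps memory proportional to the number of reachable successors rather than to the square of the state count, which is the same motivation the abstract gives for leveraging sparsity.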

agent, algorithm, realization, (17 more...)

doi: 10.1109/JSYST.2019.2950627

2109.00857

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Transportation (1.00)
Energy > Renewable > Ocean Energy (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits

Xiong, Guojun, Li, Jian, Singh, Rahul

arXiv.org Machine Learning, Sep-20-2021

We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. The goal is to sequentially choose actions for arms so as to maximize the expected value of the cumulative rewards collected. Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call the Occupancy-Measured-Reward Index Policy. Our policy is well-defined even if the underlying MDPs are not indexable. We prove that it is asymptotically optimal when the activation budget and the number of arms are scaled up while their ratio is kept constant. For the case when the system parameters are unknown, we develop a learning algorithm. Our learning algorithm uses the principle of optimism in the face of uncertainty and further uses a generative model in order to fully exploit the structure of the Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algorithm. Compared with the existing algorithms, R(MA)^2B-UCB performs close to an offline optimum policy, and also achieves a sub-linear regret with a low computational complexity. Experimental results show that R(MA)^2B-UCB outperforms the existing algorithms in both regret and run time.
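
The principle of optimism in the face of uncertainty can be illustrated with the classic UCB1 rule for ordinary (stateless) bandits. To be clear, this is a swapped-in textbook technique and not the R(MA)^2B-UCB algorithm from the abstract; the function names and the `scale` parameter are hypothetical.

```python
import math


def ucb_index(mean_reward, pulls, total_pulls, scale=2.0):
    """Empirical mean plus an exploration bonus that shrinks as pulls grow."""
    if pulls == 0:
        return math.inf  # untried arms are always explored first
    return mean_reward + math.sqrt(scale * math.log(total_pulls) / pulls)


def select_arm(means, counts):
    """Pick the arm with the largest optimistic index."""
    total = sum(counts)
    indices = [ucb_index(m, n, total) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

An arm that has rarely been tried receives a large bonus and is therefore selected even when its empirical mean is no better than the others, which is how optimism drives exploration.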

algorithm, index policy, omr index policy, (13 more...)

2109.09855

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

EM Algorithm

#artificialintelligence, Sep-19-2021, 00:10:59 GMT

The EM (Expectation-Maximisation) algorithm is the go-to algorithm whenever we have to do parameter estimation with hidden variables, such as in hidden Markov chains. For some reason, it is often poorly explained, and students end up confused as to what exactly we are maximising in the E-step and M-step. Here is my attempt at a (hopefully) clear and step-by-step explanation of exactly how the EM algorithm works.
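
As a concrete illustration of the two steps, here is a minimal sketch of EM on the well-known two-coin problem, where each trial is tossed with one of two biased coins but the coin's identity is hidden. The example is not taken from the linked article; the function name `em_two_coins`, the equal prior over coins, and the starting values are assumptions.

```python
def em_two_coins(heads, flips, theta_a=0.6, theta_b=0.5, iters=50):
    """Estimate the head probabilities of two coins from unlabeled trials.

    heads[i] is the number of heads seen in flips[i] tosses of one coin;
    which coin was used in each trial is the hidden variable.
    """
    for _ in range(iters):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h, n in zip(heads, flips):
            # E-step: posterior probability that coin A produced this trial,
            # assuming both coins are equally likely a priori. The binomial
            # coefficient cancels in the ratio, so it is omitted.
            like_a = theta_a ** h * (1 - theta_a) ** (n - h)
            like_b = theta_b ** h * (1 - theta_b) ** (n - h)
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected (soft) counts for each coin.
            heads_a += resp_a * h
            tails_a += resp_a * (n - h)
            heads_b += resp_b * h
            tails_b += resp_b * (n - h)
        # M-step: maximise the expected complete-data log-likelihood, which
        # for coins reduces to the ratio of expected heads to expected tosses.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b
```

The E-step does not maximise anything: it only computes expected counts under the current parameters. The maximisation happens in the M-step, which treats those soft counts as if they were observed.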

algorithm

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.73)

Unsupervised Machine Learning Hidden Markov Models in Python

#artificialintelligence, Sep-18-2021, 11:09:18 GMT

The Hidden Markov Model or HMM is all about learning sequences. A lot of the data that would be very useful for us to model is in sequences. Stock prices are sequences of prices. Language is a sequence of words. Credit scoring involves sequences of borrowing and repaying money, and we can use those sequences to predict whether or not you're going to default.
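
To make the idea of modeling sequences concrete, the sketch below scores an observation sequence under a small discrete HMM using the standard forward algorithm. It is not taken from the course; the function name and the two-state "up"/"down" market model with its probabilities are made-up illustrative values.

```python
def forward_likelihood(observations, start_prob, trans_prob, emit_prob):
    """Probability of an observation sequence under a discrete HMM."""
    states = list(start_prob)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_prob[s][obs] * sum(alpha[prev] * trans_prob[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model: hidden market regime, observed price move.
start = {"up": 0.5, "down": 0.5}
trans = {"up": {"up": 0.7, "down": 0.3}, "down": {"up": 0.4, "down": 0.6}}
emit = {"up": {"rise": 0.8, "fall": 0.2}, "down": {"rise": 0.3, "fall": 0.7}}

likelihood = forward_likelihood(["rise", "rise", "fall"], start, trans, emit)
```

The forward recursion sums over all hidden state paths in time linear in the sequence length, rather than enumerating the paths explicitly.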

hidden markov model, sequence, student and professional interested, (12 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Banking & Finance (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
