AITopics

In this thesis, we draw inspiration from both classical system identification and modern machine learning in order to solve estimation problems for real-world, physical systems. The main approach to estimation and learning adopted is optimization based. Concepts such as regularization will be utilized for encoding of prior knowledge and basis-function expansions will be used to add nonlinear modeling power while keeping data requirements practical. The thesis covers a wide range of applications, many inspired by applications within robotics, but also extending outside this already wide field. Usage of the proposed methods and algorithms are in many cases illustrated in the real-world applications that motivated the research. Topics covered include dynamics modeling and estimation, model-based reinforcement learning, spectral estimation, friction modeling and state estimation and calibration in robotic machining. In the work on modeling and identification of dynamics, we develop regularization strategies that allow us to incorporate prior domain knowledge into flexible, overparameterized models. We make use of classical control theory to gain insight into training and regularization while using flexible tools from modern deep learning. A particular focus of the work is to allow use of modern methods in scenarios where gathering data is associated with a high cost. In the robotics-inspired parts of the thesis, we develop methods that are practically motivated and ensure that they are implementable also outside the research setting. We demonstrate this by performing experiments in realistic settings and providing open-source implementations of all proposed methods and algorithms.

deep learning, identification and regularization, neural network, (25 more...)

1906.02003

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.45)
Europe > Sweden (0.28)
North America > United States > Massachusetts > Middlesex County (0.14)
(9 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Government > Regional Government (0.45)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Khangura, Sukhpreet Kaur, Akın, Sami

Measurement-based Online Available Bandwidth Estimation employing Reinforcement Learning

An accurate and fast estimation of the available bandwidth in a network with varying cross-traffic is a challenging task. The accepted probing tools, based on the fluid-flow model of a bottleneck link with first-in, first-out multiplexing, estimate the available bandwidth by measuring packet dispersions. The estimation becomes more difficult if packet dispersions deviate from the assumptions of the fluid-flow model in the presence of non-fluid bursty cross-traffic, multiple bottleneck links, and inaccurate time-stamping. This motivates us to explore the use of machine learning tools for available bandwidth estimation. Hence, we consider reinforcement learning and implement the single-state multi-armed bandit technique, which follows the $\epsilon$-greedy algorithm to find the available bandwidth. Our measurements and tests reveal that our proposed method identifies the available bandwidth with high precision. Furthermore, our method converges to the available bandwidth under a variety of notoriously difficult conditions, such as heavy traffic burstiness, different cross-traffic intensities, multiple bottleneck links, and in networks where the tight link and the bottleneck link are not same. Compared to the piece-wise linear network a model-based direct probing technique that employs a Kalman filter, our method shows more accurate estimates and faster convergence in certain network scenarios and does not require measurement noise statistics.

artificial intelligence, available bandwidth, reinforcement learning, (15 more...)

1906.07095

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

arXiv.org Artificial IntelligenceJun-5-2019

Escaping the State of Nature: A Hobbesian Approach to Cooperation in Multi-agent Reinforcement Learning

Long, William

Cooperation is a phenomenon that has been widely studied across many different disciplines. In the field of computer science, the modularity and robustness of multi-agent systems offer significant practical advantages over individual machines. At the same time, agents using standard reinforcement learning algorithms often fail to achieve long-term, cooperative strategies in unstable environments when there are short-term incentives to defect. Political philosophy, on the other hand, studies the evolution of cooperation in humans who face similar incentives to act individualistically, but nevertheless succeed in forming societies. Thomas Hobbes in Leviathan provides the classic analysis of the transition from a pre-social State of Nature, where consistent defection results in a constant state of war, to stable political community through the institution of an absolute Sovereign. This thesis argues that Hobbes's natural and moral philosophy are strikingly applicable to artificially intelligent agents and aims to show that his political solutions are experimentally successful in producing cooperation among modified Q-Learning agents. Cooperative play is achieved in a novel Sequential Social Dilemma called the Civilization Game, which models the State of Nature by introducing the Hobbesian mechanisms of opponent learning awareness and majoritarian voting, leading to the establishment of a Sovereign.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1906.09874

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Law (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

D'Arcy, Laura, Corcoran, Padraig, Preece, Alun

Deep Q-Learning for Directed Acyclic Graph Generation

We present a method to generate directed acyclic graphs (DAGs) using deep reinforcement learning, specifically deep Q-learning. Generating graphs with specified structures is an important and challenging task in various application fields, however most current graph generation methods produce graphs with undirected edges. We demonstrate that this method is capable of generating DAGs with topology and node types satisfying specified criteria in highly sparse reward environments.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1906.0228

Country:

North America > United States (0.29)
Europe > United Kingdom > Wales > Cardiff (0.04)

Genre: Research Report (0.52)

Industry: Government (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kamalaruban, Parameswaran, Devidze, Rati, Cevher, Volkan, Singla, Adish

Interactive Teaching Algorithms for Inverse Reinforcement Learning

arXiv.org Artificial IntelligenceJun-5-2019

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where a teacher has full knowledge about the learner's dynamics and a blackbox setting where the teacher has minimal knowledge. Then, we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees of our teaching algorithm in the omniscient setting. Extensive experiments with a car driving simulator environment show that the learning progress can be speeded up drastically as compared to an uninformative teacher.

learner, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1905.11867

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Böhmer, Wendelin, Rashid, Tabish, Whiteson, Shimon

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning

arXiv.org Artificial IntelligenceJun-5-2019

This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL), in which decentralized agents share control and an experience replay buffer with a centralized agent. Only the centralized agent is intrinsically rewarded, but the decentralized agents still benefit from improved exploration, without the distraction of unreliable incentives.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1906.02138

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.91)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)

Kraev, Egor, Harley, Mark

Probabilistic hypergraph grammars for efficient molecular optimization

We present an approach to make molecular optimization more efficient. We infer a hypergraph replacement grammar from the ChEMBL database, count the frequencies of particular rules being used to expand particular nonterminals in other rules, and use these as conditional priors for the policy model. Simulating random molecules from the resulting probabilistic grammar, we show that conditional priors result in a molecular distribution closer to the training set than using equal rule probabilities or unconditional priors. We then treat molecular optimization as a reinforcement learning problem, using a novel modification of the policy gradient algorithm - batch-advantage: using individual rewards minus the batch average reward to weight the log probability loss. The reinforcement learning agent is tasked with building molecules using this grammar, with the goal of maximizing benchmark scores available from the literature. To do so, the agent has policies both to choose the next node in the graph to expand and to select the next grammar rule to apply. The policies are implemented using the Transformer architecture with the partially expanded graph as the input at each step. We show that using the empirical priors as the starting point for a policy eliminates the need for pre-training, and allows us to reach optima faster. We achieve competitive performance on common benchmarks from the literature, such as penalized logP and QED, with only hundreds of training steps on a budget GPU instance.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1906.01845

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Serrino, Jack, Kleiman-Weiner, Max, Parkes, David C., Tenenbaum, Joshua B.

Finding Friend and Foe in Multi-Agent Games

Recent breakthroughs in AI for multi-agent games like Go, Poker, and Dota, have seen great strides in recent years. Yet none of these games address the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on The Resistance: Avalon, the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play. Our algorithm integrates deductive reasoning into vector-form CFR to reason about joint beliefs and deduce partially observable actions. We augment deep value networks with constraints that yield interpretable representations of win probabilities. These innovations enable DeepRole to scale to the full Avalon game. Empirical game-theoretic methods show that DeepRole outperforms other hand-crafted and learned agents in five-player Avalon. DeepRole played with and against human players on the web in hybrid human-agent teams. We find that DeepRole outperforms human players as both a cooperator and a competitor.

deeprole, machine learning, reinforcement learning, (20 more...)

1906.0233

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm

Wang, Pin, Li, Hanhan, Chan, Ching-Yao

Lane change is a challenging task which requires delicate actions to ensure safety and comfort. Some recent studies have attempted to solve the lane-change control problem with Reinforcement Learning (RL), yet the action is confined to discrete action space. To overcome this limitation, we formulate the lane change behavior with continuous action in a model-free dynamic driving environment based on Deep Deterministic Policy Gradient (DDPG). The reward function, which is critical for learning the optimal policy, is defined by control values, position deviation status, and maneuvering time to provide the RL agent informative signals. The RL agent is trained from scratch without resorting to any prior knowledge of the environment and vehicle dynamics since they are not easy to obtain. Seven models under different hyperparameter settings are compared. A video showing the learning progress of the driving behavior is available. It demonstrates the RL vehicle agent initially runs out of road boundary frequently, but eventually has managed to smoothly and stably change to the target lane with a success rate of 100% under diverse driving situations in simulation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1906.02275

Country: North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Agazzi, Andrea, Lu, Jianfeng

Temporal-difference learning for nonlinear value function approximation in the lazy training regime

In recent years, deep reinforcement learning has pushed the boundaries of Artificial Intelligence to an unprecedented level, achieving what was expected to be possible only in a decade and outperforming human intelligence in a number of highly complex tasks. Paramount examples of this potential have appeared over the past few years, with such algorithms mastering games and tasks of increasing complexity, from playing Atari to learning to walk and beating world grandmasters at the game of Go [16, 23, 24, 31-33]. Such impressive success would be impossible without using neural networks to approximate value functions and / or policy functions in reinforcement learning algorithms. While neural networks, in particular deep neural networks, provide a powerful and versatile tool to approximate high dimensional functions [4, 12, 17], their intrinsic nonlinearity might also lead to trouble in training, in particular in the context of reinforcement learning. For example, it is well known that nonlinear approximation to value function might cause divergence in the classical temporal-difference learning due to instability [40].

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1905.10917

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)