AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


lexfridman/deeptraffic

#artificialintelligence, Dec-30-2019, 04:45:51 GMT

DeepTraffic is a deep reinforcement learning competition hosted as part of the MIT Deep Learning courses. The goal is to create a neural network that drives a vehicle (or multiple vehicles) as fast as possible through dense highway traffic. The top 10 submissions are listed on the leaderboard, and you can visualize your own submission. To get started right away, the repository provides a code snippet to insert into the code box on the DeepTraffic site; additional agents will be added as the course progresses, starting with a basic network that achieves 66.8 mph. On the problem of traffic itself: "Americans will put up with anything provided it doesn't block traffic." (Dan Rather) In the U.S. alone, we spend 6.9 billion hours sitting in traffic each year [1], roughly 10,000 human lifetimes [2]. Autonomous vehicles will be able to alleviate part (but not all) of the problem.

algorithm, deeptraffic, vehicle, (15 more...)

#artificialintelligence

Country: North America > United States (0.25)

Industry: Transportation > Ground > Road (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.38)


Information Theoretic Model Predictive Q-Learning

Bhardwaj, Mohak, Handa, Ankur, Fox, Dieter, Boots, Byron

arXiv.org Machine Learning, Dec-30-2019

Model-free Reinforcement Learning (RL) algorithms work well in sequential decision-making problems when experience can be collected cheaply, and model-based RL is effective when system dynamics can be modeled accurately. However, both of these assumptions can be violated in real-world problems such as robotics, where querying the system can be prohibitively expensive and real-world dynamics can be difficult to model accurately. Although sim-to-real approaches such as domain randomization attempt to mitigate the effects of biased simulation, they can still suffer from optimization challenges such as local minima and hand-designed distributions for randomization, making it difficult to learn an accurate global value function or policy that directly transfers to the real world. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real robots in a systematic manner.

artificial intelligence, upstream oil & gas, value function, (17 more...)

arXiv.org Machine Learning

2001.02153

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.93)
Education > Educational Setting > Online (0.54)
Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces

Segler, Marwin H. S.

arXiv.org Machine Learning, Dec-30-2019

Some of the most important tasks take place in environments which lack cheap and perfect simulators, thus hampering the application of model-free reinforcement learning (RL). While model-based RL aims to learn a dynamics model, in a more general case the learner does not know a priori what the action space is. Here we propose a formalism where the learner induces a world program by learning a dynamics model and the actions in graph-based compositional environments by observing state-state transition examples. Then, the learner can perform RL with the world program as the simulator for complex planning tasks. We highlight a recent application, and propose a challenge for the community to assess world program-based planning.

arxiv preprint arxiv, graph, model-based learning, (11 more...)

arXiv.org Machine Learning

1912.13007

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)


A New Framework for Query Efficient Active Imitation Learning

Hsu, Daniel

arXiv.org Artificial Intelligence, Dec-30-2019

We seek to align agent policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge of the dynamics, reward function, or unsafe states. A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user's reward function with efficient queries. We build an adversarial generative model of states and a successor feature (SR) model trained over transition experience collected by the learning policy. Our method uses these models to select state-action pairs, asks the user to comment on their optimality or safety, and trains an adversarial neural network to predict the rewards. Unlike previous papers, which are almost all based on uncertainty sampling, the key idea is to actively and efficiently select state-action pairs from both on-policy and off-policy experience by discriminating between queried (expert) and unqueried (generated) data and maximizing the efficiency of value function learning. We call this method adversarial reward query with successor representation. We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games, which have high-dimensional observations and complex state dynamics. The results show that the proposed method significantly outperforms uncertainty-based methods at learning reward models and achieves better query efficiency: the adversarial discriminator lets the agent learn human behavior more efficiently, and the SR selects states that have a stronger impact on the value function. Moreover, the proposed method can also learn to avoid unsafe states when training the reward model.

learning, representation, state-action pair, (13 more...)

arXiv.org Artificial Intelligence

1912.13037

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Computational model discovery with reinforcement learning

Bassenne, Maxime, Lozano-Durán, Adrián

arXiv.org Machine Learning, Dec-29-2019

The motivation of this study is to leverage recent breakthroughs in artificial intelligence research to unlock novel solutions to important scientific problems encountered in computational science. To address the human intelligence limitations in discovering reduced-order models, we propose to supplement human thinking with artificial intelligence. Our three-pronged strategy consists of learning (i) models expressed in analytical form, (ii) which are evaluated a posteriori, and (iii) using exclusively integral quantities from the reference solution as prior knowledge. In point (i), we pursue interpretable models expressed symbolically as opposed to black-box neural networks, the latter only being used during learning to efficiently parameterize the large search space of possible models. In point (ii), learned models are dynamically evaluated a posteriori in the computational solver instead of based on a priori information from preprocessed high-fidelity data, thereby accounting for the specificity of the solver at hand, such as its numerics. Finally, in point (iii), the exploration of new models is solely guided by predefined integral quantities, e.g., averaged quantities of engineering interest in Reynolds-averaged or large-eddy simulations (LES). We use a coupled deep reinforcement learning framework and computational solver to concurrently achieve these objectives. The combination of reinforcement learning with objectives (i), (ii), and (iii) differentiates our work from previous modeling attempts based on machine learning. In this report, we provide a high-level description of the model discovery framework with reinforcement learning. The method is detailed for the application of discovering missing terms in differential equations. An elementary instantiation of the method is described that discovers missing terms in the Burgers' equation.

artificial intelligence, health & medicine, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

2001.00008

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area (0.68)
Energy > Oil & Gas > Upstream (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Augmented Replay Memory in Reinforcement Learning With Continuous Control

Ramicic, Mirza, Bonarini, Andrea

arXiv.org Artificial Intelligence, Dec-29-2019

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions. This expansion of the information collected from the environment increases the agent's state space, enabling it to scale up to more complex problems, but it also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism that acts as a protective layer of the human brain's higher cognitive functions: active memory consolidation mitigates the forgetting of previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay (AMR), which is capable of optimizing the replay of experiences from the agent's memory structure by altering or augmenting their relevance. Experimental results show that an evolved AMR augmentation function capable of increasing the significance of specific memories is able to further increase the stability and convergence speed of learning algorithms dealing with the complexity of continuous action domains.
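For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of a replay memory buffer with uniform random mini-batch sampling. It does not implement the paper's AMR augmentation; the class name `ReplayBuffer`, its methods, and the tuple layout are assumptions chosen only for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity replay memory (not the paper's AMR)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition tuple."""
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

A learning loop would typically call `push` after every environment step and `sample` once the buffer holds at least `batch_size` transitions. A relevance-based scheme such as the one the abstract describes would replace the uniform draw in `sample`.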

agent, algorithm, reinforcement, (14 more...)

arXiv.org Artificial Intelligence

1912.12719

Country:

Europe > Czechia > Prague (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Individual specialization in multi-task environments with multiagent reinforcement learners

Gasparrini, Marco Jerome, Solé, Ricard, Sánchez-Fibla, Martí

arXiv.org Artificial Intelligence, Dec-29-2019

There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as the first steps towards building general intelligent agents that learn to make low- and high-level decisions in non-stationary complex environments in the presence of other agents. Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing. We further study coordination in multi-task environments where several rewarding tasks can be performed, so agents don't necessarily need to perform well in all tasks but under certain conditions may specialize. An observation derived from the study is that epsilon-greedy exploration in value-based reinforcement learning methods is not adequate for multi-agent independent learners, because the epsilon parameter that controls the probability of selecting a random action synchronizes the agents artificially and forces them to have deterministic policies at the same time. By using policy-based methods with independent entropy-regularised exploration updates, we achieved a better and smoother convergence. Another result that needs to be further investigated is that, with an increased number of agents, specialization tends to be more probable.
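The contrast the abstract draws between the two exploration styles can be sketched as follows. This is a generic illustration, not the authors' code: `epsilon_greedy`, `softmax_policy`, and `entropy` are hypothetical helper names, and the temperature parameter is an assumption.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def softmax_policy(preferences, temperature=1.0):
    """Stochastic policy: action probabilities from a softmax over preferences."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp((p - m) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy; an entropy bonus rewards keeping this value high."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

With epsilon-greedy, randomness is governed by a single external schedule, so agents sharing that schedule all turn greedy together as epsilon decays. With a softmax policy, each agent's randomness depends on its own learned preferences, and an entropy term in the objective can keep it from collapsing too early.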

agent, individual specialization, specialization, (12 more...)

arXiv.org Artificial Intelligence

1912.12671

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.06)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Loss aversion fosters coordination among independent reinforcement learners

Gasparrini, Marco Jerome, Sánchez-Fibla, Martí

arXiv.org Artificial Intelligence, Dec-29-2019

Marco Jerome Gasparrini and Martí Sánchez-Fibla (University Pompeu Fabra, Barcelona, Spain). arXiv version, January 1, 2020. Abstract: We study the factors that can accelerate the emergence of collaborative behaviours among independent selfish learning agents. We depart from the "Battle of the Exes" (BoE), a spatial repeated game from which human behavioral data has been obtained (by Hawkings and Goldstone, 2016). We find it interesting because it considers two cases: a classic game theory version, called ballistic, in which agents can make only one action/decision (equivalent to the Battle of the Sexes), and a spatial version, called dynamic, in which agents can change their decision (a spatial continuous version). We model both versions of the game with independent reinforcement learning agents, and we manipulate the reward function, transforming it into a utility that introduces "loss aversion": the reward that an agent obtains can be perceived as less valuable when compared to what the other got. We show experimentally that the introduction of loss aversion fosters cooperation by accelerating its appearance, and by making it possible in some cases, such as the dynamic condition. We suggest that this may be an important factor explaining the rapid convergence of human behaviour towards collaboration reported in the experiment of Hawkings and Goldstone.
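The abstract does not give the utility formula, so the following is only one possible way to turn a reward into a loss-averse utility. The function name `loss_averse_utility`, the linear penalty, and the `aversion` coefficient are all assumptions made for illustration, not the authors' definition.

```python
def loss_averse_utility(own_reward, other_reward, aversion=0.5):
    """Hypothetical loss-averse utility (assumed form, not from the paper).

    The agent's own reward is discounted by how much less it received
    than the other agent; receiving more than the other is not penalized.
    """
    shortfall = max(0.0, other_reward - own_reward)
    return own_reward - aversion * shortfall
```

Under this assumed form, an agent that earns 1 while the other earns 4 perceives a utility below its raw reward, whereas the better-off agent's utility is unchanged.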

agent, loss aversion, loss aversion foster coordination, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/978-1-61499-918-8-307

1912.12633

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.45)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.70)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Real-time Policy Distillation in Deep Reinforcement Learning

Sun, Yuxiang, Fazli, Pooyan

arXiv.org Artificial Intelligence, Dec-29-2019

Policy distillation in deep reinforcement learning provides an effective way to transfer control policies from a larger network to a smaller untrained network without a significant degradation in performance. However, policy distillation is underexplored in deep reinforcement learning, and existing approaches are computationally inefficient, resulting in a long distillation time. In addition, the effectiveness of the distillation process is still limited to the model capacity. We propose a new distillation mechanism, called real-time policy distillation, in which training the teacher model and distilling the policy to the student model occur simultaneously. Accordingly, the teacher's latest policy is transferred to the student model in real time. This reduces the distillation time to half the original time or even less and also makes it possible for extremely small student models to learn skills at the expert level. We evaluated the proposed algorithm in the Atari 2600 domain. The results show that our approach can achieve full distillation in most games, even with compression ratios up to 1.7%.
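As background, a commonly used policy distillation objective is the KL divergence between the teacher's softened action distribution and the student's. The sketch below shows that generic loss for a single state; it is not the real-time mechanism proposed in the paper, and the function names and the temperature value are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn action values or logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one state (generic sketch, assumed form).

    The temperature sharpens (< 1) or softens (> 1) the teacher's targets.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

In a conventional setup this loss is minimized over states visited by a fully trained teacher. The abstract's proposal differs in that the teacher is still training while the student is distilled.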

distillation, policy distillation, student, (12 more...)

arXiv.org Artificial Intelligence

1912.1263

Country:

North America > United States > South Carolina (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Speeding up reinforcement learning by combining attention and agency features

Demirel, Berkay, Sánchez-Fibla, Martí

arXiv.org Artificial Intelligence, Dec-29-2019

When playing video games, we immediately detect which entity we control and center our attention on it to focus the learning and reduce its dimensionality. Reinforcement Learning (RL) has been able to deal with big state spaces, including states derived from pixel images in Atari games, but the learning is slow and depends on a brute-force mapping from the global state to the action values (Q-function). Its performance is therefore severely affected by the dimensionality of the state, and what is learned cannot be transferred to other games or to other parts of the same game. We propose different transformations of the input state that combine attention and agency detection mechanisms, both of which have been addressed separately in RL but, to our knowledge, not together. We propose and benchmark different architectures, including both global and local agency-centered versions of the state as well as summaries of the surroundings. Results suggest that even a redundant global-local state network can learn faster than the global one alone. Summarized versions of the state look promising for achieving input-size-independent learning.

agent, architecture, global state, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA190111

1912.12623

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
