AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Reinforcement Learning of Theorem Proving

Kaliszyk, Cezary, Urban, Josef, Michalewski, Henryk, Olšák, Mirek (Charles University)

arXiv.org Artificial Intelligence, May-19-2018

We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. The trained system solves within the same number of inferences over 40% more problems than a baseline prover, which is an unusually high improvement in this hard AI domain. To our knowledge this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.
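As a rough illustration of how Monte-Carlo simulations can be steered by statistics from earlier attempts, the sketch below uses the generic UCB1 selection rule. It is not the prover's actual guidance mechanism; the function name `ucb1_select`, the inference-step labels, and the statistics are hypothetical.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the next action to simulate using the generic UCB1 rule.

    stats maps each action to (visit_count, total_reward).
    Unvisited actions are always tried first.
    """
    total_visits = sum(n for n, _ in stats.values())

    def score(action):
        n, total_reward = stats[action]
        if n == 0:
            return float("inf")  # force at least one visit per action
        # Mean reward plus an exploration bonus that shrinks with visits.
        return total_reward / n + c * math.sqrt(math.log(total_visits) / n)

    return max(stats, key=score)


# Hypothetical statistics for three candidate inference steps.
stats = {"extend": (10, 6.0), "reduce": (10, 2.0), "start": (0, 0.0)}
choice = ucb1_select(stats)
```

Here the unvisited step is selected first; once every step has been visited, the rule trades off the observed mean reward against how rarely a step has been tried.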

inference, iteration, rlcop, (16 more...)

arXiv.org Artificial Intelligence

1805.07563

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Brown, Daniel S., Niekum, Scott

arXiv.org Machine Learning, May-19-2018

Despite much recent interest in inverse reinforcement learning (IRL), little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding optimal demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing an equivalence to the set cover problem, and use this equivalence to develop an efficient algorithm for determining the set of maximally-informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: benchmarking active learning IRL algorithms and developing an IRL algorithm that, rather than assuming demonstrations are i.i.d., uses counterfactual reasoning over informative demonstrations to learn more efficiently.
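The set cover connection mentioned in the abstract can be illustrated with the standard greedy approximation for set cover. This is a generic sketch, not the authors' algorithm; the demonstration names and the constraint ids each demonstration "covers" are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Standard greedy approximation for set cover.

    universe: set of items that must be covered.
    candidates: dict mapping a candidate name to the set of items it covers.
    Returns the list of chosen candidate names, in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered items.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical data: each demonstration covers some reward constraints.
demos = {
    "demo_a": {1, 2, 3},
    "demo_b": {3, 4},
    "demo_c": {4, 5},
    "demo_d": {1, 5},
}
selected = greedy_set_cover({1, 2, 3, 4, 5}, demos)
```

In this toy instance two demonstrations suffice. The greedy rule does not guarantee a minimum cover in general, only a logarithmic approximation.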

demonstration, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1805.07687

Country: North America > United States (0.28)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Episodic Memory Deep Q-Networks

Lin, Zichuan, Zhao, Tianqi, Yang, Guangwen, Zhang, Lintao

arXiv.org Artificial Intelligence, May-19-2018

Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.
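To make the idea of an episodic memory concrete, here is a minimal sketch of a table that remembers the best discounted return ever observed for each state-action pair. It illustrates the general episodic-memory idea only; it is not the EMDQN implementation, and the class name, method names, and toy episodes are assumptions.

```python
class EpisodicMemory:
    """Keeps the highest discounted return seen for each (state, action)."""

    def __init__(self):
        self.best_return = {}

    def update_from_episode(self, transitions, gamma=0.99):
        """transitions: list of (state, action, reward) in time order."""
        g = 0.0
        # Walk the episode backwards to accumulate discounted returns.
        for state, action, reward in reversed(transitions):
            g = reward + gamma * g
            key = (state, action)
            if g > self.best_return.get(key, float("-inf")):
                self.best_return[key] = g

    def lookup(self, state, action):
        """Return the remembered best return, or None if never seen."""
        return self.best_return.get((state, action))


# Hypothetical two-step episodes over hashable toy states.
memory = EpisodicMemory()
memory.update_from_episode([("s0", "a", 0.0), ("s1", "a", 1.0)], gamma=0.5)
memory.update_from_episode([("s0", "a", 0.0), ("s1", "b", 0.0)], gamma=0.5)
```

A learner could use such remembered returns as an additional training signal next to its usual bootstrapped targets; how the two are combined is a design choice this sketch leaves open.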

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1805.07603

Genre: Research Report (0.50)

Industry:

Health & Medicine > Consumer Health (1.00)
Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Reinforcement Learning with Pytorch Udemy

@machinelearnbot, May-18-2018, 19:05:13 GMT

See you in the class! Please note that some of our lectures are marked (COMING SOON), as we are still adding new, interesting videos.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

@machinelearnbot

Genre:

Instructional Material > Course Syllabus & Notes (0.53)
Instructional Material > Online (0.40)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Two geometric input transformation methods for fast online reinforcement learning with neural nets

Ghiassian, Sina, Yu, Huizhen, Rafiee, Banafsheh, Sutton, Richard S.

arXiv.org Artificial Intelligence, May-18-2018

We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose reducing such interference with two efficient input transformation methods that are geometric in nature and match well the geometric property of ReLU gates. The first one is tile coding, a classic binary encoding scheme originally designed for local generalization based on the topological structure of the input space. The second one (EmECS) is a new method we introduce; it is based on geometric properties of convex sets and topological embedding of the input space into the boundary of a convex set. We discuss the behavior of the network when it operates on the transformed inputs. We also compare it experimentally with some neural nets that do not use the same input transformations, and with the classic algorithm of tile coding plus a linear function approximator. On several online reinforcement learning tasks, we show that the neural net with tile coding or EmECS can achieve not only faster learning but also more accurate approximations. Our results strongly suggest that geometric input transformation of this type can be effective for interference reduction and takes us a step closer to fully incremental reinforcement learning with neural nets.
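Tile coding, the first transformation named in the abstract, can be sketched for a single scalar input as follows. This is a minimal textbook-style version under assumed parameters (uniformly shifted tilings, one extra tile per tiling to absorb the shift); it is not the paper's configuration, and the function name `tile_code` is hypothetical.

```python
import math


def tile_code(x, low, high, num_tilings, tiles_per_tiling):
    """Return the index of the active tile in each tiling for x in [low, high].

    The indices identify the positions that would be 1 in a binary feature
    vector of length num_tilings * (tiles_per_tiling + 1).
    """
    tile_width = (high - low) / tiles_per_tiling
    active = []
    for t in range(num_tilings):
        # Each tiling is shifted by a different fraction of one tile width.
        offset = t * tile_width / num_tilings
        idx = int(math.floor((x - low + offset) / tile_width))
        # One extra tile per tiling absorbs the shift at the upper edge.
        idx = min(idx, tiles_per_tiling)
        active.append(t * (tiles_per_tiling + 1) + idx)
    return active


features_a = tile_code(0.20, 0.0, 1.0, num_tilings=4, tiles_per_tiling=4)
features_b = tile_code(0.22, 0.0, 1.0, num_tilings=4, tiles_per_tiling=4)
```

Nearby inputs such as 0.20 and 0.22 activate the same tiles, while distant inputs share none, which is the local generalization property the abstract refers to.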

latexit sha1, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1805.07476

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Solving the Rubik's Cube Without Human Knowledge

McAleer, Stephen, Agostinelli, Forest, Shmakov, Alexander, Baldi, Pierre

arXiv.org Artificial Intelligence, May-18-2018

A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game; however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik's Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves, which is less than or equal to that of solvers that employ human domain knowledge.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1805.0747

Country: North America > United States > California (0.15)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Rubik's Cube (0.76)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Evolutionary RL for Container Loading

Saikia, S, Verma, R, Agarwal, P, Shroff, G, Vig, L, Srinivasan, A

arXiv.org Artificial Intelligence, May-17-2018

Loading containers onto a ship from a yard is an important part of port operations. Finding the optimal sequence for loading the containers is known to be computationally hard and is an example of combinatorial optimization, which leads to the application of simple heuristics in practice. In this paper, we propose an approach which uses a mix of Evolutionary Strategies and Reinforcement Learning (RL) techniques to find an approximation of the optimal solution. The RL based agent uses the Policy Gradient method, an evolutionary reward strategy and a Pool of good (not-optimal) solutions to find the approximation. We find that the RL agent learns near-optimal solutions that outperform the heuristic solutions. We also observe that the RL agent assisted with a pool generalizes better for unseen problems than an RL agent without a pool. We present our results on synthetic data as well as on subsets of real-world problems taken from a container terminal. The results validate that our approach does comparatively better than the heuristic solutions available, and adapts to unseen problems better.

container, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1805.06664

Country: Asia > India > NCT > New Delhi (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

A Reinforcement Learning Approach to Autonomous Speed Control in Robotic Systems

Aghli, Nima (Florida Institute of Technology) | Carvalho, Marco (Florida Institute of Technology)

AAAI Conferences, May-17-2018

Model-free reinforcement learning techniques have been successfully used in diverse robotic applications. In this paper, we design and implement the Q-learning algorithm, a widely used model-free algorithm, to find the optimal speed control function for a fast moving train on a fixed track. The goal is to allow the train to learn the fastest speed profile it may use on a track without derailment. We contrast the performance of the learning algorithm with the performance of a human controller trying to perform the same task. In order to test the proposed algorithm, a complete hardware and software testbed has been designed and implemented, allowing for the evaluation of the learning models over a physical environment. We conclude that in simple tasks, the performance of humans in speed control is similar to the performance of the reinforcement learning algorithm, but when a more complex track is considered, the proposed reinforcement learning models outperform the humans.
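For readers unfamiliar with Q-learning, the core of the algorithm is a single tabular update rule, sketched below. The state and action labels ("straight", "curve", "slow", "fast"), the reward, and the hyperparameters are hypothetical stand-ins and are not taken from the paper's testbed.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated values.
    """
    # Bootstrapped target: immediate reward plus the discounted value of
    # the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical speed-control flavoured example.
actions = ["slow", "fast"]
q = defaultdict(float)          # unseen pairs start at 0.0
q[("curve", "slow")] = 1.0
new_value = q_learning_update(q, "straight", "fast", reward=2.0,
                              next_state="curve", actions=actions)
```

With these numbers the estimate for going fast on the straight moves one tenth of the way from 0 toward the target 2.0 + 0.9 × 1.0 = 2.9.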

autonomous speed control, reinforcement learning approach, robotic system

AAAI Conferences

The Thirty-First International Flairs Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Comparison of Reinforcement Learning Methodologies in Two-Party and Three-Party Negotiation Dialogue

Xiao, Gang (University of Southern California) | Georgila, Kallirroi (University of Southern California)

AAAI Conferences, May-17-2018

We use reinforcement learning to learn dialogue policies in a collaborative furniture layout negotiation task. We employ a variety of methodologies (i.e., learning against a simulated user versus co-learning) and algorithms. Our policies achieve the best solution or a good solution to this problem for a variety of settings and initial conditions, including in the presence of noise (e.g., due to speech recognition or natural language understanding errors). Also, our policies perform well even in situations not observed during training. Policies trained against a simulated user perform well while interacting with policies trained through co-learning, and vice versa. Furthermore, policies trained in a two-party setting are successfully applied to a three-party setting, and vice versa.

reinforcement learning methodology, two-party and three-party negotiation dialogue

AAAI Conferences

The Thirty-First International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Memoryless Exact Solutions for Deterministic MDPs with Sparse Rewards

Bertram, Joshua R., Wei, Peng

arXiv.org Machine Learning, May-17-2018

We propose an algorithm for deterministic continuous Markov Decision Processes with sparse rewards that computes the optimal policy exactly with no dependency on the size of the state space. The algorithm has time complexity of $O( |R|^3 \times |A|^2 )$ and memory complexity of $O( |R| \times |A| )$, where $|R|$ is the number of reward sources and $|A|$ is the number of actions. Furthermore, we describe a companion algorithm that can follow the optimal policy from any initial state without computing the entire value function, instead computing on-demand the value of states as they are needed. The algorithm to solve the MDP does not depend on the size of the state space for either time or memory complexity, and following the optimal policy is linear in time and space in the length of the path taken from the initial state. We demonstrate the algorithm's operation side by side with value iteration on tractable MDPs.
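For context, the value iteration baseline the abstract compares against can be sketched for a small deterministic MDP as below. This shows the baseline only, not the proposed memoryless algorithm; the four-state chain, its single sparse reward, and the helper names `step` and `reward` are hypothetical.

```python
def value_iteration(states, actions, step, reward, gamma=0.9, tol=1e-8):
    """In-place value iteration for a deterministic MDP.

    step(s, a) returns the next state; reward(s, a) the immediate reward.
    Returns a dict mapping each state to its optimal value.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup over all actions.
            best = max(reward(s, a) + gamma * values[step(s, a)]
                       for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


GOAL = 3  # absorbing state at the right end of a 4-state chain


def step(s, a):
    if s == GOAL:
        return s
    return min(max(s + a, 0), GOAL)


def reward(s, a):
    # Sparse reward: only the transition that enters the goal pays off.
    return 1.0 if s != GOAL and step(s, a) == GOAL else 0.0


values = value_iteration(range(4), (-1, +1), step, reward)
```

Note that every sweep touches every state, so the cost of this baseline grows with the size of the state space, which is the dependency the abstract says the proposed algorithm avoids.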

algorithm, reward source, value function, (16 more...)

arXiv.org Machine Learning

1805.0722

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
