AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
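The trial-and-error idea in this definition is commonly illustrated with a multi-armed bandit, where a learner must discover which action pays best by trying them. Below is a minimal sketch in the spirit of the book's epsilon-greedy action-value examples. The function name, its parameters, and the toy reward function mentioned after the block are hypothetical choices for illustration.

```python
import random
from typing import Callable, List, Optional


def epsilon_greedy_bandit(
    pull: Callable[[int], float],
    n_arms: int,
    steps: int = 1000,
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Estimate each arm's mean reward by trial and error (hypothetical helper)."""
    rng = rng or random.Random(0)
    values = [0.0] * n_arms   # current reward estimate per arm
    counts = [0] * n_arms     # how often each arm has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = pull(arm)
        counts[arm] += 1
        # Incremental sample mean: Q <- Q + (R - Q) / N
        values[arm] += (reward - values[arm]) / counts[arm]
    return values
```

With a deterministic toy reward such as `pull = lambda a: [0.1, 0.5, 0.9][a]`, the estimates of the arms that get tried match their rewards, and the greedy choice settles on arm 2 once exploration has sampled it.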

Q-Learning in enormous action spaces via amortized approximate maximization

Van de Wiele, Tom, Warde-Farley, David, Mnih, Andriy, Mnih, Volodymyr

arXiv.org Artificial Intelligence, Jan-22-2020

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization over all actions with a maximization over a small subset of possible actions sampled from a learned proposal distribution. The resulting approach, which we dub Amortized Q-learning (AQL), is able to handle discrete, continuous, or hybrid action spaces while maintaining the benefits of Q-learning. Our experiments on continuous control tasks with up to 21-dimensional actions show that AQL outperforms D3PG (Barth-Maron et al., 2018) and QT-Opt (Kalashnikov et al., 2018). Experiments on structured discrete action spaces demonstrate that AQL can efficiently learn good policies in spaces with thousands of discrete actions.
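The central move in this abstract, replacing an argmax over every action with an argmax over a small sampled subset, can be sketched as follows. This is a minimal illustration for one fixed state, not the authors' implementation: the toy Q-function, the Gaussian stand-in for the learned proposal, the names `amortized_argmax` and `proposal_sample`, and the mixing-in of uniform samples are all assumptions made here.

```python
import random
from typing import Callable, List

Action = List[float]


def amortized_argmax(
    q: Callable[[Action], float],
    proposal_sample: Callable[[], Action],
    dim: int,
    n_proposal: int = 32,
    n_uniform: int = 32,
    low: float = -1.0,
    high: float = 1.0,
) -> Action:
    """Approximate argmax_a Q(s, a) for one fixed state by searching a sampled subset."""
    # Candidates drawn from the (learned) proposal distribution...
    candidates = [proposal_sample() for _ in range(n_proposal)]
    # ...plus uniform samples over the box [low, high]^dim (an illustrative assumption).
    candidates += [[random.uniform(low, high) for _ in range(dim)] for _ in range(n_uniform)]
    # Q is evaluated only on these candidates, never over the full action space.
    return max(candidates, key=q)


if __name__ == "__main__":
    random.seed(0)
    target = [0.5, -0.2, 0.1]
    # Toy Q-function for a single state, peaked at `target`.
    q = lambda a: -sum((ai - ti) ** 2 for ai, ti in zip(a, target))
    # Stand-in for a learned proposal: a Gaussian near the target, clipped to [-1, 1].
    proposal = lambda: [max(-1.0, min(1.0, random.gauss(t, 0.1))) for t in target]
    best = amortized_argmax(q, proposal, dim=3)
    print([round(x, 2) for x in best], round(q(best), 4))
```

The cost per maximization is fixed by `n_proposal + n_uniform` Q-evaluations regardless of the action space's size; how the proposal itself is trained is not shown.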

action space, proposal distribution, q-learning

arXiv.org Artificial Intelligence

2001.08116

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lyceum: An efficient and scalable ecosystem for robot learning

Summers, Colin, Lowrey, Kendall, Rajeswaran, Aravind, Srinivasa, Siddhartha, Todorov, Emanuel

arXiv.org Artificial Intelligence, Jan-21-2020

We introduce Lyceum, a high-performance computational ecosystem for robot learning. Lyceum is built on top of the Julia programming language and the MuJoCo physics simulator, combining the ease of use of a high-level programming language with the performance of native C. In addition, Lyceum has a straightforward API to support parallel computation across multiple cores and machines. Overall, depending on the complexity of the environment, Lyceum is 5-30x faster than other popular abstractions such as OpenAI's Gym and DeepMind's dm-control. This substantially reduces training time for various reinforcement learning algorithms, and Lyceum is also fast enough to support real-time model predictive control through MuJoCo. The code, tutorials, and demonstration videos can be found at www.lyceum.ml.

deep learning, ecosystem, neural network

arXiv.org Artificial Intelligence

2001.07343

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.37)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Unsupervisedly Learned Representations: Should the Quest be Over?

Nissani, Daniel N.

arXiv.org Artificial Intelligence, Jan-21-2020

There is a classification accuracy gap of about 20% between our best methods of generating Unsupervisedly Learned Representations and the accuracy rates achieved by (naturally unsupervisedly learning) humans. We have been searching for this class of paradigms for at least four decades, so it may well be that we are looking in the wrong direction. In this paper we present a possible solution to this puzzle. We demonstrate that Reinforcement Learning schemes can learn representations which may be used for Pattern Recognition tasks such as Classification, achieving practically the same accuracy as that of humans. Our main, modest contribution lies in two observations: a. when applied to a real-world environment (e.g., nature itself), Reinforcement Learning does not require labels, and thus may be considered a natural candidate for the long-sought, accuracy-competitive Unsupervised Learning method; and b. in contrast, when Reinforcement Learning is applied in a simulated or symbolic processing environment (e.g., a computer program), it does inherently require labels and should thus generally be classified, with some exceptions, as Supervised Learning. The corollary of these observations is that further search for competitive Unsupervised Learning paradigms that can be trained in simulated environments, like many of those found in research and applications, may be futile.

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2001.07495

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Loss-annealed GAIL for sample efficient and stable Imitation Learning

Jena, Rohit, Sycara, Katia

arXiv.org Machine Learning, Jan-21-2020

Imitation learning is the problem of learning a policy from an expert policy without access to a reward signal. Often, the expert policy is only available in the form of expert demonstrations. Behavior cloning (BC) and GAIL are two popular methods for performing imitation learning in this setting. Behavior cloning converges in a few training iterations, but does not reach peak performance and suffers from compounding errors due to its supervised training framework and i.i.d. assumption. GAIL attempts to tackle this problem by accounting for the temporal dependencies between states while matching the occupancy measures of the expert and the policy. Although GAIL has shown success in a number of environments, it requires a large number of environment interactions. Given their complementary benefits, prior work has proposed or attempted combining the two methods, without much success. We look at some of the limitations of existing ideas that try to combine BC and GAIL, and present an algorithm that combines the best of both worlds to enable faster and more stable training without compromising performance. Our algorithm is embarrassingly simple to implement and integrates seamlessly with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm both in low-dimensional control tasks in a limited-data setting and in high-dimensional grid-world environments.
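The abstract does not spell out the combination rule. One plausible reading of "loss-annealed", stated here purely as an assumption, is a weighted sum of the behavior-cloning loss and the GAIL policy loss whose BC weight decays over training, so BC speeds up early learning and GAIL dominates later. The geometric schedule and the names `bc_weight` and `combined_loss` are hypothetical, not taken from the paper.

```python
def bc_weight(step: int, decay: float = 0.99) -> float:
    """Hypothetical schedule: the BC weight decays geometrically from 1 toward 0."""
    return decay ** step


def combined_loss(bc_loss: float, gail_loss: float, step: int, decay: float = 0.99) -> float:
    """Blend the two objectives; early steps are dominated by BC, later ones by GAIL."""
    w = bc_weight(step, decay)
    return w * bc_loss + (1.0 - w) * gail_loss


if __name__ == "__main__":
    # Show how quickly the BC term fades under this assumed schedule.
    for step in (0, 100, 500):
        print(step, round(bc_weight(step), 3))
```

In a real training loop the two losses would be differentiable tensors from a deep learning framework; plain floats are used so the sketch runs on its own.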

algorithm, discriminator, trajectory

arXiv.org Machine Learning

2001.07798

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Double Q-Learning with Python and Open AI

#artificialintelligence, Jan-20-2020, 08:20:38 GMT

In the previous couple of articles, we explored the reinforcement learning ecosystem: how it can be described and how it functions. Reinforcement learning is a type of learning that is different from supervised and unsupervised learning. Unlike those approaches, reinforcement learning learns through interaction, which makes it "the third paradigm of machine learning". The main elements of reinforcement learning are the learning agent and the environment, which are in constant interaction.
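Since the article is about Double Q-learning, here is a minimal sketch of the tabular update from van Hasselt (2010): two estimates are kept, one selects the greedy next action and the other evaluates it, which reduces the overestimation bias of standard Q-learning. This is not the article's code; the dict-based tables, the function names, and the default hyperparameters are assumptions, and the environment loop is omitted.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

# Q-tables map (state, action) -> value; create them with defaultdict(float).
QTable = DefaultDict[Tuple[Hashable, int], float]


def double_q_update(q_a: QTable, q_b: QTable, s: Hashable, a: int, r: float,
                    s_next: Hashable, done: bool, n_actions: int,
                    alpha: float = 0.1, gamma: float = 0.99, rng=random) -> None:
    """Apply one tabular Double Q-learning update in place."""
    # Flip a fair coin to decide which table to update this step.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a
    if done:
        target = r  # no bootstrapping from a terminal state
    else:
        # Select the greedy next action with the table being updated...
        a_star = max(range(n_actions), key=lambda b: learner[(s_next, b)])
        # ...but evaluate that action with the other table.
        target = r + gamma * evaluator[(s_next, a_star)]
    learner[(s, a)] += alpha * (target - learner[(s, a)])


def greedy_action(q_a: QTable, q_b: QTable, s: Hashable, n_actions: int) -> int:
    """Act greedily with respect to the sum of both estimates."""
    return max(range(n_actions), key=lambda b: q_a[(s, b)] + q_b[(s, b)])
```

Call `double_q_update` once per environment transition; behaviour policies such as epsilon-greedy typically act on the summed estimates, as `greedy_action` does.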

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Intelligence, physics and information -- the tradeoff between accuracy and simplicity in machine learning

Wu, Tailin

arXiv.org Machine Learning, Jan-20-2020

How can we enable machines to make sense of the world and become better at learning? To approach this goal, I believe that viewing intelligence in terms of many integral aspects, together with a universal two-term tradeoff between task performance and complexity, provides two feasible perspectives. In this thesis, I address several key questions in some aspects of intelligence, and study the phase transitions in the two-term tradeoff, using strategies and tools from physics and information. Firstly, how can we make learning models more flexible and efficient, so that agents can learn quickly from fewer examples? Inspired by how physicists model the world, we introduce a paradigm and an AI Physicist agent for simultaneously learning many small specialized models (theories) and the domains in which they are accurate, which can then be simplified, unified and stored, facilitating few-shot learning in a continual way. Secondly, for representation learning, when can we learn a good representation, and how does learning depend on the structure of the dataset? We approach this question by studying phase transitions when tuning the tradeoff hyperparameter. In the information bottleneck, we theoretically show that these phase transitions are predictable and reveal structure in the relationships between the data, the model, the learned representation and the loss landscape. Thirdly, how can agents discover causality from observations? We address part of this question by introducing an algorithm that combines prediction with minimizing information from the input, for exploratory causal discovery from observational time series. Fourthly, to make models more robust to label noise, we introduce Rank Pruning, a robust algorithm for classification with noisy labels. I believe that, building on the work of this thesis, we will be one step closer to enabling more intelligent machines that can make sense of the world.
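In the information-bottleneck setting, the two-term tradeoff is usually written as the standard IB Lagrangian, where Z is the learned representation of the input X, Y is the target, and β is the tradeoff hyperparameter whose tuning produces the phase transitions described in the abstract. The short derivation below is a textbook-style sketch, not a result quoted from the thesis; the symbol names are assumptions. It shows why small β always yields a trivial representation, the baseline against which any transition is measured.

```latex
\mathrm{IB}_\beta\big[p(z \mid x)\big] = I(X;Z) - \beta\, I(Y;Z), \qquad \beta > 0,
\quad \text{minimized over encoders } p(z \mid x).

% Z depends on Y only through X (Markov chain Y - X - Z), so by data processing:
I(Y;Z) \le I(X;Z)
\;\Longrightarrow\;
\mathrm{IB}_\beta \ge (1 - \beta)\, I(X;Z) \ge 0 \quad \text{for } 0 < \beta \le 1 .

% A trivial encoder (Z independent of X) gives I(X;Z) = I(Y;Z) = 0, hence IB_beta = 0.
% So for beta <= 1 the trivial encoder is optimal; a nontrivial Z can only beat it when beta > 1.
```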

attainable class information, deep learning, logic programming

arXiv.org Machine Learning

2001.03780

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
Oceania > Australia (0.13)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government (0.92)
Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

2019 in Review: 10 AI Papers That Made an Impact

#artificialintelligence, Jan-19-2020, 19:41:03 GMT

The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of papers. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year's 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total. As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019.

Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero).

machine learning, natural language, reinforcement learning

#artificialintelligence

Country:

North America > Canada > Ontario > Toronto (0.30)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.34)

Reinforcement Learning with Probabilistically Complete Exploration

Morere, Philippe, Francis, Gilad, Blau, Tom, Ramos, Fabio

arXiv.org Machine Learning, Jan-19-2020

Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to find initial solutions. These solutions are used as demonstrations to initialize a policy, which is then refined by a generic RL algorithm, leading to faster and more stable convergence. We provide theoretical guarantees that R3L exploration finds successful solutions, as well as bounds on its sampling complexity. We experimentally demonstrate that the method outperforms classic and intrinsic exploration techniques, requiring only a fraction of the exploration samples and achieving better asymptotic performance.
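R3L's exploration phase relies on planners such as RRT. As a rough illustration of how such a tree grows, here is a minimal RRT sketch in an obstacle-free unit square. The 2-D point state space, the goal-bias parameter, and all names are assumptions for illustration; in the paper the planner's solutions then serve as demonstrations that initialize a policy for RL refinement, which is not shown.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def rrt(start: Point, goal: Point, step: float = 0.05, goal_radius: float = 0.05,
        goal_bias: float = 0.05, max_iters: int = 5000,
        rng: Optional[random.Random] = None) -> Optional[List[Point]]:
    """Grow a Rapidly-exploring Random Tree in the unit square (no obstacles).

    Returns the tree path from start to the first node within goal_radius of
    goal, or None if none is found within max_iters samples.
    """
    rng = rng or random.Random(0)
    nodes: List[Point] = [start]
    parent: List[int] = [-1]
    for _ in range(max_iters):
        # Sample a random point, occasionally the goal itself (goal bias).
        sample = goal if rng.random() < goal_bias else (rng.random(), rng.random())
        # Find the existing node nearest to the sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the sample.
        t = min(1.0, step / d)
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= goal_radius:
            # Walk parent pointers back to the root to recover the path.
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

The nearest-node search here is a linear scan, which keeps the sketch short; practical planners usually use a spatial index such as a k-d tree.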

artificial intelligence, exploration, upstream oil & gas

arXiv.org Machine Learning

2001.06940

Country:

North America > United States (0.28)
Asia > Middle East (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Options from Demonstration using Skill Segmentation

Cockcroft, Matthew, Mawjee, Shahil, James, Steven, Ranchod, Pravesh

arXiv.org Machine Learning, Jan-19-2020

We present a method for learning options from segmented demonstration trajectories. The trajectories are first segmented into skills using nonparametric Bayesian clustering and a reward function for each segment is then learned using inverse reinforcement learning. From this, a set of inferred trajectories for the demonstration are generated. Option initiation sets and termination conditions are learned from these trajectories using the one-class support vector machine clustering algorithm. We demonstrate our method in the four rooms domain, where an agent is able to autonomously discover usable options from human demonstration. Our results show that these inferred options can then be used to improve learning and planning.
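An option bundles an initiation set, an intra-option policy, and a termination condition. In the paper the initiation sets and termination conditions are learned with one-class SVMs; in the minimal sketch below they are hand-written predicates standing in for those classifiers, termination is deterministic rather than probabilistic, and the grid, the doorway location, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class Option:
    can_start: Callable[[State], bool]    # initiation set, as a membership test
    policy: Callable[[State], str]        # intra-option policy: state -> action name
    should_stop: Callable[[State], bool]  # termination condition (deterministic here)


def run_option(opt: Option, s: State, max_steps: int = 100) -> List[State]:
    """Execute an option from state s and return the visited states."""
    if not opt.can_start(s):
        raise ValueError(f"option cannot be initiated in state {s}")
    trajectory = [s]
    for _ in range(max_steps):
        if opt.should_stop(s):
            break
        dx, dy = MOVES[opt.policy(s)]
        s = (s[0] + dx, s[1] + dy)
        trajectory.append(s)
    return trajectory


# Hypothetical "go to the doorway at (5, 2)" option for a room occupying x < 5.
DOOR = (5, 2)
to_door = Option(
    can_start=lambda s: s[0] < DOOR[0],
    policy=lambda s: "up" if s[1] < DOOR[1] else "down" if s[1] > DOOR[1] else "right",
    should_stop=lambda s: s == DOOR,
)
```

A planner or learner can then treat `to_door` as a single temporally extended action, which is how discovered options help with learning and planning in the four rooms domain.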

termination condition, termination state, trajectory

arXiv.org Machine Learning

2001.06793

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
