AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Sufficient Markov Decision Processes with Alternating Deep Neural Networks

Wang, Longshaokan, Laber, Eric B., Witkiewitz, Katie

arXiv.org Machine Learning | Mar-16-2018, 19:00:00 GMT

Markov decision processes (MDPs) (Bellman, 1957; Puterman, 2014) are the primary mathematical model for representing sequential decision problems with an indefinite time horizon (Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998; Bather, 2000; Si, 2004; Powell, 2007; Wiering and Van Otterlo, 2012). This class of models is quite general as almost any decision process can be made into an MDP by concatenating data over multiple decision points (see Section 2 for a precise statement); however, coercing a decision process into the MDP framework in this way can lead to high-dimensional system state information that is difficult to model effectively. One common approach to construct a low-dimensional decision process from a high-dimensional MDP is to create a finite discretization of the space of possible system states and to treat the resultant process as a finite MDP (Gordon, 1995; Murao and Kitamura, 1997; Sutton and Barto, 1998; Kamio et al., 2004; Whiteson et al., 2007). However, such discretization can result in a significant loss of information and can be difficult to apply when the system state information is continuous and high-dimensional. Another common approach to dimension reduction is to construct a low-dimensional summary of the underlying system states, e.g., by applying principal components analysis (Jolliffe, 1986), multidimensional scaling (Borg and Groenen, 1997), or by constructing a local linear embedding (Roweis and Saul, 2000).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1704.07531

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ten Machine Learning Algorithms You Should Know to Become a Data Scientist

#artificialintelligence | Mar-15-2018, 01:16:50 GMT

Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. What I will do is ask the question "Which fruits are red and round?" and split the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I will ask "Which fruits have red or yellow color hints on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
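The cascade of questions above can be sketched as nested conditionals. This is only an illustrative sketch: the `Fruit` fields, the function name `looks_like_apple`, and the choice of which answers count as "apple" at each leaf are assumptions, since the excerpt does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    # Hypothetical columns of the spreadsheet described in the excerpt.
    color: str
    shape: str
    has_red_or_yellow_hints: bool


def looks_like_apple(fruit: Fruit) -> bool:
    """Hand-written decision tree following the questions in the excerpt."""
    # Root question: "Which fruits are red and round?"
    if fruit.color == "red" and fruit.shape == "round":
        # Follow-up for the "yes" branch: red or yellow color hints?
        # Assumption: a "yes" here is treated as "apple".
        return fruit.has_red_or_yellow_hints
    # Follow-up for the "no" branch: "Which fruits are green and round?"
    # Assumption: green and round fruits are treated as apples.
    return fruit.color == "green" and fruit.shape == "round"


if __name__ == "__main__":
    basket = [
        Fruit("red", "round", True),
        Fruit("green", "round", False),
        Fruit("yellow", "long", True),
    ]
    for fruit in basket:
        print(fruit, looks_like_apple(fruit))
```

A learned decision tree would pick the questions and their order from data rather than from intuition, but the resulting structure is the same kind of nested test.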

machine learning, natural language, reinforcement learning, (17 more...)

#artificialintelligence

Country: North America > United States (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Yuan, Weihao, Stork, Johannes A., Kragic, Danica, Wang, Michael Y., Hang, Kaiyu

arXiv.org Artificial Intelligence | Mar-15-2018

Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task which requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling physical properties of the objects, robot, and the environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based on only visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the number of collisions which lead to suboptimal outcomes and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task as compared to uniform exploration and standard experience replay. We demonstrate empirical evidence from simulation that our method leads to a success rate of 85%, show that our system can cope with sudden changes of the environment, and compare our performance with human level performance.

artificial intelligence, obstacle, upstream oil & gas, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA.2018.8462863

1803.05752

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.94)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Imitation Learning with Concurrent Actions in 3D Games

Harmer, Jack, Gisslén, Linus, Holst, Henrik, Bergdahl, Joakim, Olsson, Tom, Sjöö, Kristoffer, Nordin, Magnus

arXiv.org Machine Learning | Mar-15-2018

In this work we describe a novel deep reinforcement learning neural network architecture that allows multiple actions to be selected at every time-step. Multi-action policies allow complex behaviors to be learnt that are otherwise hard to achieve when using single action selection techniques. This work describes an algorithm that uses both imitation learning (IL) and temporal difference (TD) reinforcement learning (RL) to provide a 4x improvement in training time and 2.5x improvement in performance over single action selection TD RL. We demonstrate the capabilities of this network using a complex in-house 3D game. Mimicking the behavior of the expert teacher significantly improves world state exploration and allows the agent's vision system to be trained more rapidly than TD RL alone. This initial training technique kick-starts TD learning and the agent quickly learns to surpass the capabilities of the expert.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1803.05402

Country: Europe > Sweden (0.14)

Genre: Research Report (0.52)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Measurement-based adaptation protocol with quantum reinforcement learning

Albarrán-Arriagada, F., Retamal, J. C., Solano, E., Lamata, L.

arXiv.org Machine Learning | Mar-14-2018

Machine learning employs dynamical algorithms that mimic the human capacity to learn, where the reinforcement learning ones are among the most similar to humans in this respect. On the other hand, adaptability is an essential aspect to perform any task efficiently in a changing environment, and it is fundamental for many purposes, such as natural selection. Here, we propose an algorithm based on successive measurements to adapt one quantum state to a reference unknown state, in the sense of achieving maximum overlap. The protocol naturally provides many identical copies of the reference state, such that in each measurement iteration more information about it is obtained. In our protocol, we consider a system composed of three parts, the "environment" system, which provides the reference state copies; the register, which is an auxiliary subsystem that interacts with the environment to acquire information from it; and the agent, which corresponds to the quantum state that is adapted by digital feedback with input corresponding to the outcome of the measurements on the register. With this proposal we can achieve an average fidelity between the environment and the agent of more than 90% with less than 30 iterations of the protocol. In addition, we extend the formalism to d-dimensional states, reaching an average fidelity of around 80% in less than 400 iterations for d = 11, for a variety of genuinely quantum as well as semiclassical states. This work paves the way for the development of quantum reinforcement learning protocols using quantum data, and the future deployment of semi-autonomous quantum systems.

fidelity, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1803.0534

Country: Europe > Spain (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

Zahavy, Tom, Hasidim, Avinatan, Kaplan, Haim, Mansour, Yishay

arXiv.org Machine Learning | Mar-13-2018

In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning, that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate solutions for it. In particular, we focus on approximate solutions that are local, i.e., solutions that only observe information about the current state. Local policies are easy to implement and do not require substantial computational resources as they do not perform planning. While local deterministic policies, like Nearest Neighbor, are being used in practice for hierarchical reinforcement learning, we propose three stochastic policies that guarantee better performance than any deterministic policy.
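To make the idea of a local policy concrete, here is a minimal sketch of the Nearest Neighbor heuristic mentioned in the abstract, evaluated with a discounted return. The setup is an assumption for illustration and is not taken from the paper: each unvisited node is assumed to give a unit reward, discounted by `gamma` raised to the total travel time so far, and the names `nearest_neighbor_tour` and `discounted_return` are hypothetical.

```python
def nearest_neighbor_tour(dist, start=0):
    """Visit all nodes by always moving to the closest unvisited node.

    The policy is local: each decision uses only the distances
    from the current node, with no lookahead or planning.
    """
    unvisited = set(range(len(dist))) - {start}
    order = [start]
    current = start
    while unvisited:
        # Ties are broken by the smaller node index to keep the tour deterministic.
        nxt = min(unvisited, key=lambda j: (dist[current][j], j))
        unvisited.remove(nxt)
        order.append(nxt)
        current = nxt
    return order


def discounted_return(dist, order, gamma):
    """Sum of unit rewards, each discounted by gamma ** (arrival time).

    Assumption: reward 1 is collected on arrival at every node after the start.
    """
    elapsed = 0
    total = 0.0
    for a, b in zip(order, order[1:]):
        elapsed += dist[a][b]
        total += gamma ** elapsed
    return total


if __name__ == "__main__":
    # Small symmetric distance matrix, chosen only for illustration.
    dist = [
        [0, 1, 3],
        [1, 0, 1],
        [3, 1, 0],
    ]
    tour = nearest_neighbor_tour(dist)
    print(tour, discounted_return(dist, tour, gamma=0.5))
```

The stochastic policies proposed in the paper are not shown here; the sketch only illustrates the deterministic baseline they are compared against.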

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1803.04674

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search

Pautrat, Rémi, Chatzilygeroudis, Konstantinos, Mouret, Jean-Baptiste

arXiv.org Machine Learning | Mar-13-2018

One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one fits best the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damages. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situations and the priors.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1709.06919

Country: Europe (0.28)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

Lipton, Zachary C., Azizzadenesheli, Kamyar, Kumar, Abhishek, Li, Lihong, Gao, Jianfeng, Deng, Li

arXiv.org Machine Learning | Mar-13-2018

Many practical environments contain catastrophic states that an optimal agent would visit infrequently or never. Even on toy problems, Deep Reinforcement Learning (DRL) agents tend to periodically revisit these states upon forgetting their existence under a new policy. We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes. IF agents possess a fear model trained to predict the probability of imminent catastrophe. This score is then used to penalize the Q-learning objective. Our theoretical analysis bounds the reduction in average return due to learning on the perturbed objective. We also prove robustness to classification errors. As a bonus, IF models tend to learn faster, owing to reward shaping. Experiments demonstrate that intrinsic-fear DQNs solve otherwise pathological environments and improve on several Atari games.
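One way to read "this score is then used to penalize the Q-learning objective" is as a subtraction from the usual bootstrapped target. The sketch below is an illustration under that assumption, not the authors' implementation: the function name `intrinsic_fear_target`, the parameter `fear_weight`, and the choice to evaluate the fear model on the next state are all hypothetical.

```python
def intrinsic_fear_target(reward, next_q_values, fear_prob,
                          gamma=0.99, fear_weight=1.0, terminal=False):
    """Q-learning target with a fear penalty (illustrative sketch).

    reward        -- environment reward for the transition
    next_q_values -- Q(s', a') for every action a' in the next state
    fear_prob     -- assumed output of a fear model: predicted probability
                     of imminent catastrophe from the next state
    fear_weight   -- hypothetical scaling factor for the penalty
    """
    # Standard Q-learning bootstrap; no bootstrap on terminal transitions.
    bootstrap = 0.0 if terminal else gamma * max(next_q_values)
    # Assumption: the fear score is subtracted from the target.
    return reward + bootstrap - fear_weight * fear_prob


if __name__ == "__main__":
    safe = intrinsic_fear_target(1.0, [0.5, 2.0], fear_prob=0.0, gamma=0.5)
    risky = intrinsic_fear_target(1.0, [0.5, 2.0], fear_prob=0.25,
                                  gamma=0.5, fear_weight=4.0)
    print(safe, risky)
```

With `fear_prob` at zero the target reduces to the ordinary Q-learning target, so the penalty only changes learning near states the fear model flags as dangerous.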

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1611.01211

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

[D] Deep Reinforcement Learning with Capsnets, real difference with Convnets (CNN) ? • r/MachineLearning

@machinelearnbot | Mar-11-2018, 17:45:18 GMT

I'm currently implementing an A3C (Asynchronous Advantage Actor-Critic) agent in TensorFlow that plays Doom (using vizdoom), and I was wondering whether there is a difference between using CNNs and Capsnets (Capsule Networks). Recently there was a big breakthrough in computer vision with these Capsnets. I know that Capsnets, unlike Convnets, handle the spatial relationships between features and can detect rotated objects. As a consequence, I wondered if there is an advantage to using Capsnets in a Deep Reinforcement Learning agent?

capsnet, machine learning, reinforcement learning, (7 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Resurgence of Artificial Intelligence During 1983-2010

#artificialintelligence | Mar-11-2018, 05:01:43 GMT

This is the second article in the four-part series on the History of Artificial Intelligence. The first part can be accessed here. Every decade seems to have its technological buzzwords: we had personal computers in the 1980s; the Internet and the World Wide Web in the 1990s; smartphones and social media in the 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. The 1950-82 era saw a new field of Artificial Intelligence (AI) being born, a lot of pioneering research being done, massive hype being created, and AI going into hibernation when this hype did not materialize and the research funding dried up [56]. Between 1983 and 2010, research funding ebbed and flowed, and research in AI continued to gather steam although "some computer scientists and software engineers would avoid the term artificial intelligence for fear of being viewed as wild-eyed dreamers" [43].

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Industry:

Leisure & Entertainment > Games > Chess (1.00)
Information Technology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
