AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Construction of Macro Actions for Deep Reinforcement Learning

Chang, Yi-Hsiang, Chang, Kuan-Yu, Kuo, Henry, Lee, Chun-Yi

arXiv.org Artificial Intelligence, Aug-5-2019

Conventional deep reinforcement learning typically determines an appropriate primitive action at each timestep, which requires an enormous amount of time and effort to learn an effective policy, especially in large and complex environments. To address this issue fundamentally, we incorporate macro actions, defined as sequences of primitive actions, into the primitive action space to form an augmented action space. The problem lies in how to find an appropriate macro action to augment the primitive action space. An agent using a proper augmented action space is able to jump to a farther state and thus speed up the exploration process as well as facilitate the learning procedure. In previous research, macro actions are developed by mining the most frequently used action sequences or by repeating previous actions. However, the most frequently used action sequences are extracted from a past policy, which may only reinforce the original behavior of that policy. On the other hand, repeating actions may limit the diversity of the agent's behaviors. Instead, we propose to construct macro actions with a genetic algorithm, which eliminates the dependency of the macro action derivation procedure on the past policies of the agent. Our approach appends macro actions to the primitive action space one at a time and evaluates whether the augmented action space leads to promising performance. We perform extensive experiments and show that the constructed macro actions are able to speed up the learning process for a variety of deep reinforcement learning methods. Our experimental results also demonstrate that the macro actions suggested by our approach are transferable among deep reinforcement learning methods and similar environments. We further provide a comprehensive set of ablation analyses to validate the proposed methodology.
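
The idea of an augmented action space can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 1-D corridor environment, the names `PRIMITIVES`, `augment`, `step`, and `execute`, and the hand-picked macro. The paper's genetic algorithm for searching over macros is not reproduced; the sketch only shows what it means to append one macro action to the primitive actions and execute it as a single decision.

```python
from typing import List, Sequence, Tuple

# Hypothetical primitive actions for a 1-D corridor: 0 = left, 1 = right.
# Each action is stored as a tuple of primitives, so a primitive action is
# simply a sequence of length one.
PRIMITIVES: List[Tuple[int, ...]] = [(0,), (1,)]


def augment(primitives: Sequence[Tuple[int, ...]],
            macro: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return a new action space with one macro action appended."""
    return list(primitives) + [tuple(macro)]


def step(position: int, primitive: int) -> int:
    """Toy dynamics (an assumption): move one cell left or right."""
    return position + (1 if primitive == 1 else -1)


def execute(position: int, action_index: int,
            action_space: Sequence[Tuple[int, ...]]) -> Tuple[int, int]:
    """Run one agent decision; return (new position, primitive steps used)."""
    sequence = action_space[action_index]
    for primitive in sequence:
        position = step(position, primitive)
    return position, len(sequence)


# Append the macro "right, right, right" as action index 2, then take it once.
augmented = augment(PRIMITIVES, [1, 1, 1])
final_position, steps_used = execute(0, 2, augmented)
```

With the macro in place, a single decision moves the agent three cells, which is the sense in which the agent can "jump to a farther state".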

machine learning, macro, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1908.01478

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Novoseller, Ellen R., Sui, Yanan, Yue, Yisong, Burdick, Joel W.

arXiv.org Artificial Intelligence, Aug-4-2019

In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present Dueling Posterior Sampling (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the user's preferences. Because preference feedback is provided on trajectories rather than individual state/action pairs, we develop a Bayesian approach to solving the credit assignment problem, translating user preferences to a posterior distribution over state/action reward models. We prove an asymptotic no-regret rate for DPS with a Bayesian logistic regression credit assignment model; to our knowledge, this is the first regret guarantee for preference-based RL. We also discuss possible avenues for extending this proof methodology to analyze other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines.

machine learning, posterior, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1908.01289

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Improving Deep Reinforcement Learning in Minecraft with Action Advice

Frazier, Spencer, Riedl, Mark

arXiv.org Artificial Intelligence, Aug-2-2019

Training deep reinforcement learning agents to perform complex behaviors in 3D virtual environments requires significant computational resources. This is especially true in environments with high degrees of aliasing, where many states share nearly identical visual features. Minecraft is an exemplar of such an environment. We hypothesize that interactive machine learning (IML), wherein human teachers play a direct role in training through demonstrations, critique, or action advice, may alleviate agent susceptibility to aliasing. However, interactive machine learning is only practical when the number of human interactions is limited, requiring a balance between human teacher effort and agent performance. We conduct experiments with two reinforcement learning algorithms that enable human teachers to give action advice (Feedback Arbitration and Newtonian Action Advice) under visual aliasing conditions. To assess potential cognitive load per advice type, we vary the accuracy and frequency of various human action advice techniques. The training efficiency, robustness against infrequent and inaccurate advisor input, and sensitivity to aliasing are examined.

agent, perceptual, reinforcement, (14 more...)

arXiv.org Artificial Intelligence

1908.01007

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Combining learned skills and reinforcement learning for robotic manipulations

Strudel, Robin, Pashevich, Alexander, Kalevatykh, Igor, Laptev, Ivan, Sivic, Josef, Schmid, Cordelia

arXiv.org Artificial Intelligence, Aug-2-2019

Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. The supervised approach of imitation learning can handle short tasks but suffers from compounding errors and the need for many demonstrations for longer and more complex tasks. Reinforcement learning (RL) can find solutions beyond demonstrations but requires tedious and task-specific reward engineering for multi-step problems. In this work we address the difficulties of both methods and explore their combination. To this end, we propose RL policies operating on pre-trained skills that can learn composite manipulations using no intermediate rewards and no demonstrations of full tasks. We also propose efficient training of basic skills from a few synthetic demonstrated trajectories by exploring recent CNN architectures and data augmentation. We show successful learning of policies for composite manipulation tasks such as making a simple breakfast. Notably, our method achieves high success rates on a real robot, while using synthetic training data only.

demonstration, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1908.00722

Country: Europe > France (0.28)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Unsupervised Learning: The Next Wave in AI Revolution | Analytics Insight

#artificialintelligence, Aug-1-2019, 16:31:58 GMT

Over the last decade, machine learning has gained exceptional ground in areas as varied as image recognition, self-driving vehicles, and playing complex games like Go. These successes have largely been achieved by training deep neural networks with one of two learning paradigms: supervised learning and reinforcement learning. Both paradigms require training signals to be designed by a human and then passed to the computer. In supervised learning, these are the "targets" (for example, the correct label for an image); in reinforcement learning, they are the "rewards" for successful behavior (for example, getting a high score in an Atari game). The limits of learning are therefore defined by the human trainers.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Reinforcement Learning Explained: Overview, Comparisons and Applications in Business

#artificialintelligence, Aug-1-2019, 01:53:37 GMT

Imagine you're completing a mission in a computer game. Maybe you're going through a military depot to find a secret weapon. You get points for the right actions (killing an enemy) and lose them for the wrong ones (falling into a pit or getting hit). If you're playing on high difficulty, you might not conclude this task in just one attempt. Try after try, you learn which consecutive actions are needed to get out of a location safe, armed, and equipped with bonuses like extra health points or small artifacts in your bag.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Genre: Overview (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Agarwal, Alekh, Kakade, Sham M., Lee, Jason D., Mahajan, Gaurav

arXiv.org Machine Learning, Aug-1-2019

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution (say with a sufficiently rich policy class); how they cope with approximation error due to using a restricted class of parametric policies; or their finite sample behavior. Such characterizations are important not only to compare these methods to their approximate value function counterparts (where such issues are relatively well understood, at least in the worst case) but also to help with more principled approaches to algorithm design. This work provides provable characterizations of computational, approximation, and sample size issues with regard to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: 1) "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy, and 2) restricted policy classes, which may not contain the optimal policy and where we provide agnostic learning results. One insight of this work is in formalizing the importance of how a favorable initial state distribution provides a means to circumvent worst-case exploration issues. Overall, these results place policy gradient methods on a solid theoretical footing, analogous to the global convergence guarantees of iterative value function based algorithms.
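
To make the "tabular" softmax parameterization concrete, here is a minimal sketch of exact policy gradient ascent. It assumes the simplest possible case, a single-state problem with three actions and known rewards, so it does not reproduce the discounted-MDP setting or the analysis of the paper. The reward values, the step size, and the function names are illustrative assumptions. For a softmax policy π with expected reward J = Σ_b π_b r_b, the exact gradient is ∂J/∂θ_a = π_a (r_a − J), which is what the update uses.

```python
import math
from typing import List


def softmax(theta: List[float]) -> List[float]:
    """Numerically stable softmax over one logit per action."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(theta: List[float], rewards: List[float],
                         lr: float) -> List[float]:
    """One step of exact gradient ascent on J(theta) = sum_a pi_a * r_a."""
    pi = softmax(theta)
    expected = sum(p * r for p, r in zip(pi, rewards))
    # Softmax policy gradient: dJ/dtheta_a = pi_a * (r_a - J).
    return [t + lr * p * (r - expected)
            for t, p, r in zip(theta, pi, rewards)]


# Hypothetical rewards for three actions; action 0 is optimal.
rewards = [1.0, 0.0, 0.5]
theta = [0.0, 0.0, 0.0]  # uniform initial policy
for _ in range(2000):
    theta = policy_gradient_step(theta, rewards, lr=1.0)
policy = softmax(theta)  # probability mass concentrates on action 0
```

Because the optimal policy lies inside the tabular softmax class, gradient ascent here drives the policy toward the best action, a toy instance of the global convergence behavior the abstract refers to.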

machine learning, parameterization, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1908.00261

Country: North America > United States > California (0.67)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Robby is Not a Robber (anymore): On the Use of Institutions for Learning Normative Behavior

Tomic, Stevan, Pecora, Federico, Saffiotti, Alessandro

arXiv.org Artificial Intelligence, Aug-1-2019

We show how norms can be used to guide a reinforcement learning agent towards achieving normative behavior and apply the same set of norms over different domains. Thus, we are able to: (1) provide a way to intuitively encode social knowledge (through norms); (2) guide learning towards normative behaviors (through an automatic norm reward system); and (3) achieve a transfer of learning by abstracting policies. Finally, (4) the method is not dependent on a particular RL algorithm. We show how our approach can be seen as a means to achieve abstract representation and learn procedural knowledge based on the declarative semantics of norms and discuss possible implications of this in some areas of cognitive science. Index Terms: Norms, Institutions, Automatic Reward Shaping, Transfer of Learning, Abstract Policies, Abstraction, State-Space Selection, Schema. I. Introduction: In order to be accepted in human society, robots need to comply with human social norms.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1908.02138

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Neural Simplex Architecture

Phan, Dung, Paoletti, Nicola, Grosu, Radu, Jansen, Nils, Smolka, Scott A., Stoller, Scott D.

arXiv.org Artificial Intelligence, Aug-1-2019

We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of complex autonomous and other cyber-physical systems (CPSs) without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional Simplex approach, the advanced controller (AC) is treated as a black box; there are no techniques for correcting the AC after it generates a potentially unsafe control input that causes a failover to the baseline controller (BC). Our NSA addresses this limitation. NSA not only provides safety assurances for CPSs in the presence of a possibly faulty neural controller, but can also improve the safety of such a controller in an online setting via retraining, without degrading its performance. NSA also offers reverse switching strategies, which allow the AC to resume control of the system under reasonable conditions, allowing the mission to continue unabated. Our experimental results on several significant case studies, including a target-seeking ground rover navigating an obstacle field and a neural controller for an artificial pancreas system, demonstrate NSA's benefits.
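
The switching logic described above can be sketched on a toy system. This is an illustrative assumption throughout: the 1-D dynamics, the bounds `SAFE_LIMIT` and `RESUME_LIMIT`, and both stand-in controllers are hypothetical, and the online retraining of the neural controller that NSA adds is omitted. The sketch shows only forward switching (failover when the advanced controller's proposed input would leave the safe region) and reverse switching (handing control back once the state is well inside it).

```python
from typing import List, Tuple

SAFE_LIMIT = 10.0    # hypothetical safety property: |x| must stay <= 10
RESUME_LIMIT = 5.0   # hypothetical reverse-switching condition: |x| <= 5


def advanced_controller(x: float) -> float:
    """Stand-in for a neural controller: aggressive, ignores the bound."""
    return 3.0


def baseline_controller(x: float) -> float:
    """Conservative controller that steers the state back toward zero."""
    return -1.0 if x > 0 else 1.0


def run(steps: int, x: float = 0.0) -> List[Tuple[str, float]]:
    """Simulate x' = x + u under a Simplex-style decision module."""
    using_ac = True
    log: List[Tuple[str, float]] = []
    for _ in range(steps):
        # Reverse switching: let the AC resume once the state is safe enough.
        if not using_ac and abs(x) <= RESUME_LIMIT:
            using_ac = True
        # Forward switching: fail over if the AC's input would be unsafe.
        if using_ac and abs(x + advanced_controller(x)) > SAFE_LIMIT:
            using_ac = False
        u = advanced_controller(x) if using_ac else baseline_controller(x)
        x = x + u
        log.append(("AC" if using_ac else "BC", x))
    return log


trace = run(12)
modes = [mode for mode, _ in trace]
```

In this run the AC drives the state toward the bound, the BC takes over and backs it off, and the AC later resumes, so the state never leaves the safe region.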

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

1908.00528

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games (0.46)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Reinforcement Learning for Personalized Dialogue Management

Hengst, Floris den, Hoogendoorn, Mark, van Harmelen, Frank, Bosman, Joost

arXiv.org Artificial Intelligence, Aug-1-2019

Language systems have been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, i.e., take the personal context of users into account. These works, however, are limited to personalization to a single user with whom they require multiple interactions and do not generalize the usage of context across users. This work introduces a problem where a generalized usage of context is relevant and proposes two Reinforcement Learning (RL)-based approaches to this problem. The first approach uses a single learner and extends the traditional POMDP formulation of dialogue state with features that describe the user context. The second approach segments users by context and then employs a learner per context. We compare these approaches in a benchmark of existing non-RL and RL-based methods in three established application domains and one novel application domain of financial product recommendation. We compare the influence of context and training experiences on performance and find that learning approaches generally outperform a handcrafted gold standard.
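
The second approach, segmenting users by context and keeping one learner per segment, can be sketched as follows. This is a heavily simplified assumption: each learner is a running-mean value table over a fixed set of recommendations rather than a dialogue policy over a POMDP state, and the segment names, product names, and reward table are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

ACTIONS: List[str] = ["savings", "loan", "insurance"]  # hypothetical products

# Hypothetical, deterministic reward per (context segment, action index).
REWARDS: Dict[str, List[float]] = {
    "student": [0.2, 0.8, 0.1],
    "retiree": [0.6, 0.1, 0.9],
}


class MeanLearner:
    """Tracks the mean reward per action; explores each action once."""

    def __init__(self, n_actions: int) -> None:
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def select(self) -> int:
        # Try every untried action once, then act greedily.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# One independent learner per context segment, created on first use.
learners: Dict[str, MeanLearner] = defaultdict(lambda: MeanLearner(len(ACTIONS)))

for _ in range(10):
    for segment in ("student", "retiree"):
        learner = learners[segment]
        action = learner.select()
        learner.update(action, REWARDS[segment][action])

best = {segment: ACTIONS[learners[segment].select()]
        for segment in ("student", "retiree")}
```

The first approach would instead keep a single learner and feed the context in as extra state features; the per-segment design trades shared experience for simpler, independent learners.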

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1908.00286

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Industry: Banking & Finance (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
