AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A Minimum Relative Entropy Principle for Learning and Acting

Ortega, P. A., Braun, D. A.

Journal of Artificial Intelligence Research, Aug-16-2010

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
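As a rough illustration of the mixture-of-experts behavior described in the abstract, the following is a minimal sketch on a two-armed Bernoulli bandit. It assumes a finite set of candidate environments ("modes"), each paired with an expert that pulls the arm it believes is best. The function name `bayesian_control_rule`, the bandit setting, and all parameters are hypothetical and not taken from the paper. At each step a mode is sampled from the posterior, its expert acts, and the posterior is updated using the observation only, since the agent's own action is treated as an intervention.

```python
import random


def bayesian_control_rule(modes, true_probs, steps, rng):
    """Illustrative sketch only, not the authors' implementation.

    modes: list of tuples; modes[m][a] is the success probability
           that hypothetical expert m assigns to arm a.
    true_probs: the real (unknown to the agent) success probabilities.
    Returns the posterior over modes after `steps` interactions.
    """
    posterior = [1.0 / len(modes)] * len(modes)
    for _ in range(steps):
        # Sample an operation mode from the current posterior.
        m = rng.choices(range(len(modes)), weights=posterior)[0]
        # Act as that mode's expert would: pull the arm it rates highest.
        arm = max(range(len(modes[m])), key=lambda a: modes[m][a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Update on the observation only. The action was chosen by the
        # agent (an intervention), so it contributes no likelihood term.
        likelihoods = [p[arm] if reward else 1.0 - p[arm] for p in modes]
        unnormalized = [w * l for w, l in zip(posterior, likelihoods)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
    return posterior


if __name__ == "__main__":
    candidate_modes = [(0.9, 0.1), (0.1, 0.9)]
    print(bayesian_control_rule(candidate_modes, (0.9, 0.1), 200, random.Random(0)))
```

In this toy setting the posterior mass should concentrate on the mode that matches the true environment, which mirrors the convergence claim in the abstract only informally.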

agent, bayesian control rule, operation mode, (12 more...)

doi: 10.1613/jair.3062

AI Access Foundation

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.46)
Europe > Poland (0.04)
South America > Chile (0.04)
(2 more...)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Reinforcement Learning for Closed-Loop Propofol Anesthesia: A Human Volunteer Study

Moore, Brett L. (Texas Tech University) | Panousis, Periklis (Stanford University School of Medicine) | Kulkarni, Vivek (Stanford University School of Medicine) | Pyeatt, Larry D. (Texas Tech University) | Doufas, Anthony G. (Stanford University School of Medicine)

AAAI Conferences, Jul-15-2010

Research has demonstrated the efficacy of closed-loop control of anesthesia using the bispectral index (BIS) of the electroencephalogram as the controlled variable, and the development of model-based, patient-adaptive systems has considerably improved anesthetic control. To further explore the use of model-based control in anesthesia, we investigated the application of reinforcement learning (RL) in the delivery of patient-specific, propofol-induced hypnosis in human volunteers. When compared to published performance metrics, RL control demonstrated accuracy and stability, indicating that further, more rigorous clinical study is warranted.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Twenty-Second IAAI Conference

Country:

North America > United States > Texas > Taylor County > Abilene (0.04)
North America > United States > Massachusetts > Norfolk County > Norwood (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.47)

Using Imagery to Simplify Perceptual Abstraction in Reinforcement Learning Agents

Wintermute, Samuel (University of Michigan, Ann Arbor)

AAAI Conferences, Jul-15-2010

In this paper, we consider the problem of reinforcement learning in spatial tasks. These tasks have many states that can be aggregated together to improve learning efficiency. In an agent, this aggregation can take the form of selecting appropriate perceptual processes to arrive at a qualitative abstraction of the underlying continuous state. However, for arbitrary problems, an agent is unlikely to have the perceptual processes necessary to discriminate all relevant states in terms of such an abstraction. To help compensate for this, reinforcement learning can be integrated with an imagery system, where simple models of physical processes are applied within a low-level perceptual representation to predict the state resulting from an action. Rather than abstracting the current state, abstraction can be applied to the predicted next state. Formally, it is shown that this integration broadens the class of perceptual abstraction methods that can be used while preserving the underlying problem. Empirically, it is shown that this approach can be used in complex domains, and can be beneficial even when formal requirements are not met.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Industry: Leisure & Entertainment > Games (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Relational Reinforcement Learning in Infinite Mario

Mohan, Shiwali (University of Michigan) | Laird, John E. (University of Michigan)

AAAI Conferences, Jul-15-2010

Relational representations in reinforcement learning allow structural information, such as the presence of objects and the relationships between them, to be used in the description of value functions. In this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and action spaces, such as computer games.

infinite mario, relational reinforcement learning

Twenty-Fourth AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Representation Discovery in Sequential Decision Making

Mahadevan, Sridhar (University of Massachusetts, Amherst)

AAAI Conferences, Jul-15-2010

Automatically constructing novel representations of tasks from analysis of state spaces is a longstanding fundamental challenge in AI. I review recent progress on this problem for sequential decision making tasks modeled as Markov decision processes. Specifically, I discuss three classes of representation discovery problems: finding functional, state, and temporal abstractions. I describe solution techniques varying along several dimensions: diagonalization or dilation methods using approximate or exact transition models; reward-specific vs. reward-invariant methods; global vs. local representation construction methods; multiscale vs. flat discovery methods; and finally, orthogonal vs. redundant representation discovery methods. I conclude by describing a number of open problems for future work.

abstraction, proceedings, representation, (17 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Learning to Surface Deep Web Content

Wu, Zhaohui (Xi'an Jiaotong University) | Jiang, Lu (Xi'an Jiaotong University) | Zheng, Qinghua (Xi'an Jiaotong University) | Liu, Jun (Xi'an Jiaotong University)

AAAI Conferences, Jul-15-2010

We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and submits a selected action (a query) to the environment according to its Q-value. Based on this framework, we develop an adaptive crawling method. Experimental results show that it outperforms state-of-the-art methods in crawling capability and breaks through the assumption of full-text search implied by existing methods.
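The agent loop described above, choosing a query by Q-value and learning from the result, can be sketched with generic tabular Q-learning. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not define the state encoding, the reward, or the exploration scheme, so the epsilon-greedy choice, the string-valued states, and the names `select_query` and `q_update` are all hypothetical.

```python
import random
from collections import defaultdict


def select_query(q, state, candidates, epsilon, rng):
    """Epsilon-greedy choice among candidate queries (assumed scheme)."""
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore: random query
    # Exploit: the query with the highest Q-value in this state.
    return max(candidates, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_candidates,
             alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update on a (state, action) table."""
    best_next = max((q[(next_state, a)] for a in next_candidates), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


if __name__ == "__main__":
    q = defaultdict(float)
    # Hypothetical reward: number of new records returned by the query.
    q_update(q, "s0", "apple", 10.0, "s1", ["pear"])
    print(select_query(q, "s0", ["pear", "apple"], 0.0, random.Random(0)))
```

With exploration switched off (`epsilon=0.0`), the selector picks whichever candidate query currently has the highest learned value.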

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: Asia > China > Shaanxi Province > Xi'an (0.05)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Information Management > Search (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

Evolved Intrinsic Reward Functions for Reinforcement Learning

Niekum, Scott (University of Massachusetts Amherst)

AAAI Conferences, Jul-15-2010

The reinforcement learning (RL) paradigm typically assumes a given reward function that is part of the problem. However, in animals, all reward signals are generated internally, rather than being received directly from the environment. Furthermore, animals have evolved motivational systems that facilitate learning by rewarding activities that often bear a distal relationship to the animal's ultimate goals. Such intrinsic motivation can cause an agent to explore and learn in the absence of external rewards, possibly improving its performance over a set of problems.

... a class of efficient, general search procedures that search over the space of programs--to search for reward functions. These reward functions operate over the entire state space of a reinforcement learning problem and, if successful, will be able to quickly and automatically identify relevant variables and features of the problem. This will allow the agent to outperform an agent that uses the obvious task-based reward function. The use of genetic programming methods may alleviate the difficulty of scaling reward function search and provide a natural way to search through a very expressive ...

evolutionary algorithm, machine learning, reinforcement learning, (15 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.69)

Relative Entropy Policy Search

Peters, Jan (Max Planck Institute for Biological Cybernetics) | Mulling, Katharina (Max Planck Institute for Biological Cybernetics) | Altun, Yasemin (Max Planck Institute for Biological Cybernetics)

AAAI Conferences, Jul-15-2010

Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It can be shown to work well on typical reinforcement learning benchmark problems.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Instance-Based Online Learning of Deterministic Relational Action Models

Xu, Joseph Z. (University of Michigan) | Laird, John E. (University of Michigan)

AAAI Conferences, Jul-15-2010

We present an instance-based, online method for learning action models in unanticipated, relational domains. Our algorithm memorizes pre- and post-states of transitions an agent encounters while experiencing the environment, and makes predictions by using analogy to map the recorded transitions to novel situations. Our algorithm is implemented in the Soar cognitive architecture, integrating its task-independent episodic memory module and analogical reasoning implemented in procedural memory. We evaluate this algorithm’s prediction performance in a modified version of the blocks world domain and the taxi domain. We also present a reinforcement learning agent that uses our model learning algorithm to significantly speed up its convergence to an optimal policy in the modified blocks world domain.

machine learning, reinforcement learning, transition, (19 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Industry:

Education > Educational Setting > Online (0.64)
Transportation (0.61)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Learning Methods to Generate Good Plans: Integrating HTN Learning and Reinforcement Learning

Hogg, Chad (Lehigh University) | Kuter, Ugur (University of Maryland) | Munoz-Avila, Hector (Lehigh University)

AAAI Conferences, Jul-15-2010

We consider how to learn Hierarchical Task Networks (HTNs) for planning problems in which both the quality of solution plans generated by the HTNs and the speed at which those plans are found are important. We describe an integration of HTN Learning with Reinforcement Learning both to learn methods by analyzing semantic annotations on tasks and to produce estimates of the expected values of the learned methods by performing Monte Carlo updates. We performed an experiment in which plan quality was inversely related to plan length. In two planning domains, we evaluated the planning performance of the learned methods in comparison to two state-of-the-art satisficing classical planners, FastForward and SGPlan6, and one optimal planner, HSP*. The results demonstrate that a greedy HTN planner using the learned methods was able to generate higher quality solutions than SGPlan6 in both domains and FastForward in one. Our planner, FastForward, and SGPlan6 ran in similar time, while HSP* was exponentially slower.
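The idea of estimating a method's expected value with Monte Carlo updates and then planning greedily can be sketched as a running average of observed returns. This is a minimal sketch, not the authors' system: the class `MethodValues`, the method names, and the choice of negative plan length as the return (one possible reading of "plan quality was inversely related to plan length") are assumptions made for illustration.

```python
from collections import defaultdict


class MethodValues:
    """Running-average value estimates for (hypothetical) HTN methods."""

    def __init__(self):
        self.value = defaultdict(float)  # current estimate per method
        self.count = defaultdict(int)    # number of returns seen per method

    def update(self, method, ret):
        """Monte Carlo update: move the estimate toward the sampled return."""
        self.count[method] += 1
        self.value[method] += (ret - self.value[method]) / self.count[method]

    def greedy(self, applicable):
        """Pick the applicable method with the highest estimated value."""
        return max(applicable, key=lambda m: self.value[m])


if __name__ == "__main__":
    mv = MethodValues()
    # Assumed return: negative plan length, so shorter plans score higher.
    mv.update("m1", -4)
    mv.update("m1", -6)
    mv.update("m2", -3)
    print(mv.greedy(["m1", "m2"]))
```

The incremental form avoids storing every return; after each episode the estimate equals the mean of the returns observed so far for that method.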

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
