Reinforcement Learning
A Minimum Relative Entropy Principle for Learning and Acting
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
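As an illustration of the idea in the abstract above, the following is a minimal sketch, not the paper's own formulation. It assumes a finite set of experts, each with a hypothetical likelihood function over observations. An expert is sampled from the current posterior and acts; afterwards only the observation likelihood reweights the posterior, reflecting the treatment of the agent's own actions as interventions rather than evidence. The names `sample_expert`, `update_posterior`, and `likelihoods` are assumptions made for this example.

```python
import random

def sample_expert(posterior, rng):
    """Draw one expert with probability equal to its posterior weight."""
    experts = list(posterior)
    weights = [posterior[e] for e in experts]
    return rng.choices(experts, weights=weights, k=1)[0]

def update_posterior(posterior, likelihoods, action, observation):
    """Reweight experts by how well each predicted the observation.

    The action only conditions the likelihood; the probability an expert
    would have assigned to choosing that action is deliberately not used
    (assumption: this is how the intervention is modeled in this sketch).
    """
    unnormalized = {
        e: p * likelihoods[e](action, observation) for e, p in posterior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero likelihood under every expert")
    return {e: w / total for e, w in unnormalized.items()}
```

In use, an agent would alternate `sample_expert`, act with the sampled expert's policy, and call `update_posterior` with the resulting observation.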
Reinforcement Learning for Closed-Loop Propofol Anesthesia: A Human Volunteer Study
Moore, Brett L. (Texas Tech University) | Panousis, Periklis (Stanford University School of Medicine) | Kulkarni, Vivek (Stanford University School of Medicine) | Pyeatt, Larry D. (Texas Tech University) | Doufas, Anthony G. (Stanford University School of Medicine)
Research has demonstrated the efficacy of closed-loop control of anesthesia using the bispectral index (BIS) of the electroencephalogram as the controlled variable, and the development of model-based, patient-adaptive systems has considerably improved anesthetic control. To further explore the use of model-based control in anesthesia, we investigated the application of reinforcement learning (RL) in the delivery of patient-specific, propofol-induced hypnosis in human volunteers. When compared to published performance metrics, RL control demonstrated accuracy and stability, indicating that further, more rigorous clinical study is warranted.
Using Imagery to Simplify Perceptual Abstraction in Reinforcement Learning Agents
Wintermute, Samuel (University of Michigan, Ann Arbor)
In this paper, we consider the problem of reinforcement learning in spatial tasks. These tasks have many states that can be aggregated together to improve learning efficiency. In an agent, this aggregation can take the form of selecting appropriate perceptual processes to arrive at a qualitative abstraction of the underlying continuous state. However, for arbitrary problems, an agent is unlikely to have the perceptual processes necessary to discriminate all relevant states in terms of such an abstraction. To help compensate for this, reinforcement learning can be integrated with an imagery system, where simple models of physical processes are applied within a low-level perceptual representation to predict the state resulting from an action. Rather than abstracting the current state, abstraction can be applied to the predicted next state. Formally, it is shown that this integration broadens the class of perceptual abstraction methods that can be used while preserving the underlying problem. Empirically, it is shown that this approach can be used in complex domains, and can be beneficial even when formal requirements are not met.
Relational Reinforcement Learning in Infinite Mario
Mohan, Shiwali (University of Michigan) | Laird, John E. (University of Michigan)
Relational representations in reinforcement learning allow for the use of structural information, such as the presence of objects and the relationships between them, in the description of value functions. In this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and action spaces, such as computer games.
Representation Discovery in Sequential Decision Making
Mahadevan, Sridhar (University of Massachusetts, Amherst)
Automatically constructing novel representations of tasks from analysis of state spaces is a longstanding fundamental challenge in AI. I review recent progress on this problem for sequential decision making tasks modeled as Markov decision processes. Specifically, I discuss three classes of representation discovery problems: finding functional, state, and temporal abstractions. I describe solution techniques varying along several dimensions: diagonalization or dilation methods using approximate or exact transition models; reward-specific vs. reward-invariant methods; global vs. local representation construction methods; multiscale vs. flat discovery methods; and finally, orthogonal vs. redundant representation discovery methods. I conclude by describing a number of open problems for future work.
Learning to Surface Deep Web Content
Wu, Zhaohui (Xi'an Jiaotong University) | Jiang, Lu (Xi'an Jiaotong University) | Zheng, Qinghua (Xi'an Jiaotong University) | Liu, Jun (Xi'an Jiaotong University)
We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and submits a selected action (query) to the environment according to its Q-value. Based on this framework, we develop an adaptive crawling method. Experimental results show that it outperforms state-of-the-art methods in crawling capability and removes the assumption of full-text search implied by existing methods.
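The Q-value-driven query selection described above can be sketched as follows. This is a generic epsilon-greedy Q-learning illustration under simplifying assumptions, not the authors' crawler: Q-values are keyed by query string alone (the state is dropped), and the reward signal, `alpha`, `gamma`, and `epsilon` are hypothetical parameters.

```python
import random

def select_query(q_values, candidates, epsilon, rng):
    """Pick a candidate query: explore with probability epsilon, else take the best Q-value."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    # Unseen queries default to a Q-value of 0.0 (assumption).
    return max(candidates, key=lambda q: q_values.get(q, 0.0))

def q_update(q_values, query, reward, next_candidates, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update for the submitted query."""
    best_next = max((q_values.get(q, 0.0) for q in next_candidates), default=0.0)
    old = q_values.get(query, 0.0)
    q_values[query] = old + alpha * (reward + gamma * best_next - old)
    return q_values[query]
```

A reward could, for example, measure how many new records a query returned, but that choice is an assumption of this sketch rather than something stated in the abstract.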
Evolved Intrinsic Reward Functions for Reinforcement Learning
Niekum, Scott (University of Massachusetts Amherst)
The reinforcement learning (RL) paradigm typically assumes a given reward function that is part of the problem. However, in animals, all reward signals are generated internally, rather than being received directly from the environment. Furthermore, animals have evolved motivational systems that facilitate learning by rewarding activities that often bear a distal relationship to the animal's ultimate goals. Such intrinsic motivation can cause an agent to explore and learn in the absence of external rewards, possibly improving its performance over a set of problems. ... a class of efficient, general search procedures that search over the space of programs, to search for reward functions. These reward functions operate over the entire state space of a reinforcement learning problem and, if successful, will be able to quickly and automatically identify relevant variables and features of the problem. This will allow the agent to outperform an agent that uses the obvious task-based reward function. The use of genetic programming methods may alleviate the difficulty of scaling reward function search and provide a natural way to search through a very expressive ...
Relative Entropy Policy Search
Peters, Jan (Max Planck Institute for Biological Cybernetics) | Mulling, Katharina (Max Planck Institute for Biological Cybernetics) | Altun, Yasemin (Max Planck Institute for Biological Cybernetics)
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It can be shown to work well on typical reinforcement learning benchmark problems.
Instance-Based Online Learning of Deterministic Relational Action Models
Xu, Joseph Z. (University of Michigan) | Laird, John E. (University of Michigan)
We present an instance-based, online method for learning action models in unanticipated, relational domains. Our algorithm memorizes pre- and post-states of transitions an agent encounters while experiencing the environment, and makes predictions by using analogy to map the recorded transitions to novel situations. Our algorithm is implemented in the Soar cognitive architecture, integrating its task-independent episodic memory module and analogical reasoning implemented in procedural memory. We evaluate this algorithm’s prediction performance in a modified version of the blocks world domain and the taxi domain. We also present a reinforcement learning agent that uses our model learning algorithm to significantly speed up its convergence to an optimal policy in the modified blocks world domain.
Learning Methods to Generate Good Plans: Integrating HTN Learning and Reinforcement Learning
Hogg, Chad (Lehigh University) | Kuter, Ugur (University of Maryland) | Munoz-Avila, Hector (Lehigh University)
We consider how to learn Hierarchical Task Networks (HTNs) for planning problems in which both the quality of solution plans generated by the HTNs and the speed at which those plans are found are important. We describe an integration of HTN Learning with Reinforcement Learning both to learn methods by analyzing semantic annotations on tasks and to produce estimates of the expected values of the learned methods by performing Monte Carlo updates. We performed an experiment in which plan quality was inversely related to plan length. In two planning domains, we evaluated the planning performance of the learned methods in comparison to two state-of-the-art satisficing classical planners, FastForward and SGPlan6, and one optimal planner, HSP*. The results demonstrate that a greedy HTN planner using the learned methods was able to generate higher quality solutions than SGPlan6 in both domains and FastForward in one. Our planner, FastForward, and SGPlan6 ran in similar time, while HSP* was exponentially slower.
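The Monte Carlo value estimates and greedy method choice mentioned above can be sketched roughly as below. This is an assumption-laden illustration, not the authors' system: each method's value is kept as an incremental average of the returns observed when it was used, and the names `mc_update`, `greedy_method`, `values`, and `counts` are hypothetical.

```python
def mc_update(values, counts, method, observed_return):
    """Fold one observed return into the running average for a method."""
    counts[method] = counts.get(method, 0) + 1
    old = values.get(method, 0.0)
    # Incremental mean: new = old + (sample - old) / n
    values[method] = old + (observed_return - old) / counts[method]
    return values[method]

def greedy_method(values, applicable):
    """Choose the applicable method with the highest estimated value."""
    # Methods never tried default to 0.0 (assumption of this sketch).
    return max(applicable, key=lambda m: values.get(m, 0.0))
```

Because the abstract states only that plan quality was inversely related to plan length, how a return is computed from a finished plan is left open here.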