 Reinforcement Learning


Balancing Exploration and Exploitation in Agent Learning

AAAI Conferences

Balancing the ratio of exploration and exploitation is an important problem in reinforcement learning [1]. If you examine the relationship between agent and the environment in reinforcement learning, agent has two action selections in its environment: exploration and exploitation. The agent can choose to explore its environment and try new actions in search for better ones to be adopted in the future, or exploit already tested actions and adopt them. The Cultural Geography (CG) model is a government-owned, open-source agent-based model designed to address the behavioral response of civilian populations in conflict environments [3]. Agents within the CG Model select their action according to a constant temperature setting over the course of a model run. To enhance the functionality of agents in selecting their actions and to get more realistic results with better utilities we changed this constant to a dynamic parameter which depends on time in Time Based Selection and on utility in Aggregate Utility
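The contrast between a constant temperature and a time-dependent one can be illustrated with a minimal sketch. It assumes Boltzmann (softmax) action selection, which is the usual meaning of a "temperature" setting, and a linear decay schedule. The function names, the schedule, and the default values are hypothetical and are not taken from the CG model or the paper.

```python
import math
import random


def boltzmann_probabilities(utilities, temperature):
    """Softmax over action utilities.

    A high temperature flattens the distribution (more exploration);
    a low temperature concentrates it on the best action (more exploitation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp((u - best) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def time_based_temperature(step, total_steps, t_start=5.0, t_end=0.1):
    """Hypothetical schedule: decay linearly from t_start to t_end over the run."""
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return t_start + fraction * (t_end - t_start)


def select_action(utilities, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probabilities = boltzmann_probabilities(utilities, temperature)
    return rng.choices(range(len(utilities)), weights=probabilities, k=1)[0]
```

With a schedule like this, early steps sample actions almost uniformly, while late steps pick the highest-utility action almost every time. A constant temperature fixes that trade-off for the whole run.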


Personalized Intelligent Tutoring System Using Reinforcement Learning

AAAI Conferences

In this paper, we present a Personalized Intelligent Tutoring System that uses Reinforcement Learning techniques to implicitly learn teaching rules and provide instructions to students based on their needs. The system works on coarsely labeled data with minimum expert knowledge to ease extension to newer domains.


An Introduction to Intertask Transfer for Reinforcement Learning

AI Magazine

Transfer learning has recently gained popularity due to the development of algorithms that can successfully generalize information across multiple tasks. This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area.


Automatic Discovery and Transfer of Task Hierarchies in Reinforcement Learning

AI Magazine

Sequential decision tasks present many opportunities for the study of transfer learning. A principal one among them is the existence of multiple domains that share the same underlying causal structure for actions. We describe an approach that exploits this shared causal structure to discover a hierarchical task structure in a source domain, which in turn speeds up learning of task execution knowledge in a new target domain. Our approach is theoretically justified and compares favorably to manually designed task hierarchies in learning efficiency in the target domain. We demonstrate that causally motivated task hierarchies transfer more robustly than other kinds of detailed knowledge that depend on the idiosyncrasies of the source domain and are hence less transferable.


A Framework for Teaching and Executing Verb Phrases

AAAI Conferences

This paper describes a framework for an agent to learn verb-phrase meanings from human teachers and combine these models with environmental dynamics so the agent can enact verb commands from the human teacher. This style of human/agent interaction allows the human teacher to issue natural-language commands and demonstrate ground actions, thereby alleviating the need for advanced teaching interfaces or difficult goal encodings. The framework extends prior work in apprenticeship learning and builds on recent advances in learning to recognize activities and modeling domains with multiple objects. In our studies, we show how to both learn a verb model and turn it into reward and heuristic functions that can then be composed with a dynamics model. The resulting "combined model" can then be efficiently searched by a sample-based planner, which determines a policy for enacting a verb command in a given environment. Our experiments with a simulated robot domain show this framework can be used to quickly teach verb commands that the agent can then enact in new environments.


Using Human Demonstrations to Improve Reinforcement Learning

AAAI Conferences

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.


Reinforcement Learning with Human Feedback in Mountain Car

AAAI Conferences

As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users without programming skills can transfer their task knowledge to the agents, learning rates can increase dramatically, reducing costly trials. The TAMER framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. Whereas early work on TAMER assumed that the agent's only feedback was from the human teacher, this paper considers the scenario of an agent within a Markov decision process (MDP), receiving and simultaneously learning from both MDP reward and human reinforcement signals. Preserving MDP reward as the determinant of optimal behavior, we test two methods of combining human reinforcement and MDP reward and analyze their respective performances. Both methods create a predictive model, H-hat, of human reinforcement and use that model in different ways to augment a reinforcement learning (RL) algorithm. We additionally introduce a technique for appropriately determining the magnitude of the model's influence on the RL algorithm throughout time and the state space.
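One simple way a learned model of human reinforcement could augment an RL algorithm is to add its prediction, scaled by a weight, to the MDP reward before a tabular Q-learning update. The sketch below assumes that scheme. It is not claimed to be either of the two methods tested in the paper, and the names `shaped_q_update`, `h_hat`, and `beta` are hypothetical.

```python
def shaped_q_update(q, state, action, reward, next_state, actions,
                    h_hat, beta, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with the MDP reward augmented by beta * H-hat.

    q       -- dict mapping (state, action) to a value estimate
    h_hat   -- callable (state, action) -> predicted human reinforcement (assumed given)
    beta    -- weight on the human model; with beta = 0 this is plain Q-learning
    """
    augmented_reward = reward + beta * h_hat(state, action)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = augmented_reward + gamma * best_next
    old_value = q.get((state, action), 0.0)
    q[(state, action)] = old_value + alpha * (td_target - old_value)
    return q[(state, action)]
```

Decaying `beta` toward zero over time is one way to keep the MDP reward as the final determinant of optimal behavior while still benefiting from human guidance early in learning.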


Decision Making Agent Searching for Markov Models in Near-Deterministic World

arXiv.org Artificial Intelligence

Reinforcement learning has solid foundations, but becomes inefficient in partially observed (non-Markovian) environments. Thus, a learning agent, born with a representation and a policy, might wish to investigate to what extent the Markov property holds. We propose a learning architecture that utilizes combinatorial policy optimization to overcome non-Markovity and to develop efficient behaviors, which are easy to inherit, tests the Markov property of the behavioral states, and corrects against non-Markovity by running a deterministic factored Finite State Model, which can be learned. We illustrate the properties of the architecture in the near-deterministic Ms. Pac-Man game. We analyze the architecture from the point of view of evolutionary, individual, and social learning.