AITopics

0907.0746

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(25 more...)

Genre:

Research Report (0.50)
Collection (0.46)

Industry: Leisure & Entertainment > Games > Chess (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Konidaris, George (University of Massachusetts Amherst) | Barto, Andrew (University of Massachusetts Amherst)

Efficient Skill Learning Using Abstraction Selection

AAAI Conferences, Jun-23-2009

We present an algorithm for selecting an appropriate abstraction when learning a new skill. We show empirically that it can consistently select an appropriate abstraction using very little sample data, and that it significantly improves skill learning performance in a reasonably large real-valued reinforcement learning domain.

abstraction, agent, reinforcement, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.05)
North America > United States > New York > New York County > New York City (0.04)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AAAI Conferences, Jun-23-2009

Inverse Reinforcement Learning in Partially Observable Environments

Choi, Jaedeug (KAIST) | Kim, Kee-Eung (KAIST)

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert's environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this paper, we present an extension of the classical IRL algorithm by Ng and Russell to partially observable environments. We discuss technical issues and challenges, and present the experimental results on some of the benchmark partially observable domains.

node, reward function, trajectory, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Akiyama, Takayuki (Tokyo Institute of Technology) | Hachiya, Hirotaka (Tokyo Institute of Technology) | Sugiyama, Masashi (Tokyo Institute of Technology)

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

AAAI Conferences, Jun-23-2009

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.

generalization error, immediate reward, iteration, (13 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

arXiv.org Artificial Intelligence, Jun-9-2009

Feature Reinforcement Learning: Part I: Unstructured MDPs

Hutter, Marcus

General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II [Hut09c]. The role of POMDPs is also considered there.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

0906.1713

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(12 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Tokic, Michel (University of Applied Sciences Ravensburg-Weingarten) | Ertel, Wolfgang (University of Applied Sciences Ravensburg-Weingarten) | Fessler, Joachim (University of Applied Sciences Ravensburg-Weingarten)

The Crawler, A Class Room Demonstrator for Reinforcement Learning

AAAI Conferences, May-21-2009

We present a little crawling robot with a two-DOF arm that learns to move forward within about 15 seconds in real time. Due to its small size and weight, the robot is ideally suited for classroom demonstrations as well as for talks to the public. Students who want to practice their knowledge of reinforcement learning and value iteration can use a wireless connection to a PC and monitor the internal state of the robot, such as the value function or the reward table. Because it is adaptive, the robot may, depending on the surface properties of the ground, surprise its audience with unexpected but efficient walking policies. The GUI is open source and the robot hardware is available as a kit from the authors.
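As a rough illustration of the value iteration and reward table mentioned in this abstract, here is a minimal sketch on a toy model. Everything in it is an assumption made for illustration, not the authors' robot software: the four-state discretization (arm up/down, arm forward/back), the two toggle actions, the reward values, and the discount factor are all hypothetical.

```python
from itertools import product

GAMMA = 0.9  # hypothetical discount factor
# Hypothetical discretization: state = (arm_down, arm_back)
STATES = list(product((False, True), repeat=2))
ACTIONS = ("toggle_lift", "toggle_sweep")


def step(state, action):
    """Deterministic toy model: return (next_state, reward)."""
    arm_down, arm_back = state
    if action == "toggle_lift":
        return (not arm_down, arm_back), 0.0
    # Sweeping only moves the body while the arm touches the ground:
    # pushing from front to back earns +1, dragging back to front costs -1.
    reward = 0.0
    if arm_down:
        reward = -1.0 if arm_back else 1.0
    return (arm_down, not arm_back), reward


def q_value(values, state, action):
    """One-step lookahead: immediate reward plus discounted next value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-10):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(q_value(values, s, a) for a in ACTIONS) for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the action with the best one-step lookahead in every state."""
    return {s: max(ACTIONS, key=lambda a: q_value(values, s, a)) for s in STATES}


if __name__ == "__main__":
    v = value_iteration()
    for s, a in greedy_policy(v).items():
        print(s, a, round(v[s], 3))
```

In this toy model the greedy policy is the four-step crawl cycle: lower the arm, sweep it back, lift it, and sweep it forward again.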

algorithm, robot, student, (12 more...)

Twenty-Second International FLAIRS Conference

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rajendran, Srividhya (The University of Texas at Arlington) | Huber, Manfred (The University of Texas at Arlington)

Generalizing and Categorizing Skills in Reinforcement Learning Agents Using Partial Policy Homomorphisms

AAAI Conferences, May-21-2009

A reinforcement learning agent involved in life-long learning in a complex and dynamic environment has to have the ability to utilize control knowledge acquired in one situation in novel contexts. As part of this, it is important for the learning agent not only to be able to learn a new skill for a specific instance of a task but also to identify similar tasks, form a reusable skill and representational abstractions for the corresponding "task type", and to apply these abstractions in new, previously unseen contexts. This paper presents a new approach to policy generalization that derives an abstract policy for a set of similar tasks (a "task type") by constructing a partial policy homomorphism from a set of basic policies learned for previously seen task instances. The resulting generalized policy can then be applied in new contexts to address new instances of similar tasks. As opposed to many recent approaches in lifelong learning systems, this approach makes it possible to identify similar tasks based on the functional characteristics of the corresponding skills and provides a means of transferring the learned knowledge to new situations without the need for complete knowledge of the state space and the system dynamics in the new environment. To illustrate the new policy generalization method and to demonstrate its ability to reuse the gained knowledge in new contexts, it is applied to a set of grid world examples.

abstraction, homomorphism, policy homomorphism, (12 more...)

Twenty-Second International FLAIRS Conference

Country: North America > United States > Texas > Tarrant County > Arlington (0.05)

Industry: Education > Educational Setting > Continuing Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, May-20-2009

Optimistic Simulated Exploration as an Incentive for Real Exploration

Danihelka, Ivo

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an optimistic model to discover promising paths for real exploration. This reduces the need for real exploration.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

0903.2972

Country: Europe > Czechia > Prague (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Szita, Istvan | Lorincz, Andras

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

arXiv.org Artificial Intelligence, Apr-21-2009

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way, and always follows a greedy policy with respect to its model. The only trick of the algorithm is that the model is initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent makes non-near-optimal decisions (with respect to the solution of AVI) is polynomial in all relevant quantities; (iii) the per-step costs of the algorithm are also polynomial. To the best of our knowledge, FOIM is the first algorithm with these properties. This extended version contains the rigorous proofs of the main theorem. A version of this paper appeared in ICML'09.
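The idea that optimistic initialization lets a purely greedy agent explore can be seen in a much simpler setting than factored MDPs. The sketch below is not FOIM: it is a three-armed bandit with deterministic rewards, and the function name `run_greedy`, the reward values, and the step size are all hypothetical choices made for illustration.

```python
def run_greedy(initial_value, rewards=(0.2, 0.8, 0.5), steps=20, alpha=0.5):
    """Always pull the arm with the highest current estimate (no random exploration).

    rewards: hypothetical deterministic payoff of each arm.
    initial_value: starting estimate for every arm; a value at or above
    the largest possible reward makes the initialization optimistic.
    """
    estimates = [initial_value] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        # Greedy choice; ties go to the lowest arm index.
        arm = max(range(len(rewards)), key=lambda a: estimates[a])
        counts[arm] += 1
        # Move the estimate part of the way toward the observed reward.
        estimates[arm] += alpha * (rewards[arm] - estimates[arm])
    return counts, estimates


if __name__ == "__main__":
    print("optimistic start:", run_greedy(1.0)[0])
    print("zero start:      ", run_greedy(0.0)[0])
```

With the optimistic start, every untried arm looks better than any arm already tried, so the greedy agent samples all three arms before settling on the best one (pull counts `[1, 18, 1]`). With a start of zero it never leaves the first arm (pull counts `[20, 0, 0]`).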

artificial intelligence, machine learning, reinforcement learning, (18 more...)

0904.3352

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Levene, Mark | Fenner, Trevor

A Methodology for Learning Players' Styles from Game Records

arXiv.org Artificial Intelligence, Apr-16-2009

In Chess, as in other popular strategic board games, players have different styles. For example, in Chess some players are more "positional" and others more "tactical", and this difference in style will affect their move choice in any given board position, and more generally their overall plan. The problem we tackle in this paper is that of applying machine learning to teach a computer to discriminate between players based on their style. Before we explain our methodology, we briefly review the method of temporal difference learning, which is central to our approach. Temporal difference learning [Sut88] is a machine learning technique, originating from the seminal work of Samuel [Sam59], in which learning occurs by minimising the differences between predictions and actual outcomes of a temporal sequence of observations. Samuel [Sam59] used the game of Checkers as a vehicle to study the feasibility of a computer learning from experience. Although the program written by Samuel did not achieve master strength, it was the precursor of the Checkers program Chinook [Sch97, SHJ01], which was the first computer program to win a match against a human world champion.
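The temporal difference idea described in this abstract, adjusting each prediction toward the next prediction or the actual outcome, can be sketched with tabular TD(0). This is a generic textbook-style illustration, not the authors' style-learning method: the five-state random walk, the function names, and the parameter values are assumptions chosen for the example.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Shift values[state] toward the one-step target and return the TD error."""
    target = reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def run_random_walk(episodes=20000, n_states=5, alpha=0.01, seed=0):
    """Estimate the chance of exiting on the right in a symmetric random walk.

    Indices 0 and n_states + 1 are terminal states whose value stays 0;
    only a step into the right terminal state pays a reward of 1.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * n_states + [0.0]
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            td0_update(values, state, reward, next_state, alpha)
            state = next_state
    return values[1:-1]


if __name__ == "__main__":
    # The true values for this walk are 1/6, 2/6, ..., 5/6.
    print([round(v, 2) for v in run_random_walk()])
```

Each update uses only the difference between two successive predictions (or between the last prediction and the final outcome), so learning can proceed during an episode rather than waiting for it to end.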

artificial intelligence, machine learning, reinforcement learning, (13 more...)

0904.2595

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > Netherlands > Limburg > Maastricht (0.05)
(10 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)