Jaakkola, Tommi
Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks
Jaakkola, Tommi, Saul, Lawrence K., Jordan, Michael I.
Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e., the EM algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains.
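For concreteness, a minimal sketch of the setting (our own illustrative code, not the paper's; the mean-substitution step below is only a crude approximation, whereas the paper derives true bounds via its extended and complementary representations):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # A two-layer sigmoid belief network: hidden units h feed visible units v.
    # p(h_j = 1)     = sigmoid(b_j)
    # p(v_i = 1 | h) = sigmoid(W[i] @ h + c_i)
    rng = np.random.default_rng(0)
    n_hidden, n_visible = 4, 3
    W = rng.normal(size=(n_visible, n_hidden))
    b = rng.normal(size=n_hidden)      # hidden-unit biases
    c = rng.normal(size=n_visible)     # visible-unit biases

    def log_likelihood_bound(v, mu):
        """Approximate mean-field lower bound on log p(v):
        E_q[log p(v, h)] + H(q), with q(h_j = 1) = mu_j factorized."""
        eps = 1e-12
        # E_q[log p(h)] under the factorized Bernoulli prior
        prior = np.sum(mu * np.log(sigmoid(b) + eps)
                       + (1 - mu) * np.log(1 - sigmoid(b) + eps))
        # Visible terms: evaluate the sigmoid at the mean E_q[h] = mu.
        # This is only an approximation; the paper's bounds are exact.
        p_v = sigmoid(W @ mu + c)
        lik = np.sum(v * np.log(p_v + eps) + (1 - v) * np.log(1 - p_v + eps))
        # Entropy of the variational posterior q
        ent = -np.sum(mu * np.log(mu + eps) + (1 - mu) * np.log(1 - mu + eps))
        return prior + lik + ent

    v = np.array([1.0, 0.0, 1.0])
    mu = np.full(n_hidden, 0.5)   # variational parameters, to be optimized
    print(log_likelihood_bound(v, mu))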
Reinforcement Learning with Soft State Aggregation
Singh, Satinder P., Jaakkola, Tommi, Jordan, Michael I.
It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.
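As a rough illustration of soft state aggregation (a sketch under our own assumptions, not the authors' code): the lookup table over states is replaced by a small table over clusters, each state is spread across clusters according to fixed membership probabilities P(x|s), and TD errors are distributed in proportion to those memberships:

    import numpy as np

    rng = np.random.default_rng(1)
    n_states, n_clusters, n_actions = 100, 5, 2

    # Fixed soft aggregation: P[s, x] = probability that state s maps to cluster x.
    P = rng.dirichlet(np.ones(n_clusters), size=n_states)

    Q_cluster = np.zeros((n_clusters, n_actions))   # compact value table

    def q_value(s, a):
        # A state's value is the membership-weighted average of cluster values.
        return P[s] @ Q_cluster[:, a]

    def update(s, a, reward, s_next, alpha=0.1, gamma=0.95):
        """One Q-learning step with soft state aggregation: the TD error
        for (s, a) is spread over clusters in proportion to P(x|s)."""
        target = reward + gamma * max(q_value(s_next, b) for b in range(n_actions))
        td_error = target - q_value(s, a)
        Q_cluster[:, a] += alpha * P[s] * td_error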
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
Jaakkola, Tommi, Singh, Satinder P., Jordan, Michael I.
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor their analyses generally remain usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous (even for a discrete action space), our algorithm is computationally tractable.
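A minimal sketch of the two ingredients named above, stochastic policies indexed by observations and Monte-Carlo policy evaluation (our own illustrative code; the env.reset()/env.step() interface is a hypothetical stand-in for the learner's restricted view of the environment):

    import numpy as np

    rng = np.random.default_rng(2)
    n_obs, n_actions = 10, 3

    # Stochastic policy: a distribution over actions for each observation.
    # Each row lives on the simplex, so the policy space is continuous.
    policy = np.full((n_obs, n_actions), 1.0 / n_actions)

    def rollout_return(env, policy, gamma=0.95, horizon=200):
        """Monte-Carlo evaluation: sample one episode under the stochastic
        policy and return its discounted return. `env` is assumed to
        expose reset() -> obs and step(a) -> (obs, reward, done)."""
        obs = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            a = rng.choice(n_actions, p=policy[obs])
            obs, reward, done = env.step(a)
            total += discount * reward
            discount *= gamma
            if done:
                break
        return total

    def evaluate(env, policy, n_episodes=100):
        # Average over many rollouts to estimate the policy's value.
        return np.mean([rollout_return(env, policy) for _ in range(n_episodes)])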
Convergence of Stochastic Iterative Dynamic Programming Algorithms
Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P.
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.

1 INTRODUCTION

Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available to the learner that affect both the state transition probabilities and the probability distribution for the immediate, state-dependent costs c_i(u) incurred by the learner.
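For reference, the Q-learning update covered by such a convergence theorem can be written as a stochastic approximation step (a generic sketch in the paper's cost-minimization setting, not code from the paper):

    import numpy as np

    def q_learning_step(Q, s, u, cost, s_next, alpha, gamma=0.95):
        """One Q-learning update, viewed as stochastic approximation:
        Q <- (1 - alpha) Q + alpha * (sampled Bellman backup).
        Costs are minimized, matching the c_i(u) formulation above."""
        target = cost + gamma * Q[s_next].min()
        Q[s, u] = (1 - alpha) * Q[s, u] + alpha * target
        return Q

    # Example usage: Q = np.zeros((n_states, n_actions)), with step sizes
    # satisfying the Robbins-Monro conditions (sum alpha_t = inf,
    # sum alpha_t^2 < inf), e.g. alpha_t = 1 / (visits to (s, u)).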