AITopics

We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a "policy improvement operator" to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

algorithm, iteration, policy improvement operator, (12 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Roy, Benjamin V., Farias, Daniela D.

Approximate Linear Programming for Average-Cost Dynamic Programming

This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

approximation, cost function, differential cost function, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Schoknecht, Ralf, Merke, Artur

Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Convergence for iterative reinforcement learning algorithms like TD(O) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows to analyse if a certain reinforcement learning algorithm and a certain function approximator are compatible.

algorithm, converge, eigenvalue, (13 more...)

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.64)

Wang, Xiaofeng, Sandholm, Tuomas

Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exaccerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforfcement learning involves two interrelated problems: identifying the game and learning to play.

agent, equilibrium, nash equilibrium, (15 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Likhachev, Maxim, Koenig, Sven

Speeding up the Parti-Game Algorithm

In this paper, we introduce an efficient replanning algorithm for nondeterministic domains, namely what we believe to be the first incremental heuristic minimax search algorithm. We apply it to the dynamic discretization of continuous domains, resulting in an efficient implementation of the parti-game reinforcement-learning algorithm for control in high-dimensional domains.

algorithm, implementation, parti-game algorithm, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

Schoknecht, Ralf

There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function.

algorithm, eigenvalue, residual gradient algorithm, (14 more...)

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Poupart, Pascal, Boutilier, Craig

Value-Directed Compression of POMDPs

We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.

compression, nullnullnullnullnull null, pomdp, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Bias-Optimal Incremental Problem Solving

Schmidhuber, Jürgen

Given is a problem sequence and a probability distribution (the bias) on programs computing solution candidates. We present an optimally fast way of incrementally solving each task in the sequence. Bias shifts are computed by program prefixes that modify the distribution on their suffixes by reusing successful code for previous tasks (stored in non-modifiable memory). No tested program gets more runtime than its probability times the total search time.

instruction, probability, programming language, (15 more...)

Country:

Asia > Vietnam > Hanoi > Hanoi (0.08)
Europe > Switzerland (0.04)
North America > United States > Massachusetts (0.04)

Genre: Workflow (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Morimoto, Jun, Atkeson, Christopher G.

Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

We developed a robust control policy design method in high-dimensional state space by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five link biped robot. The results show lower joint torques from the optimal control policy compared to a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated.

controller, minimax ddp, trajectory, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)

Porr, Bernd, Wörgötter, Florentin

Learning a Forward Model of a Reflex

We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. The simplest form of a feedback is a reflex. Reflexes occur always "too late"; i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control with the result that slow feedback loops are replaced by their equivalent feed-forward controller creating a forward model. In other words, learning turns the reactive system into a proactive system. By means of a robot implementation we demonstrate the applicability of the theoretical results which can be used in a variety of different areas in physics and engineering.

collision, transfer function, weight change, (17 more...)

Country:

North America > United States > New York (0.06)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)