Moore, Andrew W.
'N-Body' Problems in Statistical Learning
Gray, Alexander G., Moore, Andrew W.
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning
Munos, Rémi, Moore, Andrew W.
In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called the barycentric interpolators. We establish sufficient conditions under which an RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement functions.
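To make the idea concrete, here is a minimal sketch, assuming a single 2-D triangle as the mesh element, of how a barycentric interpolator represents a value function: the value at a query state is the convex combination of the values stored at the element's vertices, weighted by the state's barycentric coordinates. The vertex layout and stored values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def barycentric_coords(x, v0, v1, v2):
    """Barycentric coordinates (l0, l1, l2) of point x in triangle (v0, v1, v2)."""
    T = np.column_stack((v1 - v0, v2 - v0))   # 2x2 matrix of edge vectors
    l1, l2 = np.linalg.solve(T, x - v0)       # solve T @ [l1, l2] = x - v0
    return np.array([1.0 - l1 - l2, l1, l2])  # nonnegative inside the triangle, sum to 1

def interpolate_value(x, vertices, vertex_values):
    """Barycentric interpolator: V(x) = sum_i lambda_i(x) * V(xi_i)."""
    lam = barycentric_coords(x, *vertices)
    return float(lam @ vertex_values)

# Toy example: values 0, 1, 2 stored at the corners of the unit triangle.
verts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
vals = np.array([0.0, 1.0, 2.0])
print(interpolate_value(np.array([0.25, 0.25]), verts, vals))  # 0.5*0 + 0.25*1 + 0.25*2 = 0.75
```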
Gradient Descent for General Reinforcement Learning
Baird III, Leemon C., Moore, Andrew W.
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. In addition, it allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.
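As a rough illustration of the value-and-policy blending described above, the sketch below performs one stochastic-gradient update on a per-step error that mixes a squared Bellman residual with a negated reward term; the blend weight omega, the tabular softmax policy, and the semi-gradient treatment of the target are my own simplifications and should not be read as the paper's VAPS derivation.

```python
import numpy as np

def softmax(q, tau=1.0):
    z = (q - q.max()) / tau
    p = np.exp(z)
    return p / p.sum()

def vaps_style_update(theta, s, a, r, s_next, omega=0.5, gamma=0.95, tau=1.0, lr=0.1):
    """theta: Q-table of shape (n_states, n_actions); returns an updated copy."""
    theta = theta.copy()
    pi = softmax(theta[s], tau)

    # Per-step error: blend of squared Bellman residual and negated immediate reward.
    residual = r + gamma * theta[s_next].max() - theta[s, a]
    e = omega * 0.5 * residual ** 2 + (1.0 - omega) * (-r)

    # Gradient of the error w.r.t. theta[s, a] (target held fixed, i.e. semi-gradient).
    grad_e = np.zeros_like(theta)
    grad_e[s, a] = -omega * residual

    # Score function: gradient of log pi(a|s) for the softmax policy over theta[s, :].
    grad_logpi = np.zeros_like(theta)
    grad_logpi[s] = -pi / tau
    grad_logpi[s, a] += 1.0 / tau

    # Combined descent direction: error gradient plus error-weighted score term.
    theta -= lr * (grad_e + e * grad_logpi)
    return theta

theta = np.zeros((3, 2))  # hypothetical 3-state, 2-action problem
theta = vaps_style_update(theta, s=0, a=1, r=1.0, s_next=2)
```

Setting omega = 1 leaves only the value-error term, while omega = 0 leaves a pure policy-search update, which is the kind of combination the abstract describes.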
Memory-based Stochastic Optimization
Moore, Andrew W., Schneider, Jeff G.
In this paper we introduce new algorithms for optimizing noisy plants in which each experiment is very expensive. The algorithms build a global nonlinear model of the expected output at the same time as using Bayesian linear regression analysis of locally weighted polynomial models. The local model answers queries about confidence, noise, gradient and Hessians, and uses them to make automated decisions similar to those made by a practitioner of Response Surface Methodology. The global and local models are combined naturally as a locally weighted regression. We examine the question of whether the global model can really help optimization, and we extend it to the case of time-varying functions. We compare the new algorithms with a highly tuned higher-order stochastic optimization algorithm on randomly generated functions and a simulated manufacturing task. We note significant improvements in total regret, time to converge, and final solution quality. In a stochastic optimization problem, noisy samples are taken from a plant: a sample consists of a chosen control u (a vector of real numbers) and a noisy observed response y.
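The sketch below is a minimal, assumption-laden illustration of the memory-based idea: it fits a locally weighted linear model to remembered (control, response) pairs around a query control and reads off a predicted response and gradient, which could then drive the next experiment. The Gaussian kernel, bandwidth, and hill-climbing step are my choices; the paper's Bayesian confidence machinery and global model are omitted.

```python
import numpy as np

def local_linear_fit(U, y, u_query, bandwidth=0.5):
    """Weighted least squares of y ~ b0 + b @ (u - u_query), weights centred at u_query.

    Returns (predicted response at u_query, estimated gradient there)."""
    d2 = np.sum((U - u_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))             # Gaussian kernel weights
    X = np.hstack([np.ones((len(U), 1)), U - u_query])   # intercept + centred controls
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0], beta[1:]

# Hypothetical memory of noisy experiments on a 2-D control u.
rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(30, 2))
y = -np.sum(U ** 2, axis=1) + 0.05 * rng.standard_normal(30)

u = np.array([0.4, -0.3])
y_hat, grad = local_linear_fit(U, y, u)
u_next = u + 0.1 * grad   # climb the locally estimated response surface
```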
Generalization in Reinforcement Learning: Safely Approximating the Value Function
Boyan, Justin A., Moore, Andrew W.
Reinforcement learning, the problem of getting an agent to learn to act from sparse, delayed rewards, has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function which gives, for each state, the minimum possible long-term cost commencing in that state. For the high-dimensional and continuous state spaces characteristic of real-world control tasks, a discrete representation of the value function is intractable; some form of generalization is required. A natural way to incorporate generalization into DP is to use a function approximator, rather than a lookup table, to represent the value function. This approach, which dates back to uses of Legendre polynomials in DP [Bellman et al., 1963], has recently worked well on several dynamic control problems [Mahadevan and Connell, 1990; Lin, 1993] and succeeded spectacularly on the game of backgammon [Tesauro, 1992; Boyan, 1992].
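A minimal sketch of the approach the abstract describes, under toy assumptions of my own (a 1-D state space, two actions, polynomial least-squares features): fitted value iteration alternates a DP backup at sampled states with a regression step that refits the function approximator, in place of a lookup-table update.

```python
import numpy as np

def features(x, degree=4):
    return np.vander(x, degree + 1)               # polynomial features of the state

def fitted_value_iteration(n_states=50, n_iters=60, gamma=0.9, degree=4):
    xs = np.linspace(0.0, 1.0, n_states)          # sampled states from [0, 1]
    actions = np.array([-0.05, 0.05])             # move left / move right
    w = np.zeros(degree + 1)                      # approximator weights: V(x) ~ features(x) @ w

    for _ in range(n_iters):
        backed_up = np.empty(n_states)
        for i, x in enumerate(xs):
            nxt = np.clip(x + actions, 0.0, 1.0)  # deterministic toy dynamics
            cost = np.abs(nxt - 1.0)              # immediate cost: distance from the goal x = 1
            backed_up[i] = np.min(cost + gamma * (features(nxt, degree) @ w))
        # Regression step: refit the approximator to the backed-up values.
        w, *_ = np.linalg.lstsq(features(xs, degree), backed_up, rcond=None)
    return w

w = fitted_value_iteration()
```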