AITopics

Learning Real-Time A* (LRTA*) is a popular control method that interleaves planning and plan execution and has been shown to solve search problems in known environments efficiently. In this paper, we apply LRTA* to the problem of getting to a given goal location in an initially unknown environment. Uninformed LRTA* with maximal lookahead always moves on a shortest path to the closest unvisited state, that is, to the closest potential goal state. This was believed to be a good exploration heuristic, but we show that it does not minimize the worst-case plan-execution time compared to other uninformed exploration methods. This result is also of interest to reinforcement-learning researchers since many reinforcement learning methods use asynchronous dynamic programming, interleave planning and plan execution, and exhibit optimism in the face of uncertainty, just like LRTA*.

machine learning, plan-execution time, reinforcement learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kearns, Michael J., Singh, Satinder P.

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hayashi, Akira, Suematsu, Nobuo

Viewing Classifier Systems as Model Free Learning in POMDPs

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs. rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to find a locally optimal stochastic policy when a set of rule conditions is given. GLS uses a GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.

artificial intelligence, expert system, machine learning, (16 more...)

Country: Asia > Japan (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.97)

Brown, Timothy X., Tong, Hui, Singh, Satinder P.

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning

This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.

constraint, machine learning, reinforcement learning, (12 more...)

Country: North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report (0.34)

Industry: Telecommunications (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Baird III, Leemon C., Moore, Andrew W.

Gradient Descent for General Reinforcement Learning

A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. In addition, it allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)

Wolpert, David, Tumer, Kagan, Frank, Jeremy

Using Collective Intelligence to Route Internet Traffic

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms. 1 INTRODUCTION COllective INtelligences (COINs) are large, sparsely connected recurrent neural networks, whose "neurons" are reinforcement learning (RL) algorithms. The distinguishing feature of COINs is that their dynamics involves no centralized control, but only the collective effects of the individual neurons each modifying their behavior via their individual RL algorithms. This restriction holds even though the goal of the COIN concerns the system's global behavior.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: North America > United States (0.49)

Industry:

Government > Space Agency (0.31)
Government > Regional Government > North America Government > United States Government (0.31)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Williamson, Matthew M., Murray-Smith, Roderick, Hansen, Volker

Robot Docking Using Mixtures of Gaussians

This paper applies the Mixture of Gaussians probabilistic model, combined with Expectation Maximization optimization, to the task of summarizing three-dimensional range data for a mobile robot. This provides a flexible way of dealing with uncertainties in sensor information, and allows the introduction of prior knowledge into low-level perception modules. Problems with the basic approach were solved in several ways: the mixture of Gaussians was reparameterized to reflect the types of objects expected in the scene, and priors on model parameters were included in the optimization process. Both approaches force the optimization to find 'interesting' objects, given the sensor and object characteristics. A higher level classifier was used to interpret the results provided by the model, and to reject spurious solutions.

algorithm, artificial intelligence, machine learning, (16 more...)

Country:

Europe (0.46)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Spence, Clay, Sajda, Paul

Applications of Multi-Resolution Neural Networks to Mammography

We have previously presented a coarse-to-fine hierarchical pyramid/neural network (HPNN) architecture which combines multiscale image processing techniques with neural networks.

architecture, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > Massachusetts (0.14)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.66)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.41)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.76)

Prank, Klaus, Börger, Julia, Mühlen, Alexander von zur, Brabant, Georg, Schöfl, Christof

Independent Component Analysis of Intracellular Calcium Spike Data

These Ca2+ signals are often organized in complex temporal and spatial patterns even under conditions of sustained stimulation.

algorithm, artificial intelligence, machine learning, (15 more...)

Country: North America > United States (0.47)

Genre: Research Report (0.47)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Oliver, Nuria, Rosario, Barbara, Pentland, Alex

Graphical Models for Recognizing Human Interactions

We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in two different scenarios: (1) complex, two-handed action recognition in the martial art of Tai Chi and (2) detection and recognition of individual human behaviors and multiple-person interactions in a visual surveillance task. In the latter case, the system is particularly concerned with detecting when interactions between people occur, and classifying them. Graphical models, such as Hidden Markov Models (HMMs) [6] and Coupled Hidden Markov Models (CHMMs) [3, 2], seem appropriate for modeling and classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (HMMs) and interacting or coupled (CHMMs) generative processes. A major problem with this data-driven statistical approach, especially when modeling rare or anomalous behaviors, is the limited number of training examples. A major emphasis of our work, therefore, is on efficient Bayesian integration of prior knowledge with evidence from data.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)

Industry: Leisure & Entertainment > Sports (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)