AITopics

We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950's. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other methods for evaluating policies. Further, all DPbased RL methods have some of the properties of these Monte Carlo algorithms, which suggests that although RL is often perceived to be slow, for sufficiently large problems, it may in fact be more efficient than other known classes of methods capable of producing the same results.

algorithm, monte carlo algorithm, monte carlo method, (9 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Bayesian Modeling and Classification of Neural Signals

Lewicki, Michael S.

Signal processing and classification algorithms often have limited applicability resulting from an inaccurate model of the signal's underlying structure. We present here an efficient, Bayesian algorithm for modeling a signal composed of the superposition of brief, Poisson-distributed functions. This methodology is applied to the specific problem of modeling and classifying extracellular neural waveforms which are composed of a superposition of an unknown number of action potentials CAPs). Previous approaches have had limited success due largely to the problems of determining the spike shapes, deciding how many are shapes distinct, and decomposing overlapping APs. A Bayesian solution to each of these problems is obtained by inferring a probabilistic model of the waveform. This approach quantifies the uncertainty of the form and number of the inferred AP shapes and is used to obtain an efficient method for decomposing complex overlaps. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits.

bayesian modeling and classification, spike, spike model, (14 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Bayesian Backprop in Action: Pruning, Committees, Error Bars and an Application to Spectroscopy

Thodberg, Hans Henrik

MacKay's Bayesian framework for backpropagation is conceptually appealing as well as practical. It automatically adjusts the weight decay parameters during training, and computes the evidence for each trained network. The evidence is proportional to our belief in the model. In this paper, the framework is extended to pruned nets, leading to an Ockham Factor for "tuning the architecture to the data". A committee of networks, selected by their high evidence, is a natural Bayesian construction.

bayesian backprop, ockham factor, weight decay parameter, (12 more...)

Country: North America > United States > Ohio (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Ron, Dana, Singer, Yoram, Tishby, Naftali

The Power of Amnesia

We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multi characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot capture effectively the complexity of such structures. On the other hand using long memory models uniformly is not practical even for as short memory as four.

algorithm, automaton, probability, (14 more...)

Country:

North America > United States > New York (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A Local Algorithm to Learn Trajectories with Stochastic Neural Networks

Movellan, Javier R.

This paper presents a simple algorithm to learn trajectories with a continuous time, continuous activation version of the Boltzmann machine. The algorithm takes advantage of intrinsic Brownian noise in the network to easily compute gradients using entirely local computations. The algorithm may be ideal for parallel hardware implementations. This paper presents a learning algorithm to train continuous stochastic networks to respond with desired trajectories in the output units to environmental input trajectories. This is a task, with potential applications to a variety of problems such as stochastic modeling of neural processes, artificial motor control, and continuous speech recognition.

learn trajectory, probability, trajectory, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Renals, Steve, Hochberg, Mike, Robinson, Tony

Learning Temporal Dependencies in Connectionist Speech Recognition

In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into a MLP to produce an enhanced input representation. This is in the form of an adaptive gamma filter and incorporates an automatic approach for learning temporal dependencies. We have experimented on a speakerindependent phonerecognition task using the TIMIT database. Results using the gamma filtered input representation have shown improvement over the baseline MLP system. Improvements have also been obtained through merging the baseline and gamma filter models.

coefficient, filter coefficient, gamma filter coefficient, (12 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)

Singer, Yoram, Tishby, Naftali

Decoding Cursive Scripts

Online cursive handwriting recognition is currently one of the most intriguing challenges in pattern recognition. This study presents a novel approach to this problem which is composed of two complementary phases.The first is dynamic encoding of the writing trajectory into a compact sequence of discrete motor control symbols. In this compact representation we largely remove the redundancy of the script, while preserving most of its intelligible components. In the second phase these control sequences are used to train adaptive probabilistic acyclic automata (PAA) for the important ingredients of the writing trajectories, e.g.

automata, handwriting, probability, (17 more...)

Country: Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)

Genre: Overview (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Probabilistic Anomaly Detection in Dynamic Systems

Smyth, Padhraic

Padhraic Smyth Jet Propulsion Laboratory 238-420 California Institute of Technology 4800 Oak Grove Drive Pasadena, CA 91109 Abstract This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acutewhen prior knowledge and training data only allow one to construct an incomplete classification model. Allowance must be made in model design so that the classifier will be robust to data generated by classes not included in the training phase. For diagnosis applications one practical approach is to construct both an input density model and a discriminative class model. Using Bayes' rule and prior estimates of the relative likelihood of data of known and unknown origin the resulting classification equations are straightforward.

application, detection, probability, (15 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.25)
North America > United States > California > San Mateo County > San Mateo (0.05)
Oceania > Australia (0.04)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.32)

A Local Algorithm to Learn Trajectories with Stochastic Neural Networks

Movellan, Javier R.

learn trajectory, local algorithm, trajectory, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Barto, Andrew, Duff, Michael

Monte Carlo Matrix Inversion and Reinforcement Learning

We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950's. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other methods for evaluating policies. Further, all DPbased RL methods have some of the properties of these Monte Carlo algorithms, that although RL is often perceived towhich suggests be slow, for sufficiently large problems, it may in fact be more efficient than other known classes of methods capable of producing the same results.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)