AITopics

Country: Europe > Netherlands (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.35)

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation ofeither the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.For example, fitted value iteration can diverge even for Markov processes [2]; Q-Iearning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(O) can oscillate between multiple policies with different value functions [4].

artificial intelligence, backgammon, sarsa, (16 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Tchorz, Jürgen, Kleinschmidt, Michael, Kollmeier, Birger

Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

Abstract Missing

neural network, noise suppression, speech recognition, (19 more...)

Country: Europe > Germany (0.15)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.66)

Archer, Cynthia, Leen, Todd K.

From Mixtures of Mixtures to Adaptive Transform Coding

We establish a principled framework for adaptive transform coding. Transform coders are often constructed by concatenating an ad hoc choice of transform with suboptimal bit allocation and quantizer design. Instead, we start from a probabilistic latent variable model in the form of a mixture of constrained Gaussian mixtures. From this model we derive a transform coding algorithm, which is a constrained version of the generalized Lloyd algorithm for vector quantizer design. A byproduct of our derivation is the introduction of a new transform basis, which unlike other transforms (PCA, DCT, etc.) is explicitly optimized for coding.

artificial intelligence, machine learning, transform coder, (17 more...)

Country: North America > United States > Oregon (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Bhattacharyya, Chiranjib, Keerthi, S. Sathiya

A Variational Mean-Field Theory for Sigmoidal Belief Networks

In this paper we will discuss a variational mean-field theory and its application to BNs, sigmoidal BNs in particular. We present a variational derivation of the mean-field theory, proposed by Plefka[2].

approximation, artificial intelligence, machine learning, (16 more...)

Country: Asia > India (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Hahnloser, Richard H. R., Seung, H. Sebastian

Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks

Ascribing computational principles to neural feedback circuits is an important problem in theoretical neuroscience. We study symmetric threshold-linear networks and derive stability results that go beyond the insights that can be gained from Lyapunov theory or energy functions. By applying linear analysis to subnetworks composed of coactive neurons, we determine the stability of potential steady states. We find that stability depends on two types of eigenmodes. One type determines global stability and the other type determines whether or not multistability is possible.

neural network, neurology, steady state, (19 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Ormoneit, Dirk, Glynn, Peter W.

Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice

Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. By contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application to the optimal portfolio choice problem.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > United States > California > Santa Clara County (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Scarpetta, Silvia, Li, Zhaoping, Hertz, John A.

Spike-Timing-Dependent Learning for Oscillatory Networks

The model structure is an abstrac- tion of the hippocampus or the olfactory cortex. We propose a simple generalized Hebbian rule, using temporal-activity-dependent LTP and LTD, to encode both magnitudes and phases of oscillatory patterns into the synapses in the network. After learning, the model responds resonantly to inputs which have been learned (or, for networks which operate essentially linearly, to linear combinations of learned inputs), but negligibly to other input patterns. Encoding both amplitude and phase enhances computational capacity, for which the price is having to learn both the excitatory-to-excitatory and the excitatory-to-inhibitory connections. Our model puts contraints on the form of the learning kernal A(r) that should be experimenally observed, e.g., for small oscillation frequencies, it requires that the overall LTP dominates the overall LTD, but this requirement should be modified if the stored oscillations are of high frequencies.

frequency, health & medicine, neural network, (21 more...)

Country: Europe > Denmark (0.14)

Industry: Health & Medicine (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Caruana, Rich, Lawrence, Steve, Giles, C. Lee

Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest tworeasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity.

artificial intelligence, generalization, neural network, (14 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Energy > Oil & Gas (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.41)

Smola, Alex J., Bartlett, Peter L.

Sparse Greedy Gaussian Process Regression

We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m.