AITopics

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD(λ) [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms. For example, fitted value iteration can diverge even for Markov processes [2]; Q-learning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(0) can oscillate between multiple policies with different value functions [4].

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States (0.94)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Punyakanok, Vasin, Roth, Dan

The Use of Classifiers in Sequential Inference

We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem: identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.

Tchorz, Jürgen, Kleinschmidt, Michael, Kollmeier, Birger

Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

For SNR estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network analyses AMS patterns generated from noisy speech and estimates the local SNR.

artificial intelligence, machine learning, noise suppression

Country: Europe > Germany (0.15)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.66)

Ghahramani, Zoubin, Beal, Matthew J.

Propagation Algorithms for Variational Bayesian Learning

Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models, we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1 Introduction Bayesian approaches to machine learning have several desirable properties. Bayesian integration does not suffer overfitting (since nothing is fit to the data). Prior knowledge can be incorporated naturally and all uncertainty is manipulated in a consistent manner. Moreover, it is possible to learn model structures and readily compare between model classes. Unfortunately, for most models of interest a full Bayesian analysis is computationally intractable.

artificial intelligence, bayesian inference, machine learning

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Leisink, Martijn A. R., Kappen, Hilbert J.

A Tighter Bound for Graphical Models

The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called 'visibles'. The other nodes, known as 'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set can be computed. In general, the time this takes scales exponentially with the number of hidden neurons.

approximation, artificial intelligence, machine learning

Country: Europe > Netherlands (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.35)

Bogacz, Rafal, Brown, Malcolm W., Giraud-Carrier, Christophe G.

Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images

Olshausen & Field demonstrated that a learning algorithm that attempts to generate a sparse code for natural scenes develops a complete family of localised, oriented, bandpass receptive fields, similar to those of 'simple cells' in V1. This paper describes an algorithm which finds a sparse code for sequences of images that preserves information about the input. This algorithm, when trained on natural video sequences, develops bases representing the movement in particular directions with particular speeds, similar to the receptive fields of the movement-sensitive cells observed in cortical visual areas. Furthermore, in contrast to previous approaches to learning direction selectivity, the timing of neuronal activity encodes the phase of the movement, so the precise timing of spikes is crucially important to the information encoding.

artificial intelligence, machine learning, sequence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Zhang, Tong

Regularized Winnow Methods

In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows in much the same way that a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results will also be provided to illustrate different methods. 1 Introduction In this paper, we consider the binary classification problem that is to determine a label y ∈ {-1, 1} associated with an input vector x. A useful method for solving this problem is through linear discriminant functions, which consist of linear combinations of the components of the input variable.

algorithm, artificial intelligence, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Shriki, Oren, Sompolinsky, Haim, Lee, Daniel D.

An Information Maximization Approach to Overcomplete and Recurrent Representations

The principle of maximizing mutual information is applied to learning overcomplete and recurrent representations. The underlying model consists of a network of input units driving a larger number of output units with recurrent interactions. In the limit of zero noise, the network is deterministic and the mutual information can be related to the entropy of the output units.

artificial intelligence, machine learning, representation

Country: Asia > Middle East > Israel (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cadez, Igor V., Smyth, Padhraic

Model Complexity, Goodness of Fit and Diminishing Returns

Igor V. Cadez, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Padhraic Smyth, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Abstract We investigate a general characteristic of the tradeoff in learning problems between goodness-of-fit and model complexity. Specifically, we characterize a general class of learning problems where the goodness-of-fit function can be shown to be convex within first order as a function of model complexity. This general property of "diminishing returns" is illustrated on a number of real data sets and learning problems, including finite mixture modeling and multivariate linear regression. 1 Introduction, Motivation, and Related Work Assume we have a data set D Such learning tasks can typically be characterized by the existence of a model and a loss function. A fitted model of complexity k is a function of the data points D and depends on a specific set of fitted parameters B. The loss function (goodness-of-fit) is a functional of the model and maps each specific model to a scalar used to evaluate the model, e.g., likelihood for density estimation or sum-of-squares for regression. Figure 1 illustrates a typical empirical curve for loss function versus complexity, for mixtures of Markov models fitted to a large data set of 900,000 sequences.

artificial intelligence, loss function, machine learning

Country: North America > United States > California > Orange County > Irvine (0.95)

Industry: Education > Focused Education > Special Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Andre, David, Russell, Stuart J.

Programmable Reinforcement Learning Agents

We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills. 1 Introduction The field of reinforcement learning has recently adopted the idea that the application of prior knowledge may allow much faster learning and may indeed be essential if real-world environments are to be addressed. For learning behaviors, the most obvious form of prior knowledge provides a partial description of desired behaviors. Several languages for partial descriptions have been proposed, including Hierarchical Abstract Machines (HAMs) [8], semi-Markov options [12], and the MAXQ framework [4]. This paper describes extensions to the HAM language that substantially increase its expressive power, using constructs borrowed from programming languages. Obviously, increasing expressiveness makes it easier for the user to supply whatever prior knowledge is available, and to do so more concisely.

artificial intelligence, machine learning, reinforcement learning

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)