Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

Neural Information Processing Systems

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
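To make the contrast concrete, here is a minimal sketch of the two families being compared; it is not the paper's experimental setup, and the MDP, sampling scheme, step size, and discount are illustrative assumptions. The direct method applies a Q-learning update to each observed transition, while the indirect method estimates next-state distributions from the same transitions and then runs off-line value iteration.

```python
import numpy as np

# Hypothetical tabular MDP: S states, A actions, known reward table R[s, a].
S, A, gamma, alpha = 5, 2, 0.9, 0.1
rng = np.random.default_rng(0)
R = rng.uniform(size=(S, A))
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # true next-state distributions

# Direct (model-free): Q-learning updated from each observed transition.
Q = np.zeros((S, A))
counts = np.zeros((S, A, S))  # transition counts reused by the indirect method
for _ in range(10000):
    s, a = rng.integers(S), rng.integers(A)   # exploratory sampling of (s, a)
    s2 = rng.choice(S, p=P_true[s, a])        # observed next state
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
    counts[s, a, s2] += 1

# Indirect (model-based): estimate the next-state distributions from the
# same experience, then run off-line value iteration on the estimated model.
P_hat = (counts + 1e-9) / (counts + 1e-9).sum(axis=2, keepdims=True)
V = np.zeros(S)
for _ in range(200):
    V = (R + gamma * np.einsum('sat,t->sa', P_hat, V)).max(axis=1)

print("greedy policy (Q-learning):", Q.argmax(axis=1))
print("greedy policy (indirect):  ",
      (R + gamma * np.einsum('sat,t->sa', P_hat, V)).argmax(axis=1))
```

Both estimators consume the same stream of state transitions, which is exactly the resource in which the paper's finite-sample rates are expressed.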


Using Collective Intelligence to Route Internet Traffic

Neural Information Processing Systems

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest-path routing algorithms.

1 INTRODUCTION

COllective INtelligences (COINs) are large, sparsely connected recurrent neural networks whose "neurons" are reinforcement learning (RL) algorithms. The distinguishing feature of COINs is that their dynamics involve no centralized control, but only the collective effects of the individual neurons, each modifying its behavior via its individual RL algorithm. This restriction holds even though the goal of the COIN concerns the system's global behavior.
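The COIN construction itself is not reproduced in this excerpt, but the RL-based shortest-path routing baseline it is compared against can be sketched in a few lines: each router independently learns the delay-to-destination of its next hops from local feedback, in the style of Q-routing. The topology, link delays, and learning constants below are illustrative assumptions.

```python
import random

# Hypothetical 4-node network: each router learns Q[dest hop] -> estimated
# delay to node 3 from locally observed link delays (a Q-routing-style
# baseline, not the full COIN construction from the paper).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
delay = {(0, 1): 1.0, (1, 0): 1.0, (0, 2): 2.0, (2, 0): 2.0,
         (1, 3): 2.0, (3, 1): 2.0, (2, 3): 1.0, (3, 2): 1.0}
Q = {n: {h: 0.0 for h in neighbors[n]} for n in neighbors}
alpha, eps = 0.1, 0.2

for _ in range(5000):
    node = random.choice([0, 1, 2])            # route a packet toward node 3
    while node != 3:
        hops = neighbors[node]
        hop = random.choice(hops) if random.random() < eps \
            else min(hops, key=Q[node].get)    # greedy: lowest estimated delay
        # Local update: observed link delay plus the neighbor's best estimate.
        future = 0.0 if hop == 3 else min(Q[hop].values())
        Q[node][hop] += alpha * (delay[(node, hop)] + future - Q[node][hop])
        node = hop

for n in (0, 1, 2):
    print(f"router {n} -> best next hop toward 3:", min(Q[n], key=Q[n].get))
```

The point of the abstract is that each agent here optimizes a purely local quantity; a COIN instead shapes the individual agents' utilities so that their collective behavior optimizes the global objective.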


Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks

Neural Information Processing Systems

We solve the dynamics of Hopfield-type neural networks which store sequences of patterns, close to saturation. The asymmetry of the interaction matrix in such models leads to violation of detailed balance, ruling out an equilibrium statistical mechanical analysis. Using generating functional methods we derive exact closed equations for dynamical order parameters, viz. the sequence overlap and correlation and response functions.
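For orientation, a standard way to set up such a sequence-storing network (the paper's precise conventions may differ) couples each of the p stored patterns ξ^μ to its successor, which makes the interaction matrix explicitly asymmetric:

```latex
% Asymmetric couplings: pattern \mu is mapped onto its successor (mod p)
J_{ij} \;=\; \frac{1}{N}\sum_{\mu=1}^{p} \xi_i^{\mu+1}\,\xi_j^{\mu},
\qquad J_{ij}\neq J_{ji}.
% Sequence overlap, the first of the dynamical order parameters above
m^{\mu}(t) \;=\; \frac{1}{N}\sum_{i=1}^{N}\xi_i^{\mu}\,s_i(t),
\qquad \text{saturation regime: } p=\alpha N.
```

The asymmetry J_ij ≠ J_ji is precisely what rules out detailed balance and forces the generating-functional treatment mentioned above.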


Recurrent Cortical Amplification Produces Complex Cell Responses

Neural Information Processing Systems

Cortical amplification has been proposed as a mechanism for enhancing the selectivity of neurons in the primary visual cortex. Less appreciated is the fact that the same form of amplification can also be used to de-tune or broaden selectivity. Using a network model with recurrent cortical circuitry, we propose that the spatial phase invariance of complex cell responses arises through recurrent amplification of feedforward input.
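A hedged way to see how one mechanism can both sharpen and broaden tuning is the standard linear picture of recurrent amplification (a simplification; the paper's network model is richer): with symmetric recurrent weights W having eigenpairs (λ_k, e_k), the steady state boosts each feedforward input mode by a factor 1/(1 − λ_k).

```latex
\tau\,\frac{d\mathbf{r}}{dt} = -\mathbf{r} + W\mathbf{r} + \mathbf{h},
\qquad
\mathbf{r}^{*} = (I - W)^{-1}\mathbf{h}
= \sum_{k}\frac{\mathbf{e}_k^{\top}\mathbf{h}}{1-\lambda_k}\,\mathbf{e}_k .
```

If the mode with λ_k near one is itself sharply selective, amplification enhances selectivity; if it mixes feedforward inputs of different spatial phase, the same amplification broadens selectivity, which is the proposed route to complex cell phase invariance.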


On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Neural Information Processing Systems

Calculation of Q(t) and R(t) using (4, 5, 7, 9) to execute the path average and the average over sets is relatively straightforward, albeit tedious. We find that γt(1 - γt)
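The equation references above cannot be reproduced from this excerpt. For orientation only: in this on-line learning literature, Q(t) and R(t) conventionally denote the student-student and student-teacher overlaps of the weight vector J being learned against a teacher vector B (notation assumed here, not taken from the paper), from which, for a perceptron student-teacher pair on spherically distributed inputs, the generalization error would follow:

```latex
Q(t) = \mathbf{J}(t)\cdot\mathbf{J}(t),
\qquad
R(t) = \mathbf{J}(t)\cdot\mathbf{B},
\qquad
E_g(t) = \frac{1}{\pi}\arccos\!\left(\frac{R(t)}{\sqrt{Q(t)}}\right).
```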


Attentional Modulation of Human Pattern Discrimination Psychophysics Reproduced by a Quantitative Model

Neural Information Processing Systems

We previously proposed a quantitative model of early visual processing in primates, based on non-linearly interacting visual filters and statistically efficient decision. We now use this model to interpret the observed modulation of a range of human psychophysical thresholds with and without focal visual attention. Our model, calibrated by an automatic fitting procedure, simultaneously reproduces thresholds for four classical pattern discrimination tasks, performed while attention was engaged by another concurrent task. Our model then predicts that the seemingly complex improvements of certain thresholds, which we observed when attention was fully available for the discrimination tasks, can best be explained by a strengthening of competition among early visual filters.

1 INTRODUCTION

What happens when we voluntarily focus our attention on a restricted part of our visual field? We here investigate the possibility that attention might have a specific computational modulatory effect on early visual processing.
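The calibrated model itself is not reproduced in this excerpt, but the generic form of "non-linearly interacting visual filters" in this family of models is a power-law transducer with divisive inhibition pooled over neighboring filters, and it is the pooled term whose strengthening the abstract's conclusion points to. The symbols here are illustrative, not the paper's:

```latex
R_i \;=\; \frac{E_i^{\,\gamma}}{S^{\,\delta} + \sum_{j} w_{ij}\,E_j^{\,\delta}} ,
```

where E_i is the linear energy of filter i, the exponents γ and δ set the accelerating and compressive parts of the contrast response, S is a semisaturation constant, and increasing the interaction weights w_ij strengthens competition among the filters.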


Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

Neural Information Processing Systems

SVMs have empirically been shown to give good generalization performance on a wide variety of problems. However, the use of SVMs is still limited to a small group of researchers. One possible reason is that training algorithms for SVMs are slow, especially for large problems. Another explanation is that SVM training algorithms are complex, subtle, and sometimes difficult to implement. This paper describes a new SVM learning algorithm that is easy to implement, often faster, and has better scaling properties than the standard SVM training algorithm. The new SVM learning algorithm is called Sequential Minimal Optimization (or SMO).
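To make the "analytic QP" idea concrete, here is a sketch of SMO's core step: optimizing two Lagrange multipliers in closed form while all others are held fixed. Working-set selection, the error cache, and the threshold update are omitted, and the variable names are mine rather than the paper's.

```python
def smo_step(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """One analytic SMO update of a pair of multipliers (a1, a2).

    y1, y2 are labels in {-1, +1}; E1, E2 are the current prediction
    errors f(x_i) - y_i; K11, K12, K22 are kernel evaluations; C is the
    box constraint. Returns the updated (a1, a2). A sketch of the
    closed-form inner step only, not a full SVM trainer.
    """
    # Feasible segment for a2 implied by 0 <= a <= C and sum(y_i a_i) = const.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    if L >= H:
        return a1, a2  # no room to move

    eta = K11 + K22 - 2.0 * K12  # curvature of the one-dimensional objective
    if eta <= 0:
        return a1, a2  # degenerate case; the full algorithm handles it separately

    a2_new = a2 + y2 * (E1 - E2) / eta     # unconstrained optimum along the line
    a2_new = min(H, max(L, a2_new))        # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keep the equality constraint satisfied
    return a1_new, a2_new
```

Because every pair update has this closed form, SMO never calls a numerical QP solver; the full algorithm adds heuristics for choosing which pair to update next and maintains the threshold and an error cache as it goes.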


Learning Curves for Gaussian Processes

Neural Information Processing Systems

I consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. A simple expression for the generalization error in terms of the eigenvalue decomposition of the covariance function is derived, and used as the starting point for several approximation schemes. I identify where these become exact, and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth.

1 INTRODUCTION: GAUSSIAN PROCESSES

Within the neural networks community, there has in the last few years been a good deal of excitement about the use of Gaussian processes as an alternative to feedforward networks [1]. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference, at least in the case of regression that I will consider, is relatively straightforward. One crucial question for applications is then how 'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance.
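As a numerical companion to the eigenvalue-based analysis (not the paper's approximation schemes), a learning curve can be estimated directly by averaging the Bayes generalization error, i.e. the mean posterior variance, over random training sets; when the data are drawn from the assumed GP prior, this average is exactly the quantity the approximations target. The kernel, input distribution, and noise level below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 0.05                                  # observation noise variance

def rbf(X, Y, ell=0.3):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def avg_error(n, trials=50, m=200):
    """Posterior variance averaged over test inputs and training sets:
    the Bayes generalization error for a prior-matched GP regressor."""
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1, 1, n)              # random training inputs
        xs = rng.uniform(-1, 1, m)             # test inputs
        K = rbf(x, x) + sigma2 * np.eye(n)
        ks = rbf(xs, x)
        var = 1.0 - np.einsum('ij,ji->i', ks, np.linalg.solve(K, ks.T))
        errs.append(var.mean())
    return np.mean(errs)

for n in (2, 5, 10, 20, 50):
    print(f"n={n:3d}  epsilon(n) ~ {avg_error(n):.4f}")
```

Such Monte Carlo estimates are a useful check on closed-form approximations, since the averaging over training sets that makes the theory hard is here done by brute force.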