Information Technology
Statistically Efficient Estimations Using Cortical Lateral Connections
Pouget, Alexandre, Zhang, Kechen
Coarse codes are widely used throughout the brain to encode sensory and motor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient, i.e., the variance of the estimate is much larger than the smallest possible variance, or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimation problem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform this estimation in an optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections in the cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.
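A minimal sketch of the decoding problem the abstract contrasts (plain NumPy, not the paper's recurrent network): a population vector readout versus a maximum-likelihood fit on a noisy coarse code. The tuning-curve shape and Poisson noise model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                                  # number of neurons
prefs = np.linspace(-np.pi, np.pi, n, endpoint=False)   # preferred angles

def tuning(theta):
    """Mean firing rates: circular Gaussian-like tuning curves."""
    return 20.0 * np.exp(3.0 * (np.cos(theta - prefs) - 1.0))

theta_true = 0.7
rates = rng.poisson(tuning(theta_true)).astype(float)   # noisy population response

# Population vector: rate-weighted sum of preferred directions.
pv = np.angle(np.sum(rates * np.exp(1j * prefs)))

# Maximum likelihood by grid search under the Poisson noise model.
grid = np.linspace(-np.pi, np.pi, 1000)
loglik = [np.sum(rates * np.log(tuning(g)) - tuning(g)) for g in grid]
ml = grid[int(np.argmax(loglik))]

print(f"true {theta_true:.3f}  population vector {pv:.3f}  ML {ml:.3f}")
```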
ARTEX: A Self-organizing Architecture for Classifying Image Regions
Grossberg, Stephen, Williamson, James R.
Automatic processing of visual scenes often begins by detecting regions of an image with common values of simple local features, such as texture, and mapping the pattern of feature activation into a predicted region label. We develop a self-organizing neural architecture, called the ARTEX algorithm, for automatically extracting a novel and effective array of such features and mapping them to output region labels. ARTEX is made up of biologically motivated networks: the Boundary Contour System and Feature Contour System (BCS/FCS) networks for visual feature extraction (Cohen & Grossberg, 1984; Grossberg & Mingolla, 1985a, 1985b; Grossberg & Todorovic, 1988; Grossberg, Mingolla, & Williamson, 1995), and the Gaussian ARTMAP (GAM) network for classification (Williamson, 1996). ARTEX is first evaluated on a difficult real-world task, classifying regions of synthetic aperture radar (SAR) images, where it reliably achieves high-resolution (single pixel) classification results and creates accurate probability maps for its class predictions. ARTEX is then evaluated on classification of natural textures, where it outperforms the texture classification system of Greenspan, Goodman, Chellappa, & Anderson (1994) under comparable preprocessing and training conditions.
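A heavily simplified stand-in for the overall pipeline; this is not the BCS/FCS or Gaussian ARTMAP networks from the paper. It extracts generic oriented-energy texture features per pixel and classifies them with per-class diagonal Gaussians; every filter choice and constant is an assumption for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def texture_features(img, scales=(1.0, 2.0, 4.0)):
    """Stack of oriented-energy features at several spatial scales."""
    feats = []
    for s in scales:
        smooth = gaussian_filter(img, s)
        gx, gy = sobel(smooth, axis=1), sobel(smooth, axis=0)
        feats += [gaussian_filter(np.abs(gx), s), gaussian_filter(np.abs(gy), s)]
    return np.stack(feats, axis=-1)                     # shape (H, W, F)

def fit_gaussians(feats, labels):
    """Fit a diagonal Gaussian over feature vectors for each region label."""
    X, y = feats.reshape(-1, feats.shape[-1]), labels.ravel()
    return {c: (X[y == c].mean(0), X[y == c].var(0) + 1e-6) for c in np.unique(y)}

def classify(feats, model):
    """Single-pixel labels: class with the highest Gaussian log-likelihood."""
    X = feats.reshape(-1, feats.shape[-1])
    ll = np.stack([-0.5 * ((X - m) ** 2 / v + np.log(v)).sum(1)
                   for m, v in model.values()])
    classes = np.array(list(model.keys()))
    return classes[ll.argmax(0)].reshape(feats.shape[:2])

rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(size=(16, 32)),        # two texture regions
                      3.0 * rng.normal(size=(16, 32))])
labels = np.repeat([0, 1], 16)[:, None].repeat(32, axis=1)
model = fit_gaussians(texture_features(img), labels)
print((classify(texture_features(img), model) == labels).mean())  # pixel accuracy
```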
A Mean Field Algorithm for Bayes Learning in Large Feed-forward Neural Networks
Opper, Manfred, Winther, Ole
In the Bayes approach to statistical inference [Berger, 1985] one assumes that the prior uncertainty about the parameters of an unknown data-generating mechanism can be encoded in a probability distribution, the so-called prior. Using the prior and the likelihood of the data given the parameters, the posterior distribution of the parameters can be derived from Bayes' rule. From this posterior, various estimates for functions of the parameters, like predictions about unseen data, can be calculated. However, in general, those predictions cannot be realised by specific parameter values, but only by an ensemble average over parameters according to the posterior probability. Hence, exact implementations of the Bayes method for neural networks require averages over network parameters which in general can be performed only by time-consuming Monte Carlo procedures.
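A minimal sketch of the ensemble-average point, assuming Bayesian linear regression (Gaussian prior, known noise) so the posterior is available in closed form; all model choices are illustrative, not the paper's mean field algorithm. Feeding a nonlinear unit shows that the Bayes prediction is an average over the posterior, not the output at any single parameter value.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true + 0.3 * rng.normal(size=30)

alpha, beta = 1.0, 1.0 / 0.3**2             # prior precision, noise precision
A = alpha * np.eye(2) + beta * X.T @ X      # posterior precision matrix
w_mean = beta * np.linalg.solve(A, X.T @ y)
cov = np.linalg.inv(A)

x_new = np.array([0.5, 2.0])
samples = rng.multivariate_normal(w_mean, cov, size=5000)

# For a nonlinear output unit, averaging over the posterior ensemble is not
# the same as plugging in one "best" parameter value such as the mean.
ensemble_pred = np.tanh(samples @ x_new).mean()
plug_in_pred = np.tanh(w_mean @ x_new)
print(ensemble_pred, plug_in_pred)
```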
Multi-Task Learning for Stock Selection
Ghosn, Joumana, Bengio, Yoshua
Artificial Neural Networks can be used to predict future returns of stocks in order to make financial decisions. Should one build a separate network for each stock or share the same network for all the stocks? In this paper we also explore other alternatives, in which some layers are shared and others are not. When the predictions of future returns for different stocks are viewed as different tasks, sharing some parameters across stocks is a form of multi-task learning. In a series of experiments with Canadian stocks, we obtain yearly returns that are more than 14% above various benchmarks. Previous applications of ANNs to financial time series suggest that several of these prediction and decision-making tasks present sufficient non-linearities to justify the use of ANNs (Refenes, 1994; Moody, Levin and Rehfuss, 1993).
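A minimal sketch of the sharing idea in NumPy; the layer sizes and the single-shared-layer layout are illustrative assumptions, not the paper's exact architecture. One hidden layer is shared across all stocks, and each stock keeps its own linear output head.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_stocks = 8, 16, 5
W_shared = rng.normal(scale=0.1, size=(n_hid, n_in))   # trained on all stocks
heads = rng.normal(scale=0.1, size=(n_stocks, n_hid))  # one row per stock

def predict_return(x, stock):
    """Predicted future return for one stock from shared hidden features."""
    h = np.tanh(W_shared @ x)     # parameters shared across tasks (stocks)
    return heads[stock] @ h       # only this layer is stock-specific

x = rng.normal(size=n_in)         # input features for one time step
print([round(float(predict_return(x, s)), 4) for s in range(n_stocks)])
```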
Unsupervised Learning by Convex and Conic Coding
Lee, Daniel D., Seung, H. Sebastian
Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.
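A minimal sketch of the conic encoder, using SciPy's nonnegative least squares: the code is the nonnegative combination of basis vectors closest in squared error to the input. (The convex encoder would additionally constrain the coefficients to sum to one.) The basis and input here are random placeholders, not learned from data.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
basis = np.abs(rng.normal(size=(20, 5)))   # columns: 5 nonnegative basis vectors
x = np.abs(rng.normal(size=20))            # input vector to encode

coeffs, resid = nnls(basis, x)             # conic code: all coefficients >= 0
recon = basis @ coeffs                     # reconstruction from the code
print(coeffs)
print(resid, np.linalg.norm(x - recon))    # identical reconstruction errors
```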
The Effect of Correlated Input Data on the Dynamics of Learning
Halkjær, Søren, Winther, Ole
The convergence properties of the gradient descent algorithm in the case of the linear perceptron may be obtained from the response function. We derive a general expression for the response function and apply it to the case of data with simple input correlations. It is found that correlations may severely slow down learning. This explains the success of PCA as a method for reducing training time. Motivated by this finding, we furthermore propose to transform the input data by removing the mean across input variables as well as across examples to decrease correlations. Numerical findings for a medical classification problem are in fine agreement with the theoretical results. Learning and generalization are important areas of research within the field of neural networks. Although good generalization is the ultimate goal in feed-forward networks (perceptrons), it is of practical importance to understand the mechanisms which control the amount of time required for learning, i.e., the dynamics of learning. This is of course particularly important in the case of a large data set. An exact analysis of this mechanism is possible for the linear perceptron, and as usual it is hoped that the results may to some extent be carried over to explain the behaviour of nonlinear perceptrons.
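A minimal sketch of the proposed transformation, assuming a data matrix of shape (examples, variables): removing the mean across examples and the mean across input variables reduces the input correlations that slow gradient descent.

```python
import numpy as np

def double_center(X):
    X = X - X.mean(axis=0, keepdims=True)   # remove mean across examples
    X = X - X.mean(axis=1, keepdims=True)   # remove mean across variables
    return X

rng = np.random.default_rng(4)
common = rng.normal(size=(100, 1))          # shared signal induces correlations
X = rng.normal(size=(100, 10)) + common

off_diag = lambda C: np.abs(C - np.diag(np.diag(C))).mean()
print(off_diag(np.cov(X, rowvar=False)))                  # large off-diagonals
print(off_diag(np.cov(double_center(X), rowvar=False)))   # much smaller
```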
Analytical Mean Squared Error Curves in Temporal Difference Learning
Singh, Satinder P., Dayan, Peter
We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its stepsize and eligibility trace parameters. A reassuring theory of asymptotic convergence is available for many reinforcement learning (RL) algorithms.
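For reference, a minimal sketch of tabular TD(lambda) with accumulating eligibility traces and offline (end-of-trial) updates on a small absorbing random walk; the chain, stepsize, and trace parameter are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5                                   # non-terminal states 0..4
V = np.zeros(n)                         # lookup-table value estimates
alpha, lam = 0.1, 0.8                   # stepsize and eligibility trace decay

for trial in range(500):
    e, dV = np.zeros(n), np.zeros(n)
    s = n // 2                          # start in the middle state
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        done = s2 < 0 or s2 >= n
        r = 1.0 if s2 >= n else 0.0     # reward 1 on the right absorbing end
        delta = r + (0.0 if done else V[s2]) - V[s]
        e[s] += 1.0                     # accumulating trace
        dV += alpha * delta * e         # accumulate; applied only at trial end
        e *= lam                        # trace decay (discount factor = 1)
        if done:
            break
        s = s2
    V += dV                             # offline update, once per trial

print(V)                                # true values are (i + 1) / 6
```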
Statistical Mechanics of the Mixture of Experts
Kang, Kukjin, Oh, Jong-Hoon
We study the generalization capability of the mixture of experts learning from examples generated by another network with the same architecture. When the number of examples is smaller than a critical value, the network shows a symmetric phase where the roles of the experts are not specialized. Upon crossing the critical point, the system undergoes a continuous phase transition to a symmetry-breaking phase where the gating network partitions the input space effectively and each expert is assigned to an appropriate subspace. We also find that the mixture of experts with multiple levels of hierarchy shows multiple phase transitions. Recently there has been considerable interest in the neural network community in techniques that integrate the collective predictions of a set of networks [1, 2, 3, 4]. The mixture of experts [1, 2] is a well-known example which implements the philosophy of divide-and-conquer elegantly.
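A minimal sketch of a one-level mixture of experts (sizes are illustrative assumptions): a softmax gating network weights the predictions of linear experts, so the gate softly partitions the input space among them.

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_experts = 4, 3
W_exp = rng.normal(size=(n_experts, n_in))    # one linear expert per row
W_gate = rng.normal(size=(n_experts, n_in))   # gating network weights

def mixture_output(x):
    z = W_gate @ x
    g = np.exp(z - z.max())
    g /= g.sum()                 # softmax gate: soft partition of input space
    y = W_exp @ x                # each expert's prediction for this input
    return g @ y                 # gate-weighted combination

print(mixture_output(rng.normal(size=n_in)))
```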
Viewpoint Invariant Face Recognition using Independent Component Analysis and Attractor Networks
Bartlett, Marian Stewart, Sejnowski, Terrence J.
We have explored two approaches to recogmzmg faces across changes in pose. First, we developed a representation of face images based on independent component analysis (ICA) and compared it to a principal component analysis (PCA) representation for face recognition. The ICA basis vectors for this data set were more spatially local than the PCA basis vectors and the ICA representation hadgreater invariance to changes in pose. Second, we present a model for the development of viewpoint invariant responses to faces from visual experience in a biological system. The temporal continuity of natural visual experience was incorporated into an attractor network model by Hebbian learning following a lowpass temporal filter on unit activities.
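A minimal sketch of the trace-rule idea: Hebbian updates use a lowpass-filtered (temporally smoothed) activity, so views that occur close together in time are bound to the same attractor. The sizes, constants, and random "face" prototypes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
W = np.zeros((n, n))                    # attractor network weights
eta, decay = 0.01, 0.8                  # learning rate, trace decay
faces = rng.normal(size=(2, n))         # two prototype patterns

for f in faces:
    trace = np.zeros(n)
    for t in range(20):                 # consecutive noisy views of one face
        y = f + 0.2 * rng.normal(size=n)
        trace = decay * trace + (1 - decay) * y   # lowpass temporal filter
        W += eta * np.outer(trace, y)   # Hebbian update with the trace
np.fill_diagonal(W, 0.0)

view = faces[0] + 0.2 * rng.normal(size=n)
# The recalled pattern should typically correlate most with the face seen.
print([float(np.corrcoef(W @ view, f)[0, 1]) for f in faces])
```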