Rules and Similarity in Concept Learning
This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule- or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
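As an illustrative sketch (not from the paper), the following toy model shows how Bayesian generalization over a hypothesis space can look either graded or rule-like depending on the evidence. The domain, hypotheses, uniform prior, and size-based likelihood are all illustrative assumptions.

```python
# Toy Bayesian concept generalization over number concepts on 1..100.
DOMAIN = range(1, 101)
HYPOTHESES = {
    "even":        {n for n in DOMAIN if n % 2 == 0},
    "powers_of_2": {n for n in DOMAIN if (n & (n - 1)) == 0},
    "10_to_20":    set(range(10, 21)),
}
PRIOR = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}  # assumed uniform prior

def posterior(examples):
    """P(h | X), with likelihood (1/|h|)^|X| for hypotheses containing all of X."""
    post = {h: PRIOR[h] * (1.0 / len(ext)) ** len(examples)
            if all(x in ext for x in examples) else 0.0
            for h, ext in HYPOTHESES.items()}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def p_generalize(y, examples):
    """P(y in concept | X): posterior mass of hypotheses containing y."""
    return sum(p for h, p in posterior(examples).items() if y in HYPOTHESES[h])

print(p_generalize(12, [16]))        # ~0.44: graded, similarity-like
print(p_generalize(12, [16, 8, 2]))  # ~0.003: rule-like ("powers of 2" wins)
```

With a single example, several hypotheses share the posterior mass and generalization looks graded; with three consistent examples the sharpest consistent rule dominates, illustrating the shift between similarity-like and rule-like behavior.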
Lower Bounds on the Complexity of Approximating Continuous Functions by Sigmoidal Neural Networks
One of the theoretical results most frequently cited to justify the use of sigmoidal neural networks in applications is the fact that sigmoidal neural networks have been shown to approximate any continuous function arbitrarily well. Numerous results in the literature have established variants of this universal approximation property by considering distinct function classes to be approximated by network architectures using different types of neural activation functions with respect to various approximation criteria; see for instance [1, 2, 3, 5, 6, 11, 12, 14, 15].
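As a quick concrete illustration of this property (a sketch of mine, not taken from the paper or the cited references), a one-hidden-layer network of random sigmoidal units with least-squares output weights already approximates a smooth target reasonably well:

```python
import numpy as np

# Fit a one-hidden-layer sigmoidal network to a continuous target on
# [0, 1]. Hidden weights are fixed at random and only the output layer
# is solved by least squares; all sizes and scales are assumptions.
rng = np.random.default_rng(0)
n_hidden = 50
x = np.linspace(0.0, 1.0, 200)[:, None]
target = np.sin(2 * np.pi * x).ravel()            # continuous target function

W = rng.normal(scale=10.0, size=(1, n_hidden))    # random input weights
b = rng.normal(scale=10.0, size=n_hidden)         # random biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))            # sigmoidal hidden activations

v, *_ = np.linalg.lstsq(H, target, rcond=None)    # least-squares output weights
approx = H @ v
print("max abs error:", np.max(np.abs(approx - target)))
```

Adding more hidden units drives the error down, which is exactly the behavior the universal approximation theorems guarantee; the lower-bound results of this paper concern how many units that takes.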
Reinforcement Learning Using Approximate Belief States
Rodriguez, Andres C., Parr, Ronald, Koller, Daphne
The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state which is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables. In this paper, we investigate two methods of full belief state reinforcement learning and one novel method for reinforcement learning using factored approximate belief states. We compare the performance of these algorithms on several well-known problems from the literature. Our results demonstrate the importance of approximate belief state representations for large problems.
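For reference, the exact belief update that full belief state methods rely on (and that becomes intractable for large state spaces) is the standard Bayes filter. The sketch below uses a tiny discrete POMDP with assumed numbers, not one of the paper's benchmark problems; the factored approach discussed above would instead store only per-variable marginals of b.

```python
import numpy as np

def belief_update(b, T, O, a, o):
    """b'(s') ~ O[a, s', o] * sum_s T[s, a, s'] * b(s), normalized."""
    predicted = b @ T[:, a, :]               # one-step prediction of the state
    unnormalized = O[a, :, o] * predicted    # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action, 2-observation model with illustrative numbers.
T = np.zeros((2, 1, 2)); T[:, 0, :] = [[0.9, 0.1], [0.2, 0.8]]  # T[s, a, s']
O = np.zeros((1, 2, 2)); O[0, :, :] = [[0.8, 0.2], [0.3, 0.7]]  # O[a, s', o]
b = np.array([0.5, 0.5])                                        # uniform prior
print(belief_update(b, T, O, a=0, o=1))
```

The cost of the exact update grows with the size of the joint state space, which is why the factored approximation matters for large problems.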
Neural System Model of Human Sound Localization
This paper examines the role of biological constraints in the human auditory localization process. A psychophysical and neural system modeling approach was undertaken in which performance comparisons between competing models and a human subject explore the relevant biologically plausible "realism constraints". The directional acoustical cues, upon which sound localization is based, were derived from the human subject's head-related transfer functions (HRTFs). Sound stimuli were generated by convolving bandpass noise with the HRTFs and were presented to both the subject and the model. The input stimuli to the model were processed using the Auditory Image Model of cochlear processing.
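The stimulus-generation step can be sketched as follows; the sampling rate, band edges, and random placeholder impulse responses are assumptions, since in the paper the impulse responses come from the subject's measured HRTFs.

```python
import numpy as np
from scipy.signal import butter, lfilter, fftconvolve

fs = 44100
noise = np.random.randn(fs)                       # 1 s of white noise
b, a = butter(4, [500 / (fs / 2), 8000 / (fs / 2)], btype="band")
stimulus = lfilter(b, a, noise)                   # bandpass noise

hrir_left = np.random.randn(256) * 0.05           # placeholder HRIRs; the paper
hrir_right = np.random.randn(256) * 0.05          # uses the subject's own
left = fftconvolve(stimulus, hrir_left, mode="full")
right = fftconvolve(stimulus, hrir_right, mode="full")
binaural = np.stack([left, right], axis=1)        # presented to subject and model
```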
Online Independent Component Analysis with Local Learning Rate Adaptation
Schraudolph, Nicol N., Giannakopoulos, Xavier
Stochastic meta-descent (SMD) is a new technique for online adaptation of local learning rates in arbitrary twice-differentiable systems. Like matrix momentum it uses full second-order information while retaining O(n) computational complexity by exploiting the efficient computation of Hessian-vector products. Here we apply SMD to independent component analysis, and employ the resulting algorithm for the blind separation of time-varying mixtures. By matching individual learning rates to the rate of change in each source signal's mixture coefficients, our technique is capable of simultaneously tracking sources that move at very different, a priori unknown speeds.
1 Introduction
Independent component analysis (ICA) methods are typically run in batch mode in order to keep the stochasticity of the empirical gradient low. Often this is combined with a global learning rate annealing scheme that negotiates the tradeoff between fast convergence and good asymptotic performance.
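A minimal sketch of the SMD update on a toy quadratic is below; the objective and the meta-parameters mu and lam are illustrative assumptions, and the Hessian-vector product is computed by finite differences rather than the more efficient exact computations the method exploits.

```python
import numpy as np

A = np.diag([1.0, 10.0])               # badly conditioned quadratic objective
def grad(theta):                       # gradient of f(theta) = 0.5 * theta' A theta
    return A @ theta

theta = np.array([1.0, 1.0])
p = np.full(2, 0.01)                   # local (per-parameter) learning rates
v = np.zeros(2)                        # v = d(theta)/d(log p), tracked online
mu, lam, eps = 0.01, 0.99, 1e-6

for _ in range(500):
    g = grad(theta)
    Hv = (grad(theta + eps * v) - g) / eps      # Hessian-vector product
    p *= np.maximum(0.5, 1.0 - mu * g * v)      # meta-descent on the log rates
    v = lam * v - p * (g + lam * Hv)            # update rate-sensitivity vector
    theta = theta - p * g                       # ordinary local-rate gradient step

print(theta, p)   # theta approaches the origin; rates adapt per coordinate
```

The per-coordinate rates end up reflecting the different curvatures, which is the same mechanism that lets the ICA application track mixture coefficients changing at different speeds.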
Data Visualization and Feature Selection: New Algorithms for Nongaussian Data
Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.
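As one concrete instance of this connection (a sketch of mine, not necessarily the paper's algorithm), candidate input variables can be ranked by an estimate of mutual information with the target, a criterion that remains meaningful for nongaussian data where linear correlation fails:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram-based MI estimate between a feature and the target."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=1000)   # nonlinear in feature 0 only
scores = [mutual_information(X[:, j], y) for j in range(3)]
print(scores)   # feature 0 should score highest despite near-zero correlation
```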
Broadband Direction-Of-Arrival Estimation Based on Second Order Statistics
Rosca, Justinian P., Ruanaidh, Joseph Ó, Jourjine, Alexander, Rickard, Scott
N wideband sources recorded using N closely spaced receivers can feasibly be separated based only on second order statistics when using a physical model of the mixing process. In this case we show that the parameter estimation problem can be essentially reduced to considering directions of arrival and attenuations of each signal. The paper presents two demixing methods operating in the time and frequency domain and experimentally shows that it is always possible to demix signals arriving at different angles. Moreover, one can use spatial cues to solve the channel selection problem and a post-processing Wiener filter to ameliorate the artifacts caused by demixing.
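A minimal time-domain sketch of the underlying second-order idea (an illustration of mine, much simpler than the paper's methods) estimates the time difference of arrival between two closely spaced receivers from their cross-correlation and maps the delay to an angle; all constants are assumed.

```python
import numpy as np

fs, c, d = 16000, 343.0, 0.2         # sample rate, speed of sound (m/s), spacing (m)
true_delay = 5                       # inter-receiver delay in samples
s = np.random.randn(4000)            # wideband source signal
x1 = s                               # receiver 1
x2 = 0.9 * np.roll(s, true_delay)    # receiver 2: attenuated, delayed copy

xcorr = np.correlate(x2, x1, mode="full")
lag = int(np.argmax(xcorr)) - (len(x1) - 1)   # lag of the correlation peak
tau = lag / fs                                # delay in seconds
doa = np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
print(f"estimated delay: {lag} samples, DOA: {doa:.1f} degrees")
```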
Information Factorization in Connectionist Models of Perception
Movellan, Javier R., McClelland, James L.
We examine a psychophysical law that describes the influence of stimulus and context on perception. According to this law, choice probability ratios factorize into components independently controlled by stimulus and context. It has been argued that this pattern of results is incompatible with feedback models of perception. In this paper we examine this claim using neural network models defined via stochastic differential equations. We show that the law is related to a condition named channel separability and has little to do with the existence of feedback connections. In essence, channels are separable if they converge into the response units without direct lateral connections to other channels and if their sensors are not directly contaminated by external inputs to the other channels. Implications of the analysis for cognitive and computational neuroscience are discussed.
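The factorization claim has a simple testable signature: if choice probability ratios factor into stimulus and context terms, the log odds must be additive in stimulus and context. The toy check below is an illustration of mine, not the paper's stochastic differential equation models.

```python
import numpy as np

def interaction(log_odds):
    """Interaction term of a 2x2 log-odds table (zero iff additive)."""
    return log_odds[0, 0] - log_odds[0, 1] - log_odds[1, 0] + log_odds[1, 1]

stim = np.array([[1.0], [-0.5]])               # stimulus component (2 stimuli)
ctx = np.array([[0.3, -0.8]])                  # context component (2 contexts)
p_fact = 1.0 / (1.0 + np.exp(-(stim + ctx)))   # P(r1 | s, c), factorized
print(interaction(np.log(p_fact / (1 - p_fact))))    # 0: obeys the law

p_arb = np.array([[0.9, 0.4], [0.3, 0.2]])           # arbitrary choice table
print(interaction(np.log(p_arb / (1 - p_arb))))      # nonzero: violates it
```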
Probabilistic Methods for Support Vector Machines
One of the open questions that remains is how to set the 'tunable' parameters of an SVM algorithm: While methods for choosing the width of the kernel function and the noise parameter C (which controls how closely the training data are fitted) have been proposed [4, 5] (see also, very recently, [6]), the effect of the overall shape of the kernel function remains imperfectly understood [1]. Error bars (class probabilities) for SVM predictions - important for safety-critical applications, for example - are also difficult to obtain. In this paper I suggest that a probabilistic interpretation of SVMs could be used to tackle these problems. It shows that the SVM kernel defines a prior over functions on the input space, avoiding the need to think in terms of high-dimensional feature spaces. It also allows one to define quantities such as the evidence (likelihood) for a set of hyperparameters (C, kernel amplitude K0, etc.). I give a simple approximation to the evidence which can then be maximized to set such hyperparameters. The evidence is sensitive to the values of C and K0 individually, in contrast to properties (such as cross-validation error) of the deterministic solution, which only depend on the product CK0. It can therefore be used to assign an unambiguous value to C, from which error bars can be derived.
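The degeneracy in the deterministic solution is easy to check empirically. The sketch below (an illustration of mine; the data, kernel, and solver are assumptions) trains SVMs with a precomputed kernel scaled by K0 and verifies that only the product C * K0 affects the decision function:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.sign(X[:, 0] + X[:, 1])                 # linearly separable toy labels
Xtest = rng.normal(size=(5, 2))

def decision(C, K0):
    """SVM decision values using the kernel K0 * RBF as a precomputed Gram matrix."""
    rbf = lambda A, B: np.exp(-((A[:, None] - B[None]) ** 2).sum(-1))
    clf = SVC(C=C, kernel="precomputed").fit(K0 * rbf(X, X), y)
    return clf.decision_function(K0 * rbf(Xtest, X))

f1 = decision(C=1.0, K0=1.0)
f2 = decision(C=0.1, K0=10.0)                  # same product C * K0
print(np.max(np.abs(f1 - f2)))                 # ~0 up to solver tolerance
```

Because the decision function cannot distinguish (C, K0) from (C/s, s*K0), only a criterion like the evidence, which sees the two parameters individually, can resolve the ambiguity.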