Technology
Combining Neural Network Regression Estimates with Regularized Linear Weights
Merz, Christopher J., Pazzani, Michael J.
When combining a set of learned models to form an improved estimator, the issue of redundancy or multicollinearity in the set of models must be addressed. A progression of existing approaches and their limitations with respect to redundancy is discussed. A new approach, PCR*, based on principal components regression, is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that: 1) PCR* was the most robust combination method as the redundancy of the learned models increased, 2) redundancy could be handled without eliminating any of the learned models, and 3) the principal components of the learned models provided a continuum of "regularized" weights from which PCR* could choose.
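As a rough sketch of the principal-components-regression step behind PCR* (an illustration only, not the authors' implementation; the function name pcr_combine and the fixed choice of k are assumptions, whereas PCR* itself selects the number of components):

    import numpy as np

    def pcr_combine(F, y, k):
        """Combine model predictions F (n_samples x n_models) by regressing
        y on the top-k principal components of F, then mapping the component
        coefficients back to one weight per model."""
        Fc = F - F.mean(axis=0)
        _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
        Z = Fc @ Vt[:k].T                     # scores on the leading k components
        beta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
        w = Vt[:k].T @ beta                   # weights over the original models
        b = y.mean() - F.mean(axis=0) @ w     # intercept
        return w, b

    # Varying k from 1 to n_models traces out the continuum of
    # "regularized" weight vectors from which a method like PCR* chooses.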
Bangs, Clicks, Snaps, Thuds and Whacks: An Architecture for Acoustic Transient Processing
Pineda, Fernando J., Cauwenberghs, Gert, Edwards, R. Timothy
We show how judicious normalization of a time-frequency signal allows an elegant and robust implementation of a correlation algorithm. The algorithm uses binary multiplexing instead of analog-analog multiplication. This removes the need for analog storage and analog multiplication. Simulations show that the resulting algorithm has the same out-of-sample classification performance (~93% correct) as a baseline template-matching algorithm.
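One plausible reading of the correlation step, sketched in Python (illustrative assumptions: per-frame L1 normalization and a sign-quantized template, so that each multiply reduces to a signed add; the paper's exact normalization and multiplexing scheme may differ):

    import numpy as np

    def transient_score(S, template):
        """Score a time-frequency patch S (channels x frames) against a
        stored template of the same shape. The input is normalized per
        frame and the template is reduced to its sign, so the correlation
        needs no analog-analog multiplication."""
        S = S / (np.abs(S).sum(axis=0, keepdims=True) + 1e-12)
        return float(np.sum(np.sign(template) * S))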
An Apobayesian Relative of Winnow
Littlestone, Nick, Mesterharm, Chris
We study a mistake-driven variant of an online Bayesian learning algorithm (similar to one studied by Cesa-Bianchi, Helmbold, and Panizza [CHP96]). This variant only updates its state (learns) on trials in which it makes a mistake. The algorithm makes binary classifications using a linear-threshold classifier and runs in time linear in the number of attributes seen by the learner. We have been able to show, theoretically and in simulations, that this algorithm performs well under assumptions quite different from those embodied in the prior of the original Bayesian algorithm. It can handle situations that we do not know how to handle in linear time with Bayesian algorithms. We expect our techniques to be useful in deriving and analyzing other apobayesian algorithms. 1 Introduction We consider two styles of online learning.
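For concreteness, a mistake-driven linear-threshold learner with a multiplicative (classic Winnow-style) update is sketched below; the apobayesian algorithm's actual update rule differs, so treat this purely as an illustration of the mistake-driven, linear-time pattern:

    import numpy as np

    def run_mistake_driven(stream, n, alpha=1.5):
        """stream yields (x, label) with x in {0,1}^n and label in {0,1}.
        The state is updated only on trials where the prediction is wrong."""
        w = np.ones(n)
        theta = n / 2.0
        mistakes = 0
        for x, label in stream:
            pred = int(w @ x >= theta)        # linear-threshold classification
            if pred != label:                 # learn only on mistakes
                mistakes += 1
                if label == 1:
                    w[x == 1] *= alpha        # promote active attributes
                else:
                    w[x == 1] /= alpha        # demote active attributes
        return w, mistakes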
Hidden Markov Decision Trees
Jordan, Michael I., Ghahramani, Zoubin, Saul, Lawrence K.
We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, so we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence.
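The third approximation reduces, for a flattened state space, to ordinary Viterbi decoding; a generic sketch follows (the joint tree configurations are collapsed into K states here, which ignores the factorized structure the paper exploits):

    import numpy as np

    def viterbi(log_trans, log_emit):
        """log_trans: (K, K) log transition matrix; log_emit: (T, K)
        per-timestep log evidence. Returns the single most likely path."""
        T, K = log_emit.shape
        delta = np.empty((T, K))
        back = np.zeros((T, K), dtype=int)
        delta[0] = log_emit[0]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_trans
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_emit[t]
        path = np.empty(T, dtype=int)
        path[-1] = delta[-1].argmax()
        for t in range(T - 2, -1, -1):       # trace the best path backwards
            path[t] = back[t + 1, path[t + 1]]
        return path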
Visual Cortex Circuitry and Orientation Tuning
Mundel, Trevor, Dimitrov, Alexander, Cowan, Jack D.
A simple mathematical model for the large-scale circuitry of primary visual cortex is introduced. It is shown that a basic cortical architecture of recurrent local excitation and lateral inhibition can account quantitatively for such properties as orientation tuning. The model can also account for such local effects as cross-orientation suppression. It is also shown that nonlocal state-dependent coupling between similar orientation patches, when added to the model, can satisfactorily reproduce such effects as non-local iso-orientation suppression, and non-local cross-orientation enhancement. Following this, an account is given of perceptual phenomena involving object segmentation, such as "pop-out", and the direct and indirect tilt illusions.
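A toy version of the recurrent-excitation/lateral-inhibition mechanism is the standard ring model of orientation tuning, sketched below (parameter values are illustrative choices, not the paper's):

    import numpy as np

    def ring_model(theta0=0.0, n=64, J0=-2.0, J2=1.9, steps=400, dt=0.05):
        """Recurrent sharpening of a weakly tuned input on a ring of
        orientation-selective units: local excitation (the cosine term)
        plus broad lateral inhibition (the negative mean term J0)."""
        theta = np.linspace(-np.pi / 2, np.pi / 2, n, endpoint=False)
        W = (J0 + J2 * np.cos(2.0 * (theta[:, None] - theta[None, :]))) / n
        h = 1.0 + 0.2 * np.cos(2.0 * (theta - theta0))   # weakly tuned input
        r = np.zeros(n)
        for _ in range(steps):
            r += dt * (-r + np.maximum(W @ r + h, 0.0))  # rectified rate dynamics
        return theta, r   # r is sharply peaked around theta0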
GTM: A Principled Alternative to the Self-Organizing Map
Bishop, Christopher M., Svensén, Markus, Williams, Christopher K. I.
The Self-Organizing Map (SOM) algorithm has been extensively studied and has been applied with considerable success to a wide variety of problems. However, the algorithm is derived from heuristic ideas, and this leads to a number of significant limitations. In this paper, we consider the problem of modelling the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. We introduce a novel form of latent variable model, which we call the GTM algorithm (for Generative Topographic Mapping), which allows general nonlinear transformations from latent space to data space, and which is trained using the EM (expectation-maximization) algorithm. Our approach overcomes the limitations of the SOM, while introducing no significant disadvantages. We demonstrate the performance of the GTM algorithm on simulated data from flow diagnostics for a multiphase oil pipeline.
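A minimal EM loop for a GTM-style model, assuming a 1-D latent grid mapped through a radial-basis layer (grid sizes, basis width, and the small ridge term are illustrative choices; see the paper for the full formulation):

    import numpy as np

    def gtm_fit(X, n_latent=20, n_rbf=5, sigma=0.3, iters=30):
        """Fit a 1-D GTM-style density model to data X (N, D) by EM."""
        N, D = X.shape
        z = np.linspace(-1.0, 1.0, n_latent)              # latent grid points
        mu = np.linspace(-1.0, 1.0, n_rbf)                # RBF centres
        Phi = np.exp(-(z[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
        Phi = np.hstack([Phi, np.ones((n_latent, 1))])    # bias column
        W = np.random.default_rng(0).normal(scale=0.1, size=(Phi.shape[1], D))
        beta = 1.0                                        # noise precision
        for _ in range(iters):
            Y = Phi @ W                                   # mixture centres in data space
            d2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(-1)   # (K, N)
            R = np.exp(-0.5 * beta * (d2 - d2.min(axis=0)))       # E-step:
            R /= R.sum(axis=0)                                    # responsibilities
            G = np.diag(R.sum(axis=1))
            W = np.linalg.solve(Phi.T @ G @ Phi + 1e-6 * np.eye(Phi.shape[1]),
                                Phi.T @ (R @ X))          # M-step for W
            beta = X.size / (R * d2).sum()                # M-step for beta
        return Phi, W, beta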
Size of Multilayer Networks for Exact Learning: Analytic Approach
Elisseeff, André, Paugam-Moisy, Hélène
The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces. 1 RELATED WORKS The context of our work is rather similar to the well-known results of Baum et al. [1, 2, 3, 5, 10], but we consider both real inputs and outputs, instead of the dichotomies usually addressed. We are interested in learning exactly all the examples of a fixed database, hence our work is different from stating that multilayer networks are universal approximators [6, 8, 9]. Since we consider real outputs and not only dichotomies, it is not straightforward to compare our results to the recent works about the VC-dimension of multilayer networks [11, 12, 13]. Our study is more closely related to several works of Sontag [14, 15], but with different hypotheses on the transfer functions of the units. Finally, our approach is based on geometrical considerations and is close to the model of Coetzee and Stonick [4]. First we define the model of network and the notations, and second we develop our analytic approach and prove the fundamental theorem. In the last section, we discuss our point of view and propose some practical consequences of the result.
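The linear-algebra core of this geometric viewpoint can be made concrete: once the hidden activations on the fixed training set are computed, exact learning by the output weights alone is a solvability condition on a linear system. The check below is an illustration of that core only, not the paper's bound:

    import numpy as np

    def can_learn_exactly(H, Y, tol=1e-8):
        """H: (N, h) hidden activations on the N training inputs;
        Y: (N, p) target outputs. Exact learning by the output layer is
        possible iff Y lies in the column space of [H, 1]."""
        A = np.hstack([H, np.ones((H.shape[0], 1))])   # include output bias
        W_out, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return bool(np.allclose(A @ W_out, Y, atol=tol))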
Triangulation by Continuous Embedding
Meila, Marina, Jordan, Michael I.
When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continuous nonlinear optimization task, we hope to obtain a good triangulation with respect to the aforementioned cost. This paper presents two ways of embedding the triangulation problem into a continuous domain and shows that they perform well compared to the best known heuristic.
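The cost being searched over can be computed directly from a vertex permutation by simulating elimination; a plain-Python sketch (summing the state spaces of the elimination cliques, with vertex cardinalities supplied by the caller; the paper's exact cost function may differ in detail):

    def elimination_cost(adj, order, card):
        """adj: dict vertex -> set of neighbours; order: permutation of the
        vertices; card: dict vertex -> number of states. Eliminating a
        vertex forms a clique over it and its remaining neighbours; the
        returned cost sums the state spaces of these elimination cliques."""
        adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
        total = 0
        for v in order:
            clique_size = card[v]
            for u in adj[v]:
                clique_size *= card[u]
            total += clique_size
            nbrs = adj.pop(v)
            for u in nbrs:                            # add fill-in edges
                adj[u] |= nbrs - {u}
                adj[u].discard(v)
        return total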
Probabilistic Interpretation of Population Codes
Zemel, Richard S., Dayan, Peter, Pouget, Alexandre
We present a theoretical framework for population codes which generalizes naturally to the important case where the population provides information about a whole probability distribution over an underlying quantity rather than just a single value. We use the framework to analyze two existing models, and to suggest and evaluate a third model for encoding such probability distributions. 1 Introduction Population codes, where information is represented in the activities of whole populations of units, are ubiquitous in the brain. There has been substantial work on how animals should and/or actually do extract information about the underlying encoded quantity.
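One simple encoding scheme of the kind such a framework covers (illustrative assumptions: Gaussian tuning curves on a grid, with each unit's expected rate given by the overlap of its tuning curve with the encoded distribution):

    import numpy as np

    def encode_distribution(p, x, centers, width=0.5):
        """p: density values of the encoded distribution P on the grid x;
        centers: preferred values of the units. Returns expected rates
        r_i = integral f_i(x) P(x) dx for Gaussian tuning curves f_i."""
        f = np.exp(-(x[None, :] - centers[:, None]) ** 2 / (2 * width ** 2))
        return f @ (p * np.gradient(x))   # quadrature over the grid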