Adaptive On-line Learning in Changing Environments

Neural Information Processing Systems

An adaptive online algorithm extending the learning-of-learning idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.

1 Introduction

Neural networks provide powerful tools to capture the structure in data by learning. Often the batch learning paradigm is assumed, where the learner is given all training examples simultaneously and allowed to use them as often as desired. In large practical applications, batch learning often proves infeasible and online learning is employed instead.


Unification of Information Maximization and Minimization

Neural Information Processing Systems

In the present paper, we propose a method to unify information maximization and minimization in hidden units. Information maximization and minimization are performed on two different levels: the collective level and the individual level. Accordingly, two kinds of information are defined: collective information and individual information. By maximizing collective information and minimizing individual information, networks that are simple in terms of the number of connections and the number of hidden units can be generated. The resulting networks are expected to give better generalization and improved interpretation of internal representations.


Learning Exact Patterns of Quasi-synchronization among Spiking Neurons from Data on Multi-unit Recordings

Neural Information Processing Systems

This paper develops arguments for a family of temporal log-linear models to represent spatiotemporal correlations among the spiking events in a group of neurons. The models can represent not just pairwise correlations but also correlations of higher order. Methods are discussed for inferring the existence or absence of correlations and estimating their strength. A frequentist and a Bayesian approach to correlation detection are compared.


Microscopic Equations in Rough Energy Landscape for Neural Networks

Neural Information Processing Systems

We consider the microscopic equations for learning problems in neural networks. The aligning fields of an example are obtained from the cavity fields, which are the fields that would arise if that example were absent from the learning process. In a rough energy landscape, we assume that the density of the local minima obeys an exponential distribution, yielding macroscopic properties that agree with the first-step replica symmetry breaking solution. Iterating the microscopic equations provides a learning algorithm, which results in a higher stability than conventional algorithms.

1 INTRODUCTION

Most neural networks learn iteratively by gradient descent. As a result, closed expressions for the final network state after learning are rarely known. This precludes further analysis of their properties, and insights into the design of learning algorithms.


Learning Appearance Based Models: Mixtures of Second Moment Experts

Neural Information Processing Systems

This paper describes a new technique for object recognition based on learning appearance models. The image is decomposed into local regions, which are described by a new texture representation called "Generalized Second Moments" that are derived from the output of multiscale, multiorientation filter banks. Class-characteristic local texture features and their global composition are learned by a hierarchical mixture of experts architecture (Jordan & Jacobs). The technique is applied to a vehicle database consisting of 5 general car categories (Sedan, Van with backdoors, Van without backdoors, old Sedan, and Volkswagen Bug). This is a difficult problem with considerable in-class variation. The new technique has a 6.5% misclassification rate, compared to eigen-images, which give a 17.4% misclassification rate, and nearest neighbors, which give 15.7%.


Second-order Learning Algorithm with Squared Penalty Term

Neural Information Processing Systems

This paper compares three penalty terms with respect to the efficiency of supervised learning, using first- and second-order learning algorithms. Our experiments showed that, for a reasonably adequate penalty factor, the combination of the squared penalty term and the second-order learning algorithm drastically improves convergence performance, by more than 20 times over the other combinations, while at the same time bringing about better generalization performance.
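As a rough illustration of what a squared penalty term contributes to a training objective, the sketch below adds (penalty_factor / 2) * sum(w_j^2) to a mean squared error and returns the loss with its gradient. The linear model, the function name `penalized_loss_and_grad`, and the parameter `penalty_factor` are assumptions made for this example; the abstract does not specify the networks, the other two penalty terms, or the second-order algorithm, and none of those are reproduced here.

```python
def penalized_loss_and_grad(w, xs, ys, penalty_factor):
    """Mean squared error of a linear model plus a squared penalty term.

    loss = (1 / 2n) * sum_i (w . x_i - y_i)^2 + (penalty_factor / 2) * sum_j w_j^2

    The linear model is an illustrative assumption, not the paper's setup.
    """
    n = len(xs)
    # The penalty contributes penalty_factor * w_j to each gradient component.
    grad = [penalty_factor * wj for wj in w]
    loss = 0.5 * penalty_factor * sum(wj * wj for wj in w)
    for x, y in zip(xs, ys):
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        loss += 0.5 * err * err / n
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    return loss, grad


# Hypothetical usage with one training example.
loss, grad = penalized_loss_and_grad([1.0, 2.0], [[1.0, 0.0]], [0.0], 0.1)
```

The penalty pulls every weight toward zero in proportion to its size, so the second gradient component above is nonzero even though the corresponding input is zero.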


Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA

Neural Information Processing Systems

We cast the problem as one of maximum likelihood density estimation, and in that framework introduce an algorithm that searches for independent components using both temporal and spatial cues. We call the resulting algorithm "Contextual ICA" (cICA), after the Infomax algorithm (Bell and Sejnowski 1995), which we show to be a special case of cICA. Because cICA can make use of the temporal structure of its input, it is able to separate sources in a number of situations where standard methods cannot, including sources with low kurtosis, colored Gaussian sources, and sources which have Gaussian histograms.

1 The Blind Source Separation Problem

Consider a set of n independent sources


Consistent Classification, Firm and Soft

Neural Information Processing Systems

A classifier is called consistent with respect to a given set of class-labeled points if it correctly classifies the set. We consider classifiers defined by unions of local separators and propose algorithms for consistent classifier reduction. The expected complexities of the proposed algorithms are derived along with the expected classifier sizes. In particular, the proposed approach yields a consistent reduction of the nearest neighbor classifier, which performs "firm" classification, assigning each new object to a class, regardless of the data structure. The proposed reduction method suggests a notion of "soft" classification, allowing for indecision with respect to objects which are insufficiently or ambiguously supported by the data. The performance of the proposed classifiers in predicting stock behavior is compared to that achieved by the nearest neighbor method.

1 Introduction

Certain classification problems, such as recognizing the digits of a handwritten zip code, require the assignment of each object to a class. Others, involving relatively small amounts of data and high risk, call for indecision until more data become available. Examples in such areas as medical diagnosis, stock trading and radar detection are well known. The training data for the classifier in both cases will correspond to firmly labeled members of the competing classes.
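The contrast between "firm" and "soft" classification can be illustrated with a small sketch. The firm rule below is the ordinary nearest neighbor classifier. The soft rule is a hypothetical stand-in that abstains when the two closest classes are nearly equidistant; it is not the reduction method based on unions of local separators described in the abstract. The function names and the `margin` parameter are assumptions made for this example.

```python
import math


def nearest_per_class(train, point):
    """Map each label to the distance from `point` to its closest training example."""
    best = {}
    for x, label in train:
        d = math.dist(x, point)
        if label not in best or d < best[label]:
            best[label] = d
    return best


def firm_classify(train, point):
    """Nearest neighbor rule: always assigns a class."""
    best = nearest_per_class(train, point)
    return min(best, key=best.get)


def soft_classify(train, point, margin=0.5):
    """Hypothetical soft rule: return None (indecision) when the two
    closest classes differ in distance by less than `margin`."""
    best = nearest_per_class(train, point)
    ranked = sorted(best.items(), key=lambda kv: kv[1])
    if len(ranked) > 1 and ranked[1][1] - ranked[0][1] < margin:
        return None
    return ranked[0][0]


# Hypothetical training set of (point, label) pairs.
train = [((0.0, 0.0), "a"), ((4.0, 0.0), "b")]
firm_label = firm_classify(train, (2.1, 0.0))
soft_label = soft_classify(train, (2.1, 0.0))
```

For a query point near the midpoint between the two classes, the firm rule still commits to a label, while the soft rule reports indecision.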