
Dynamics of Supervised Learning with Restricted Training Sets

Neural Information Processing Systems

We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions.
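For concreteness, the restricted-training-set regime can be sketched as follows (generic notation assumed here, not necessarily the paper's): the load alpha stays finite as both p and N grow, and the quantity of interest is the student's local field on a training pattern,

    \alpha = \frac{p}{N} \ \text{finite as } N \to \infty, \qquad
    h^{\mu} = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} w_j \, \xi_j^{\mu}, \quad \mu = 1, \dots, p,

whose distribution over the training set is generally non-Gaussian in this regime, since the weights become correlated with the restricted set of patterns they are trained on.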


Blind Separation of Filtered Sources Using State-Space Approach

Neural Information Processing Systems

In this paper we present a novel approach to multichannel blind separation/generalized deconvolution, assuming that both mixing and demixing models are described by stable linear state-space systems. Based on the minimization of the Kullback-Leibler divergence, we develop a novel learning algorithm to train the matrices in the output equation. To estimate the state of the demixing model, we introduce a new concept, called hidden innovation, to numerically implement the Kalman filter. We refer to the review papers [1] and [5] for the current state of theory and methods in the field. There are several reasons for using linear state-space systems as blind deconvolution models.
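A minimal sketch of what a linear state-space demixing model of this kind typically looks like (generic notation assumed here, not necessarily the paper's):

    \mathbf{x}(k+1) = \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k), \qquad
    \mathbf{y}(k) = \mathbf{C}\,\mathbf{x}(k) + \mathbf{D}\,\mathbf{u}(k),

where u(k) is the observed mixed signal, x(k) the internal state, and y(k) the recovered sources; a KL-based algorithm of the kind described above adapts the output-equation matrices C and D.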


Stationarity and Stability of Autoregressive Neural Network Processes

Neural Information Processing Systems

AR-NNs are a natural generalization of the classic linear autoregressive AR(p) process; see, e.g., Brockwell & Davis (1987) for a comprehensive introduction to AR and ARMA (autoregressive moving average) models. One of the most central questions in linear time series theory is the stationarity of the model, i.e., whether the probabilistic structure of the series is constant over time or at least asymptotically constant (when not started in equilibrium). Surprisingly, this question has not gained much interest in the NN literature; in particular, to our knowledge, there are no results giving conditions for the stationarity of AR-NN models. There are results on the stationarity of Hopfield nets (Wang & Sheng, 1996), but these nets cannot be used to estimate conditional expectations for time series prediction. The rest of this paper is organized as follows: In Section 2 we recall some results from time series analysis and Markov chain theory defining the relationship between a time series and its associated Markov chain. In Section 3 we use these results to establish that standard AR-NN models without shortcut connections are stationary. We also give conditions for AR-NN models with shortcut connections to be stationary. Section 4 examines the NN modeling of an important class of non-stationary time series; proofs are deferred to the appendix.
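For orientation, a sketch of the two model classes being compared (standard textbook notation, assumed rather than quoted from the paper): a linear AR(p) process and a single-hidden-layer AR-NN, with optional linear shortcut connections.

    y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t \qquad \text{(linear AR(p))}

    y_t = \sum_{j=1}^{H} \beta_j\, \sigma\!\Big( \gamma_{j0} + \sum_{i=1}^{p} \gamma_{ji}\, y_{t-i} \Big)
          + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \qquad \text{(AR-NN; the linear sum is the optional shortcut part)}

with sigma a bounded activation (e.g., tanh) and epsilon_t i.i.d. noise.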


Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm

Neural Information Processing Systems

Advantages in feature selection, robustness and limited resource allocation have been studied. Ultimately, tasks such as regression and classification reduce to the evaluation of a conditional density. However, the popularity of maximum joint likelihood and EM techniques remains strong, in part due to their elegance and convergence properties. Thus, many conditional problems are solved by first estimating joint models and then conditioning them.
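To make the contrast concrete (a standard formulation, assumed here rather than taken from the paper), maximum joint likelihood and maximum conditional likelihood optimize different objectives over the same parametric model p_theta(x, y):

    \hat{\theta}_{\text{joint}} = \arg\max_{\theta} \sum_{n} \log p_{\theta}(x_n, y_n), \qquad
    \hat{\theta}_{\text{cond}} = \arg\max_{\theta} \sum_{n} \log p_{\theta}(y_n \mid x_n)
      = \arg\max_{\theta} \sum_{n} \big[ \log p_{\theta}(x_n, y_n) - \log p_{\theta}(x_n) \big].

The subtracted log p_theta(x_n) term is what makes direct conditional maximization harder than running EM on the joint likelihood.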


A Polygonal Line Algorithm for Constructing Principal Curves

Neural Information Processing Systems

Principal curves have been defined as "self-consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. Recently, we [1] have offered a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition made it possible to carry out a theoretical analysis of learning principal curves from training data. In this paper we propose a practical construction based on the new definition. Simulation results demonstrate that the new algorithm compares favorably with previous methods both in terms of performance and computational complexity.
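A compact way to state the length-constrained definition referenced above (notation assumed): among continuous curves f of length at most L, a principal curve minimizes the expected squared distance to the random point X,

    \Delta(\mathbf{f}) = \mathbb{E}\Big[ \min_{t} \, \lVert X - \mathbf{f}(t) \rVert^{2} \Big],
    \qquad
    \mathbf{f}^{*} = \arg\min_{\operatorname{length}(\mathbf{f}) \le L} \Delta(\mathbf{f}),

where the inner minimum is the squared distance from X to its projection onto the curve.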


Optimizing Correlation Algorithms for Hardware-Based Transient Classification

Neural Information Processing Systems

The performance of dedicated VLSI neural processing hardware depends critically on the design of the implemented algorithms. We have previously proposed an algorithm for acoustic transient classification [1]. Having implemented and demonstrated this algorithm in a mixed-mode architecture, we now investigate variants on the algorithm, using time and frequency channel differencing, input and output normalization, and schemes to binarize and train the template values, with the goal of achieving optimal classification performance for the chosen hardware.
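The abstract does not spell out the correlation algorithm itself; the toy sketch below (hypothetical names and shapes, not the hardware implementation) only illustrates the ingredients it mentions: a time-frequency input, per-channel differencing, normalization, and correlation against stored class templates.

import numpy as np

def classify_transient(spectrogram, templates):
    """Toy correlation classifier. 'spectrogram' is (channels, time);
    'templates' maps class label -> array of the same shape.
    Hypothetical sketch, not the algorithm from the paper."""
    # frequency-channel differencing emphasizes spectral edges
    x = np.diff(spectrogram, axis=0)
    # input normalization: zero mean, unit energy
    x = (x - x.mean()) / (np.linalg.norm(x) + 1e-12)
    scores = {}
    for label, tpl in templates.items():
        t = np.diff(tpl, axis=0)
        t = (t - t.mean()) / (np.linalg.norm(t) + 1e-12)
        scores[label] = float((x * t).sum())  # normalized correlation score
    return max(scores, key=scores.get), scores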


Learning from Dyadic Data

Neural Information Processing Systems

Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains. 1 Introduction Over the past decade learning from data has become a highly active field of research distributed over many disciplines like pattern recognition, neural computation, statistics, machine learning, and data mining.
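The paper's specific models are not reproduced here; the sketch below only shows the general shape of annealed EM for a flat latent-class mixture over dyads (x, y), with a temperature parameter beta tempering the E-step posteriors (all names and the tempering scheme are illustrative assumptions).

import numpy as np

def annealed_em(counts, K, beta=0.7, iters=50, seed=0):
    """counts: (X, Y) matrix of dyad observation counts.
    Fits p(x, y) = sum_z p(z) p(x|z) p(y|z) with a tempered E-step."""
    rng = np.random.default_rng(seed)
    X, Y = counts.shape
    pz = np.full(K, 1.0 / K)
    px_z = rng.dirichlet(np.ones(X), size=K)   # (K, X)
    py_z = rng.dirichlet(np.ones(Y), size=K)   # (K, Y)
    for _ in range(iters):
        # E-step: tempered posterior q(z | x, y) ~ [p(z) p(x|z) p(y|z)]^beta
        joint = (pz[:, None, None] * px_z[:, :, None] * py_z[:, None, :]) ** beta
        q = joint / joint.sum(axis=0, keepdims=True)           # (K, X, Y)
        # M-step: re-estimate parameters from expected counts
        nz = (q * counts[None, :, :]).sum(axis=(1, 2))
        pz = nz / nz.sum()
        px_z = (q * counts[None, :, :]).sum(axis=2)
        px_z /= px_z.sum(axis=1, keepdims=True)
        py_z = (q * counts[None, :, :]).sum(axis=1)
        py_z /= py_z.sum(axis=1, keepdims=True)
    return pz, px_z, py_z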


Approximate Learning of Dynamic Models

Neural Information Processing Systems

Inference is a key component in learning probabilistic models from partially observable data. When learning temporal models, each of the many inference phases requires a traversal over an entire long data sequence; furthermore, the data structures manipulated are exponentially large, making this process computationally expensive. In [2], we describe an approximate inference algorithm for monitoring stochastic processes, and prove bounds on its approximation error. In this paper, we apply this algorithm as an approximate forward propagation step in an EM algorithm for learning temporal Bayesian networks. We provide a related approximation for the backward step, and prove error bounds for the combined algorithm.
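A toy sketch of the projection idea behind approximate monitoring of a stochastic process: propagate the belief state one step exactly, condition on the evidence, then project back onto a product of marginals. The two-variable setup and all names below are illustrative assumptions, not the algorithm analyzed in the paper.

import numpy as np

def approx_forward_step(belief_factors, joint_transition, obs_likelihood):
    """One approximate forward step for a 2-variable temporal model.
    belief_factors: [p(A), p(B)] as 1-D arrays (factored belief state).
    joint_transition: T[a, b, a2, b2] = p(a2, b2 | a, b).
    obs_likelihood: L[a2, b2] = p(evidence | a2, b2).
    Returns the new factored belief (marginals of the updated joint)."""
    pa, pb = belief_factors
    prior_joint = np.einsum('a,b,abcd->cd', pa, pb, joint_transition)  # predict
    posterior = prior_joint * obs_likelihood                           # condition
    posterior /= posterior.sum()
    # projection step: keep only the marginals (this is the approximation)
    return [posterior.sum(axis=1), posterior.sum(axis=0)]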


Risk Sensitive Reinforcement Learning

Neural Information Processing Systems

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicates that much real-world data exhibits the low dimensional manifold structure assumed by these models [2, 3]. However, the linear manifold structure of PCA is not appropriate for data with binary-valued variables.
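The PCA construction described above, projecting onto the leading eigenvectors of the data correlation matrix, can be sketched in a few lines (a generic illustration of PCA itself, not of the binary generative model this entry is about):

import numpy as np

def pca_project(X, k):
    """X: (n_samples, n_features) data matrix. Returns the k leading
    principal directions and the data projected onto them."""
    Xc = X - X.mean(axis=0)                          # center the data
    C = Xc.T @ Xc / (len(Xc) - 1)                    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # k largest-eigenvalue directions
    return top, Xc @ top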


Semi-Supervised Support Vector Machines

Neural Information Processing Systems

We introduce a semi-supervised support vector machine (S3VM) method. Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transduction problem via overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data.
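One common way to write the objective behind such a method (a generic semi-supervised SVM formulation, assumed here rather than quoted from the paper): labeled points pay the usual hinge loss, while each unlabeled working-set point pays the smaller of the two losses it would incur under either label,

    \min_{\mathbf{w},\, b}\; \lVert \mathbf{w} \rVert
      + C \sum_{i \in \text{labeled}} \max\!\big(0,\; 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\big)
      + C^{*} \sum_{j \in \text{working}} \min\!\Big( \max\!\big(0,\; 1 - (\mathbf{w}^{\top}\mathbf{x}_j + b)\big),\;
                                                      \max\!\big(0,\; 1 + (\mathbf{w}^{\top}\mathbf{x}_j + b)\big) \Big),

where C and C* trade off the margin against errors on the labeled and working sets respectively.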