Goto

Collaborating Authors

 Uncertainty



Modeling Time Varying Systems Using Hidden Control Neural Architecture

Neural Information Processing Systems

This paper introduces a generalization of the layered neural network that can implement a time-varying nonlinear mapping between its observable input and output. The variation of the network's mapping is due to an additional, hidden control input, while the network parameters remain unchanged. We proposed an algorithm for finding the network parameters and the hidden control sequence from a training set of examples of observable input and output. This algorithm implements an approximate maximum likelihood estimation of parameters of an equivalent statistical model, when only the dominant control sequence is taken into account. The conceptual difference between the proposed model and the HMM is that in the HMM approach, the observable data in each of the states is modeled as though it was produced by a memoryless source, and a parametric description of this source is obtained during training, while in the proposed model the observations in each state are produced by a nonlinear dynamical system driven by noise, and both the parametric form of the dynamics and the noise are estimated. The perfonnance of the model was illustrated for the tasks of nonlinear time-varying system modeling and continuously spoken digit recognition. The reported results show the potential of this model for providing high performance speech recognition capability. Acknowledgment Special thanks are due to N. Merhav for numerous comments and helpful discussions.


Asymptotic slowing down of the nearest-neighbor classifier

Neural Information Processing Systems

M2/n' for sufficiently large values of M. Here, Poo(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification. Common applications include recognition problems dealing with speech, handwritten characters, DNA sequences, military targets, and (in this conference) sexual identity. Two fundamental concepts associated with pattern classification are generalization (how well does a classifier respond to input data it has never encountered before?) and scalability (how are a classifier's processing and training requirements affected by increasing the number of features that describe the input patterns?).


On Stochastic Complexity and Admissible Models for Neural Network Classifiers

Neural Information Processing Systems

For a detailed rationale the reader is referred to the work of Rissanen (1984) or Wallace and Freeman (1987) and the references therein. Note that the Minimum Description Length (MDL) technique (as Rissanen's approach has become known) is implicitly related to Maximum A Posteriori (MAP) Bayesian estimation techniques if cast in the appropriate framework.



Basis-Function Trees as a Generalization of Local Variable Selection Methods for Function Approximation

Neural Information Processing Systems

Function approximation on high-dimensional spaces is often thwarted by a lack of sufficient data to adequately "fill" the space, or lack of sufficient computational resources. The technique of local variable selection provides a partial solution to these problems by attempting to approximate functions locally using fewer than the complete set of input dimensions.



Modeling Time Varying Systems Using Hidden Control Neural Architecture

Neural Information Processing Systems

This paper introduces a generalization of the layered neural network that can implement a time-varying nonlinear mapping between its observable input and output. The variation of the network's mapping is due to an additional, hidden control input, while the network parameters remain unchanged. We proposed an algorithm for finding the network parameters and the hidden control sequence from a training set of examples of observable input and output. This algorithm implements an approximate maximum likelihood estimation of parameters of an equivalent statistical model, when only the dominant control sequence is taken into account. The conceptual difference between the proposed model and the HMM is that in the HMM approach, the observable data in each of the states is modeled as though it was produced by a memoryless source, and a parametric description of this source is obtained during training, while in the proposed model the observations in each state are produced by a nonlinear dynamical system driven by noise, and both the parametric form of the dynamics and the noise are estimated. The perfonnance of the model was illustrated for the tasks of nonlinear time-varying system modeling and continuously spoken digit recognition. The reported results show the potential of this model for providing high performance speech recognition capability. Acknowledgment Special thanks are due to N. Merhav for numerous comments and helpful discussions.


Modeling Time Varying Systems Using Hidden Control Neural Architecture

Neural Information Processing Systems

This paper introduces a generalization of the layered neural network that can implement a time-varying nonlinear mapping between its observable input and output. The variation of the network's mapping is due to an additional, hidden control input, while the network parameters remain unchanged. We proposed an algorithm for finding the network parameters and the hidden control sequence from a training set of examples of observable input and output. This algorithm implements an approximate maximum likelihood estimation of parameters of an equivalent statistical model, when only the dominant control sequence is taken into account. The conceptual difference between the proposed model and the HMM is that in the HMM approach, the observable data in each of the states is modeled as though it was produced by a memoryless source, and a parametric description of this source is obtained during training, while in the proposed model the observations in each state are produced by a nonlinear dynamical system driven by noise, and both the parametric form of the dynamics and the noise are estimated. The perfonnance of the model was illustrated for the tasks of nonlinear time-varying system modeling and continuously spoken digit recognition. The reported results show the potential of this model for providing high performance speech recognition capability. Acknowledgment Specialthanks are due to N. Merhav for numerous comments and helpful discussions.


Asymptotic slowing down of the nearest-neighbor classifier

Neural Information Processing Systems

Santosh S. Venkatesh Electrical Engineering University of Pennsylvania Philadelphia, PA 19104 If patterns are drawn from an n-dimensional feature space according to a probability distribution that obeys a weak smoothness criterion, we show that the probability that a random input pattern is misclassified by a nearest-neighbor classifier using M random reference patterns asymptotically satisfies a PM(error) "" Poo(error) M2/n' for sufficiently large values of M. Here, Poo(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free.We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification.