Asia
Optimizing Correlation Algorithms for Hardware-Based Transient Classification
Edwards, R. Timothy, Cauwenberghs, Gert, Pineda, Fernando J.
The perfonnance of dedicated VLSI neural processing hardware depends critically on the design of the implemented algorithms. We have previously proposedan algorithm for acoustic transient classification [1]. Having implemented and demonstrated this algorithm in a mixed-mode architecture, we now investigate variants on the algorithm, using time and frequency channel differencing, input and output nonnalization, and schemes to binarize and train the template values, with the goal of achieving optimalclassification perfonnance for the chosen hardware.
Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm
Advantages in feature selection, robustness andlimited resource allocation have been studied. Ultimately, tasks such as regression and classification reduce to the evaluation of a conditional density. However, popularity of maximumjoint likelihood and EM techniques remains strong in part due to their elegance and convergence properties. Thus, many conditional problems are solved by first estimating joint models then conditioning them.
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks
Bartlett, Peter L., Maiorov, Vitaly, Meir, Ron
VitalyMaiorov Department of Mathematics Technion, Haifa 32000 Israel Ron Meir Department of Electrical Engineering Technion, Haifa 32000 Israel rmeir@dumbo.technion.ac.il Abstract We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions.We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. The VC dimension is an important measure of the complexity of a class of binaryvalued functions,since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. O(W2), which would lead one to conclude that the bounds Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks 191 are in fact tight up to a constant.
Tractable Variational Structures for Approximating Graphical Models
Barber, David, Wiegerinck, Wim
Graphical models provide a broad probabilistic framework with applications inspeech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework forapproximating these models, we present two classes of distributions, decimatableBoltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problemsuggest using these richer approximations compares favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference[l] but suffer intractability when applied to large scale problems.
SMEM Algorithm for Mixture Models
Ueda, Naonori, Nakano, Ryohei, Ghahramani, Zoubin, Hinton, Geoffrey E.
We present a split and merge EM (SMEM) algorithm to overcome the local maximum problem in parameter estimation of finite mixture models. In the case of mixture models, non-global maxima often involve having too many components of a mixture model in one part of the space and too few in another, widelyseparated part of the space. To escape from such configurations we repeatedly perform simultaneous split and merge operations using a new criterion for efficiently selecting the split and merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers using synthetic and real data and show the effectiveness of using the split and merge operations to improve the likelihood of both the training data and of held-out test data. 1 INTRODUCTION Mixture density models, in particular normal mixtures, have been extensively used in the field of statistical pattern recognition [1]. Recently, more sophisticated mixture densitymodels such as mixtures of latent variable models (e.g., probabilistic PCA or factor analysis) have been proposed to approximate the underlying data manifold [2]-[4].
Inference in Multilayer Networks via Large Deviation Bounds
Kearns, Michael J., Saul, Lawrence K.
Arguably oneabilities of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirectedproDabilistic models represented as symmetric networks ethave been studied extensively using methods from statistical mechanics (Hertz aI, 1991). Detailed analyses of these models are possible by exploiting averaging that occur in the thermodynamic limit of large networks.phenomena In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks 261 Inference in Multilayer Networks via Large Deviation Bounds does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et aI, 1997).
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Suematsu, Nobuo, Hayashi, Akira
Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problemis reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges toone which provides the most accurate predictions of percepts and rewards, given short-term memory. 1 INTRODUCTION Research on Reinforcement Learning (RL) problem forpartially observable environments is gaining more attention recently. This is mainly because the assumption that perfect and complete perception of the state of the environment is available for the learning agent, which many previous RL algorithms require, is not valid for many realistic environments.