Uncertainty
A Hybrid Linear/Nonlinear Approach to Channel Equalization Problems
Channel equalization is an important problem in high-speed communications, where each transmitted symbol is distorted by its neighboring symbols. Traditionally, channel equalization is treated as a channel-inversion operation. One problem with this approach is that there is no direct correspondence between the error probability and the residual error produced by the channel inversion. In this paper, optimal equalizer design is formulated as a classification problem. The optimal classifier can be constructed with the Bayes decision rule and is, in general, nonlinear. An efficient hybrid linear/nonlinear approach is proposed to train the equalizer. The error probability of the new hybrid linear/nonlinear equalizer is shown to be lower than that of a linear equalizer on an experimental channel.
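The classification view can be illustrated with a minimal sketch. It assumes a hypothetical two-tap channel `(0.5, 1.0)`, equiprobable binary symbols, Gaussian noise, and a single received sample per decision; none of these values or function names come from the paper. The Bayes rule compares the summed likelihoods of the noise-free channel outputs associated with each symbol value.

```python
import math
from itertools import product

def channel_states(taps, target_index=0):
    """Group noise-free channel outputs by the value of the target symbol.

    taps: hypothetical channel impulse response (an assumption for illustration).
    """
    states = {+1: [], -1: []}
    for symbols in product((+1, -1), repeat=len(taps)):
        out = sum(h * s for h, s in zip(taps, symbols))
        states[symbols[target_index]].append(out)
    return states

def bayes_decide(r, states, noise_var):
    """Bayes decision for one received sample r, assuming equiprobable symbols."""
    def likelihood(centers):
        # Unnormalized Gaussian mixture likelihood over the channel states.
        return sum(math.exp(-(r - c) ** 2 / (2 * noise_var)) for c in centers)
    return +1 if likelihood(states[+1]) >= likelihood(states[-1]) else -1

states = channel_states((0.5, 1.0))
decisions = [bayes_decide(r, states, noise_var=0.05) for r in (-1.5, -0.5, 0.5, 1.5)]
```

In this toy setting the decision alternates in sign as the received sample increases, so no single threshold on the sample reproduces it. This is one simple way to see why the Bayes-optimal equalizer is nonlinear in general.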
History-Dependent Attractor Neural Networks
Meilijson, Isaac, Ruppin, Eytan
We present a methodological framework enabling a detailed description of the performance of Hopfield-like attractor neural networks (ANN) in the first two iterations. Using the Bayesian approach, we find that performance is improved when a history-based term is included in the neuron's dynamics. A further enhancement of the network's performance is achieved by judiciously choosing the censored neurons (those which become active in a given iteration) on the basis of the magnitude of their post-synaptic potentials. The contribution of biologically plausible, censored, history-dependent dynamics is especially marked in conditions of low firing activity and sparse connectivity, two important characteristics of the mammalian cortex. In such networks, the performance attained is higher than the performance of two 'independent' iterations, which represents an upper bound on the performance of history-independent networks.
On the Use of Evidence in Neural Networks
The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than it is to use the evidence approximation. Moreover, unlike the evidence approximation, the exact result neither has to be re-calculated for every new data set, nor requires the running of computer code (the exact result is closed form). In addition, it turns out that the evidence procedure's MAP estimate for neural nets is, in toto, approximation error. Another advantage of the exact analysis is that it does not lead one to incorrect intuition, like the claim that using evidence one can "evaluate different priors in light of the data". This paper also discusses sufficiency conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.
Information, Prediction, and Query by Committee
Freund, Yoav, Seung, H. Sebastian, Shamir, Eli, Tishby, Naftali
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of thresholded smooth functions.
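The filtering idea can be sketched on a toy problem. The example below assumes a one-dimensional threshold concept on [0, 1], a version space represented as an interval, and two committee members drawn uniformly from it; the function name, parameters, and setup are illustrative assumptions rather than the construction analyzed in the paper.

```python
import random

def query_by_committee(true_threshold, n_stream=2000, seed=0):
    """Filter a random input stream, querying labels only on committee disagreement.

    true_threshold: hypothetical target concept, label(x) = (x >= true_threshold).
    Returns the final version-space interval and the number of label queries made.
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0  # thresholds consistent with all labels seen so far
    n_queries = 0
    for _ in range(n_stream):
        x = rng.random()
        # Two committee members sampled at random from the version space.
        t1 = rng.uniform(lo, hi)
        t2 = rng.uniform(lo, hi)
        if (x >= t1) != (x >= t2):  # disagreement marks an informative input
            label = x >= true_threshold
            n_queries += 1
            if label:
                hi = min(hi, x)  # the threshold is at most x
            else:
                lo = max(lo, x)  # the threshold is above x
    return lo, hi, n_queries

lo, hi, n_queries = query_by_committee(true_threshold=0.37)
```

Inputs on which the two members agree are discarded without a label request, so only a fraction of the stream is queried while the interval `[lo, hi]` keeps shrinking around the target.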
Bayesian Learning via Stochastic Dynamics
The attempt to find a single "optimal" weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.
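As a rough illustration of sampling weights by simulating a stochastic dynamical system, the sketch below uses unadjusted Langevin dynamics on a one-weight model y = w x with Gaussian noise and a Gaussian prior. Langevin dynamics is chosen here for brevity and is not claimed to be the scheme used in the paper; the model, step size, and variances are all assumptions.

```python
import math
import random

def grad_log_posterior(w, xs, ys, noise_var=1.0, prior_var=10.0):
    """Gradient of log p(w | data) for y = w*x + Gaussian noise, Gaussian prior on w."""
    grad_lik = sum((y - w * x) * x for x, y in zip(xs, ys)) / noise_var
    grad_prior = -w / prior_var
    return grad_lik + grad_prior

def langevin_samples(xs, ys, n_steps=20000, step=0.01, burn_in=2000, seed=0):
    """Simulate discretized Langevin dynamics; the posterior is its
    (approximate) stationary distribution for small step sizes."""
    rng = random.Random(seed)
    w = 0.0
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        w += 0.5 * step * grad_log_posterior(w, xs, ys) + math.sqrt(step) * noise
        if t >= burn_in:
            samples.append(w)
    return samples

def predict(samples, x):
    """Average the outputs of the sampled models instead of using one weight."""
    return sum(w * x for w in samples) / len(samples)

xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [2.0 * x for x in xs]
samples = langevin_samples(xs, ys)
y_hat = predict(samples, 2.0)
```

For this conjugate toy model the posterior mean is available in closed form, sum(x*y) / (sum(x*x) + noise_var / prior_var), which gives a simple check on the sampler. No validation set is involved: the prediction is an average over sampled weights.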
Improving Convergence in Hierarchical Matching Networks for Object Recognition
We are interested in the use of analog neural networks for recognizing visual objects. Objects are described by the set of parts they are composed of and their structural relationship. Structural models are stored in a database and the recognition problem reduces to matching data to models in a structurally consistent way. The object recognition problem is in general very difficult in that it involves coupled problems of grouping, segmentation and matching. We limit the problem here to the simultaneous labelling of the parts of a single object and the determination of analog parameters. This coupled problem reduces to a weighted match problem in which an optimizing neural network must minimize $E(M, p) = \sum_{\alpha i} M_{\alpha i} W_{\alpha i}(p)$, where the $\{M_{\alpha i}\}$ are binary match variables for data parts $i$ to model parts $\alpha$ and $\{W_{\alpha i}(p)\}$ are weights dependent on parameters $p$.
Learning Fuzzy Rule-Based Neural Networks for Control
Higgins, Charles M., Goodman, Rodney M.
A three-step method for function approximation with a fuzzy system is proposed. First, the membership functions and an initial rule representation are learned; second, the rules are compressed as much as possible using information theory; and finally, a computational network is constructed to compute the function value. This system is applied to two control examples: learning the truck and trailer backer-upper control system, and learning a cruise control system for a radio-controlled model car.
1 Introduction
Function approximation is the problem of estimating a function from a set of examples of its independent variables and function value. If there is prior knowledge of the type of function being learned, a mathematical model of the function can be constructed and the parameters perturbed until the best match is achieved. However, if there is no prior knowledge of the function, a model-free system such as a neural network or a fuzzy system may be employed to approximate an arbitrary nonlinear function. A neural network's inherent parallel computation is efficient for speed; however, the information learned is expressed only in the weights of the network. The advantage of fuzzy systems over neural networks is that the information learned is expressed in terms of linguistic rules. In this paper, we propose a method for learning a complete fuzzy system to approximate example data.
Hidden Markov Model Induction by Bayesian Model Merging
Stolcke, Andreas, Omohundro, Stephen
This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
1 INTRODUCTION AND OVERVIEW
Hidden Markov Models (HMMs) are a well-studied approach to the modelling of sequence data. HMMs can be viewed as a stochastic generalization of finite-state automata, where both the transitions between states and the generation of output symbols are governed by probability distributions. HMMs have been important in speech recognition (Rabiner & Juang, 1986), cryptography, and more recently in other areas such as protein classification and alignment (Haussler, Krogh, Mian & Sjölander, 1992; Baldi, Chauvin, Hunkapiller & McClure, 1993). Practitioners have typically chosen the HMM topology by hand, so that learning the HMM from sample data means estimating only a fixed number of model parameters. The standard approach is to find a maximum likelihood (ML) or maximum a posteriori probability (MAP) estimate of the HMM parameters.