Genre
A Generic Approach for Identification of Event Related Brain Potentials via a Competitive Neural Network Structure
Lange, Daniel H., Siegelmann, Hava T., Pratt, Hillel, Inbar, Gideon F.
We present a novel generic approach to the problem of Event Related Potential identification and classification, based on a competitive N eural Net architecture. The network weights converge to the embedded signal patterns, resulting in the formation of a matched filter bank. The network performance is analyzed via a simulation study, exploring identification robustness under low SNR conditions and compared to the expected performance from an information theoretic perspective. The classifier is applied to real event-related potential data recorded during a classic oddball type paradigm; for the first time, withinsession variable signal patterns are automatically identified, dismissing the strong and limiting requirement of a-priori stimulus-related selective grouping of the recorded data.
Extended ICA Removes Artifacts from Electroencephalographic Recordings
Jung, Tzyy-Ping, Humphries, Colin, Lee, Te-Won, Makeig, Scott, McKeown, Martin J., Iragui, Vicente, Sejnowski, Terrence J.
Severe contamination of electroencephalographic (EEG) activity by eye movements, blinks, muscle, heart and line noise is a serious problem for EEG interpretation and analysis. Rejecting contaminated EEG segments results in a considerable loss of information and may be impractical for clinical data. Many methods have been proposed to remove eye movement and blink artifacts from EEG recordings. Often regression in the time or frequency domain is performed on simultaneous EEG and electrooculographic (EOG) recordings to derive parameters characterizing the appearance and spread of EOG artifacts in the EEG channels. However, EOG records also contain brain signals [1, 2], so regressing out EOG activity inevitably involves subtracting a portion of the relevant EEG signal from each recording as well. Regression cannot be used to remove muscle noise or line noise, since these have no reference channels. Here, we propose a new and generally applicable method for removing a wide variety of artifacts from EEG records. The method is based on an extended version of a previous Independent Component Analysis (lCA) algorithm [3, 4] for performing blind source separation on linear mixtures of independent source signals with either sub-Gaussian or super-Gaussian distributions. Our results show that ICA can effectively detect, separate and remove activity in EEG records from a wide variety of artifactual sources, with results comparing favorably to those obtained using regression-based methods.
Hybrid NN/HMM-Based Speech Recognition with a Discriminant Neural Feature Extraction
Willett, Daniel, Rigoll, Gerhard
In this paper, we present a novel hybrid architecture for continuous speech recognition systems. It consists of a continuous HMM system extended by an arbitrary neural network that is used as a preprocessor that takes several frames of the feature vector as input to produce more discriminative feature vectors with respect to the underlying HMM system. This hybrid system is an extension of a state-of-the-art continuous HMM system, and in fact, it is the first hybrid system that really is capable of outperforming these standard systems with respect to the recognition accuracy. Experimental results show an relative error reduction of about 10% that we achieved on a remarkably good recognition system based on continuous HMMs for the Resource Management 1 OOO-word continuous speech recognition task.
Modeling Acoustic Correlations by Factor Analysis
Saul, Lawrence K., Rahim, Mazin G.
Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the shorttime properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters are estimated by an Expectation-Maximization (EM) algorithm that can be embedded in the training procedures for HMMs.
Bayesian Robustification for Audio Visual Fusion
Movellan, Javier R., Mineiro, Paul
Department of Cognitive Science Department of Cognitive Science University of California, San Diego University of California, San Diego La Jolla, CA 92092-0515 La Jolla, CA 92092-0515 Abstract We discuss the problem of catastrophic fusion in multimodal recognition systems. This problem arises in systems that need to fuse different channels in non-stationary environments. Practice shows that when recognition modules within each modality are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We explore a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification: each sensory channel is provided with simple white-noise context models, and the perceptual hypothesis and context are jointly estimated. Consequently, context deviations are interpreted as changes in white noise contamination strength, automatically adjusting the influence of the module.
Active Data Clustering
Hofmann, Thomas, Buhmann, Joachim M.
Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooting in statistical decision theory. This is considered to be an important step towards the analysis of largescale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.
Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis
Held, Marcus, Buhmann, Joachim M.
An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled and unclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] signments, which maps the discrete optimization problem, i.e. how to determine the data as via the Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning of Decision Trees for Data Analysis 515 timation problem.
Linear Concepts and Hidden Variables: An Empirical Study
Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea or not depends on the robustness with respect to deviations from the postulated model. We study this question experimentally in a restricted, yet nontrivial and interesting case: we consider a conditionally independent attribute (CIA) model which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of anyone variable (given known values for the others) reduces to testing a linear function of the observed values. We learn CIA with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly-learned linear classifier.
Approximating Posterior Distributions in Belief Networks Using Mixtures
Bishop, Christopher M., Lawrence, Neil D., Jaakkola, Tommi, Jordan, Michael I.
Exact inference in densely connected Bayesian networks is computationally intractable, and so there is considerable interest in developing effective approximation schemes. One approach which has been adopted is to bound the log likelihood using a mean-field approximating distribution. While this leads to a tractable algorithm, the mean field distribution is assumed to be factorial and hence unimodal. In this paper we demonstrate the feasibility of using a richer class of approximating distributions based on mixtures of mean field distributions. We derive an efficient algorithm for updating the mixture parameters and apply it to the problem of learning in sigmoid belief networks. Our results demonstrate a systematic improvement over simple mean field theory as the number of mixture components is increased.
On-line Learning from Finite Training Sets in Nonlinear Networks
Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for nonlinear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.