Information Technology
Continuous Sigmoidal Belief Networks Trained using Slice Sampling
These include Boltzmann machines (Hinton and Sejnowski 1986), binary sigmoidal belief networks (Neal 1992) and Helmholtz machines (Hinton et al. 1995; Dayan et al. 1995). However, some hidden variables, such as translation or scaling in images of shapes, are best represented using continuous values. Continuous-valued Boltzmann machines have been developed (Movellan and McClelland 1993), but these suffer from long simulation settling times and the requirement of a "negative phase" during learning. Tibshirani (1992) and Bishop et al. (1996) consider learning mappings from a continuous latent variable space to a higher-dimensional input space. MacKay (1995) has developed "density networks" that can model both continuous and categorical latent spaces using stochasticity at the topmost network layer. In this paper I consider a new hierarchical top-down connectionist model that has stochastic hidden variables at all layers; moreover, these variables can adapt to be continuous or categorical. The proposed top-down model can be viewed as a continuous-valued belief network, which can be simulated by performing a quick top-down pass (Pearl 1988).
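The sampler named in the title, slice sampling, is a standard way to draw each continuous hidden unit from its conditional density without tuning a step size. Below is a minimal, generic sketch of one univariate slice-sampling update (stepping out and shrinkage); the function name log_density, the bracket width, and the Gaussian example are illustrative assumptions, not the paper's training procedure.

```python
import numpy as np

def slice_sample_step(x, log_density, width=1.0, max_steps=50, rng=np.random):
    """One univariate slice-sampling update (stepping out and shrinkage).

    x           : current sample (scalar)
    log_density : function returning the unnormalised log density at a point
    width       : initial bracket width used when stepping out
    """
    # Draw the auxiliary "height": log u = log p(x) - Exponential(1)
    log_u = log_density(x) - rng.exponential(1.0)

    # Step out to find an interval [left, right] that contains the slice
    left = x - width * rng.uniform()
    right = left + width
    for _ in range(max_steps):
        if log_density(left) < log_u:
            break
        left -= width
    for _ in range(max_steps):
        if log_density(right) < log_u:
            break
        right += width

    # Shrink the interval until a point inside the slice is found
    while True:
        x_new = rng.uniform(left, right)
        if log_density(x_new) >= log_u:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new

# Toy usage: sample from a standard Gaussian conditional
samples = [0.0]
for _ in range(1000):
    samples.append(slice_sample_step(samples[-1], lambda z: -0.5 * z * z))
```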
A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer
Plate, Tony, Band, Pierre, Bert, Joel, Grace, John
Pierre Band, BC Cancer Agency, 601 West 10th Ave, Epidemiology, Vancouver BC Canada V5Z 1L3; Joel Bert, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC Canada V6T 1Z4; John Grace, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC Canada V6T 1Z4. Abstract: Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control of overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control of overfitting while retaining the ability to discover complex features in the artificial data. 1 Introduction Traditionally, very simple statistical techniques are used in the analysis of epidemiological studies. The predominant technique is logistic regression, in which the effects of predictors are linear (or categorical) and additive on the log-odds scale.
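For reference, a minimal sketch of the log-odds model described above, using scikit-learn and synthetic stand-ins for the tobacco and alcohol predictors (the study's actual data and fitted coefficients are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the study's predictors (tobacco, alcohol) and outcome
n = 500
tobacco = rng.normal(size=n)
alcohol = rng.normal(size=n)
# Generate labels from a model whose effects are additive on the log-odds scale
log_odds = -1.0 + 0.8 * tobacco + 0.5 * alcohol
y = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-log_odds))

X = np.column_stack([tobacco, alcohol])
model = LogisticRegression().fit(X, y)

# Each coefficient is the change in log-odds per unit change of its predictor
print("intercept:", model.intercept_, "coefficients:", model.coef_)
```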
Edges are the 'Independent Components' of Natural Scenes.
Bell, Anthony J., Sejnowski, Terrence J.
Field (1994) has suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and Barlow (1989) has reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that nonlinear 'infomax', when applied to an ensemble of natural scenes, produces sets of visual filters that are localised and oriented. Some of these filters are Gabor-like and resemble those produced by the sparseness-maximisation network of Olshausen & Field (1996). In addition, the outputs of these filters are as independent as possible, since the infomax network is able to perform Independent Components Analysis (ICA). We compare the resulting ICA filters and their associated basis functions with other decorrelating filters produced by Principal Components Analysis (PCA) and zero-phase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form an information-theoretic coordinate system for images. 1 Introduction. Both the classic experiments of Hubel & Wiesel [8] on neurons in visual cortex, and several decades of theorising about feature detection in vision, have left open the question most succinctly phrased by Barlow: "Why do we have edge detectors?" That is: are there any coding principles which would predict the formation of localised, oriented receptive fields?
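As a point of reference for the decorrelating filters mentioned above, here is a minimal sketch of zero-phase (ZCA) whitening on flattened image patches; the patch size, eps regularizer, and random data are illustrative assumptions, and this is not the paper's infomax ICA learning rule.

```python
import numpy as np

def zca_whitening_matrix(X, eps=1e-5):
    """Zero-phase (ZCA) whitening: W = E D^{-1/2} E^T for covariance C = E D E^T.

    X is an (n_samples, n_features) array of image patches, one patch per row.
    """
    Xc = X - X.mean(axis=0)                    # centre each pixel/feature
    C = np.cov(Xc, rowvar=False)               # feature covariance matrix
    d, E = np.linalg.eigh(C)                   # eigendecomposition (symmetric C)
    W = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T
    return W

# Example on random "patches"; real use would draw patches from natural images
patches = np.random.randn(1000, 64)            # e.g. 8x8 patches, flattened
W = zca_whitening_matrix(patches)
whitened = (patches - patches.mean(axis=0)) @ W.T
print(np.allclose(np.cov(whitened, rowvar=False), np.eye(64), atol=1e-1))
```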
A Variational Principle for Model-based Morphing
Saul, Lawrence K., Jordan, Michael I.
Given a multidimensional data set and a model of its density, we consider how to define the optimal interpolation between two points. This is done by assigning a cost to each path through space, based on two competing goals: one to interpolate through regions of high density, the other to minimize arc length. From this path functional, we derive the Euler-Lagrange equations for extremal motion; given two points, the desired interpolation is found by solving a boundary value problem. We show that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models. 1 Introduction The problem of nonlinear interpolation arises frequently in image, speech, and signal processing. Consider the following two examples: (i) given two profiles of the same face, connect them by a smooth animation of intermediate poses [1]; (ii) given a telephone signal masked by intermittent noise, fill in the missing speech. Both these examples may be viewed as instances of the same abstract problem. In qualitative terms, we can state the problem as follows [2]: given a multidimensional data set, and two points from this set, find a smooth adjoining path that is consistent with available models of the data. We will refer to this as the problem of model-based morphing. In this paper, we examine this problem as it arises from statistical models of multidimensional data. Specifically, our focus is on models that have been derived from
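A schematic form of the kind of path cost described above can be written as follows; the specific density weighting P(x)^{-\lambda} and the trade-off exponent \lambda are assumptions for illustration, not necessarily the authors' exact functional.

```latex
% Schematic path cost for model-based morphing between x(0)=x_A and x(1)=x_B:
% the integrand penalises arc length and rewards passing through high-density
% regions of the model P(x); the exponent \lambda trades off the two goals.
E[x(t)] \;=\; \int_0^1 \big\|\dot{x}(t)\big\|\, P\big(x(t)\big)^{-\lambda}\, dt ,
\qquad x(0)=x_A,\quad x(1)=x_B .
% Extremal paths satisfy the Euler-Lagrange equations of this functional and
% are found by solving the resulting two-point boundary value problem.
```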
Clustering Sequences with Hidden Markov Models
This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters, K, from the data is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.
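A minimal sketch of the mixture-of-HMMs idea follows: fit one HMM per cluster and repeatedly reassign each sequence to the cluster whose HMM scores it highest. The hmmlearn library, the random initialization, and the toy data are assumptions for illustration; the paper's initialization procedure and its method for choosing K are not reproduced.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def cluster_sequences(sequences, K, n_states=3, n_iter=5, seed=0):
    """Hard-assignment clustering of sequences with one HMM per cluster."""
    rng = np.random.default_rng(seed)
    # Crude initialization: random partition of the sequences into K groups
    labels = rng.integers(K, size=len(sequences))
    for _ in range(n_iter):
        models = []
        for k in range(K):
            members = [s for s, l in zip(sequences, labels) if l == k]
            if not members:  # keep empty clusters alive with a random sequence
                members = [sequences[rng.integers(len(sequences))]]
            X = np.concatenate(members)
            lengths = [len(s) for s in members]
            m = GaussianHMM(n_components=n_states, n_iter=20)
            m.fit(X, lengths)
            models.append(m)
        # Reassign each sequence to the HMM under which it is most likely
        labels = np.array([np.argmax([m.score(s) for m in models])
                           for s in sequences])
    return labels, models

# Toy usage: 1-D sequences of differing lengths
seqs = [np.random.randn(np.random.randint(20, 40), 1) for _ in range(30)]
labels, models = cluster_sequences(seqs, K=2)
```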
Ensemble Methods for Phoneme Classification
Waterhouse, Steve R., Cook, Gary
There is now considerable interest in using ensembles or committees of learning machines to improve the performance of the system over that of a single learning machine. In most neural network ensembles, the ensemble members are trained on either the same data (Hansen & Salamon 1990) or different subsets of the data (Perrone & Cooper 1993). The ensemble members typically have different initial conditions and/or different architectures. The subsets of the data may be chosen at random, with prior knowledge or by some principled approach, e.g.
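As a concrete illustration of the simplest combination rule, the sketch below trains members on bootstrap subsets of the data and averages their class posteriors; the MLP member model and the synthetic frames are assumptions, not the phoneme classifiers used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_ensemble(X, y, n_members=5, seed=0):
    """Train ensemble members on bootstrap resamples (different data subsets)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(len(X), size=len(X))    # bootstrap sample
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                            random_state=int(rng.integers(1_000_000)))
        members.append(clf.fit(X[idx], y[idx]))
    return members

def ensemble_predict(members, X):
    """Average the members' class posteriors, then take the argmax."""
    posteriors = np.mean([m.predict_proba(X) for m in members], axis=0)
    return posteriors.argmax(axis=1)

# Toy usage with random "frames" and three phoneme-like classes
X = np.random.randn(300, 10)
y = np.random.randint(3, size=300)
members = train_ensemble(X, y)
predictions = ensemble_predict(members, X)
```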
Self-Organizing and Adaptive Algorithms for Generalized Eigen-Decomposition
Chatterjee, Chanchal, Roychowdhury, Vwani P.
The paper is developed in two parts, where we discuss a new approach to self-organization in a single-layer linear feed-forward network. First, two novel algorithms for self-organization are derived from a two-layer linear hetero-associative network performing a one-of-m classification, and trained with the constrained least-mean-squared classification error criterion. Second, two adaptive algorithms are derived from these self-organizing procedures to compute the principal generalized eigenvectors of two correlation matrices from two sequences of random vectors. These novel adaptive algorithms can be implemented in a single-layer linear feed-forward network. We give a rigorous convergence analysis of the adaptive algorithms by using stochastic approximation theory. As an example, we consider a problem of online signal detection in digital mobile communications.
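For orientation, the quantity the adaptive algorithms converge to, the principal generalized eigenvectors of two correlation matrices, can be computed offline with a standard solver; the sketch below uses scipy.linalg.eigh on sample correlation matrices and is not the paper's adaptive single-layer network.

```python
import numpy as np
from scipy.linalg import eigh

# Two sequences of random vectors; their correlation matrices define the problem
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 8)) @ rng.standard_normal((8, 8))
y = rng.standard_normal((5000, 8)) @ rng.standard_normal((8, 8))

A = x.T @ x / len(x)          # correlation matrix of the first sequence
B = y.T @ y / len(y)          # correlation matrix of the second sequence

# Generalized symmetric eigenproblem A w = lambda B w; eigh returns eigenvalues
# in ascending order, so the principal generalized eigenvectors come last.
eigvals, eigvecs = eigh(A, B)
principal = eigvecs[:, ::-1][:, :3]   # top three generalized eigenvectors

# Each column w satisfies A w = lambda B w (up to numerical error)
w, lam = principal[:, 0], eigvals[-1]
print(np.allclose(A @ w, lam * B @ w, atol=1e-8))
```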
Representing Face Images for Emotion Classification
Padgett, Curtis, Cottrell, Garrison W.
Curtis Padgett, Department of Computer Science, University of California, San Diego, La Jolla, CA 92034; Garrison Cottrell, Department of Computer Science, University of California, San Diego, La Jolla, CA 92034. Abstract: We compare the generalization performance of three distinct representation schemes for facial emotions using a single classification strategy (neural network). The face images presented to the classifiers are represented as: full face projections of the dataset onto their eigenvectors (eigenfaces); a similar projection constrained to eye and mouth areas (eigenfeatures); and finally a projection of the eye and mouth areas onto the eigenvectors obtained from 32x32 random image patches from the dataset. The latter system achieves 86% generalization on novel face images (individuals the networks were not trained on) drawn from a database in which human subjects consistently identify a single emotion for the face. 1 Introduction Some of the most successful research in machine perception of complex natural image objects (like faces) has relied heavily on reduction strategies that encode an object as a set of values that span the principal component subspace of the object's images [Cottrell and Metcalfe, 1991, Pentland et al., 1994]. This approach has gained wide acceptance for its success in classification, for the efficiency with which the eigenvectors can be calculated, and because the technique permits an implementation that is biologically plausible. The procedure followed in generating these face representations requires normalizing a large set of face views ("mugshots") and, from these, identifying a statistically relevant subspace.
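A minimal sketch of the eigenface-style encoding described above: project images (or patches) onto their top principal components and feed the coefficients to a neural network classifier. The scikit-learn PCA/MLP components and the synthetic image array are stand-ins, not the paper's face database or network.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Stand-ins for normalized face images (or 32x32 patches), one per row
rng = np.random.default_rng(0)
images = rng.standard_normal((200, 32 * 32))
emotions = rng.integers(6, size=200)          # e.g. six emotion labels

# "Eigenfaces": principal components of the image set; the projection
# coefficients are the low-dimensional code fed to the classifier
pca = PCA(n_components=20).fit(images)
codes = pca.transform(images)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300).fit(codes, emotions)
print(clf.score(codes, emotions))
```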
Spectroscopic Detection of Cervical Pre-Cancer through Radial Basis Function Networks
Tumer, Kagan, Ramanujam, Nirmala, Richards-Kortum, Rebecca R., Ghosh, Joydeep
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. RBF ensemble algorithms based on such spectra provide an automated, near real-time implementation of pre-cancer detection in the hands of nonexperts. The results are more reliable, direct and accurate than those achieved by either human experts or multivariate statistical algorithms. 1 Introduction Cervical carcinoma is the second most common cancer in women worldwide, exceeded only by breast cancer (Ramanujam et al., 1996). The mortality related to cervical cancer can be reduced if this disease is detected at the precancerous state, known as squamous intraepithelial lesion (SIL).
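A minimal sketch of a single radial basis function classifier of the kind such an ensemble would combine: k-means centres, Gaussian hidden activations, and a linear readout. The synthetic spectra, the number of centres, and the kernel width are assumptions; the paper's fluorescence data and ensemble combination are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def rbf_features(X, centers, width):
    """Gaussian RBF activations for every sample/centre pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Synthetic stand-ins for fluorescence spectra and normal/SIL labels
rng = np.random.default_rng(0)
spectra = rng.standard_normal((300, 40))
labels = rng.integers(2, size=300)

# Hidden layer: RBF centres from k-means; output layer: linear (logistic) readout
centers = KMeans(n_clusters=15, n_init=10, random_state=0).fit(spectra).cluster_centers_
H = rbf_features(spectra, centers, width=1.0)
readout = LogisticRegression(max_iter=1000).fit(H, labels)
print(readout.score(H, labels))
```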