Statistical Learning
Spectroscopic Detection of Cervical Pre-Cancer through Radial Basis Function Networks
Tumer, Kagan, Ramanujam, Nirmala, Richards-Kortum, Rebecca R., Ghosh, Joydeep
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. RBF ensemble algorithms based on such spectra provide automated, near real-time implementation of pre-cancer detection in the hands of nonexperts. The results are more reliable, direct and accurate than those achieved by either human experts or multivariate statistical algorithms. 1 Introduction Cervical carcinoma is the second most common cancer in women worldwide, exceeded only by breast cancer (Ramanujam et al., 1996). The mortality related to cervical cancer can be reduced if this disease is detected at the precancerous state, known as squamous intraepithelial lesion (SIL).
A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer
Plate, Tony, Band, Pierre, Bert, Joel, Grace, John
Tony Plate BC Cancer Agency 601 West 10th Ave, Epidemiology Vancouver BC Canada V5Z 1L3 tap@comp.vuw.ac.nz Pierre Band BC Cancer Agency 601 West 10th Ave, Epidemiology Vancouver BC Canada V5Z 1L3 Joel Bert Dept of Chemical Engineering University of British Columbia 2216 Main Mall Vancouver BC Canada V6T 1Z4 John Grace Dept of Chemical Engineering University of British Columbia 2216 Main Mall Vancouver BC Canada V6T 1Z4 Abstract Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data.
Adaptive Access Control Applied to Ethernet Data
In a communication network in which traffic sources can be dynamically added or removed, an access controller must decide when to accept or reject a new traffic source based on whether, if added, acceptable service would be given to all carried sources. Unlike best-effort services such as the internet, we consider the case where traffic sources are given quality of service (QoS) guarantees such as maximum delay, delay variation, or loss rate. The goal of the controller is to accept the maximal number of users while guaranteeing QoS. To accommodate diverse sources such as constant bit rate voice, variable-rate video, and bursty computer data, packet-based protocols are used. We consider QoS in terms of lost packets (i.e.
Learning Appearance Based Models: Mixtures of Second Moment Experts
Bregler, Christoph, Malik, Jitendra
This paper describes a new technique for object recognition based on learning appearance models. The image is decomposed into local regions which are described by a new texture representation called "Generalized Second Moments" that are derived from the output of multiscale, multiorientation filter banks. Class-characteristic local texture features and their global composition are learned by a hierarchical mixture of experts architecture (Jordan & Jacobs). The technique is applied to a vehicle database consisting of 5 general car categories (Sedan, Van with backdoors, Van without backdoors, old Sedan, and Volkswagen Bug). This is a difficult problem with considerable in-class variation. The new technique has a 6.5% misclassification rate, compared to eigen-images, which give a 17.4% misclassification rate, and nearest neighbors, which give 15.7%.
Edges are the 'Independent Components' of Natural Scenes.
Bell, Anthony J., Sejnowski, Terrence J.
Field (1994) has suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and Barlow (1989) has reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that nonlinear 'infomax', when applied to an ensemble of natural scenes, produces sets of visual filters that are localised and oriented. Some of these filters are Gabor-like and resemble those produced by the sparseness-maximisation network of Olshausen & Field (1996). In addition, the outputs of these filters are as independent as possible, since the infomax network is able to perform Independent Components Analysis (ICA). We compare the resulting ICA filters and their associated basis functions with other decorrelating filters produced by Principal Components Analysis (PCA) and zero-phase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form an information-theoretic coordinate system for images. 1 Introduction. Both the classic experiments of Hubel & Wiesel [8] on neurons in visual cortex, and several decades of theorising about feature detection in vision, have left open the question most succinctly phrased by Barlow "Why do we have edge detectors?" That is: are there any coding principles which would predict the formation of localised, oriented receptive
A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data
Miller, David J., Uyar, Hasan S.
We address statistical classifier design given a mixed training set consisting of a small labelled feature set and a (generally larger) set of unlabelled features. This situation arises, e.g., for medical images, where although training features may be plentiful, expensive expertise is required to extract their class labels. We propose a classifier structure and learning algorithm that make effective use of unlabelled data to improve performance. The learning is based on maximization of the total data likelihood, i.e. over both the labelled and unlabelled data subsets. Two distinct EM learning algorithms are proposed, differing in the EM formalism applied for unlabelled data. The classifier, based on a joint probability model for features and labels, is a "mixture of experts" structure that is equivalent to the radial basis function (RBF) classifier, but unlike RBFs, is amenable to likelihood-based training. The scope of application for the new method is greatly extended by the observation that test data, or any new data to classify, is in fact additional, unlabelled data - thus, a combined learning/classification operation - much akin to what is done in image segmentation - can be invoked whenever there is new data to classify. Experiments with data sets from the UC Irvine database demonstrate that the new learning algorithms and structure achieve substantial performance gains over alternative approaches.
Combining Neural Network Regression Estimates with Regularized Linear Weights
Merz, Christopher J., Pazzani, Michael J.
When combining a set of learned models to form an improved estimator, the issue of redundancy or multicollinearity in the set of models must be addressed. A progression of existing approaches and their limitations with respect to the redundancy is discussed. A new approach, PCR*, based on principal components regression is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that: 1) PCR* was the most robust combination method as the redundancy of the learned models increased, 2) redundancy could be handled without eliminating any of the learned models, and 3) the principal components of the learned models provided a continuum of "regularized" weights from which PCR* could choose.
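To illustrate how principal components regression can yield "regularized" combination weights, here is a minimal sketch in Python with NumPy. It is plain principal components regression on a matrix of model predictions, not the authors' PCR* procedure: the function name `pcr_weights` and the parameter `k` (the number of components kept) are assumptions made for this example, and the abstract does not say how PCR* selects among the components.

```python
import numpy as np

def pcr_weights(F, y, k):
    """Combine model predictions by principal components regression.

    F : (n_samples, n_models) array; column j holds model j's predictions.
    y : (n_samples,) array of target values.
    k : number of principal components to keep (hypothetical tuning knob).
    Returns (weights, intercept); the combined estimate is F @ weights + intercept.
    """
    F_mean = F.mean(axis=0)
    y_mean = y.mean()
    Fc = F - F_mean
    # SVD of the centered predictions; the rows of Vt are principal directions.
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    V_k = Vt[:k].T                     # (n_models, k)
    Z = Fc @ V_k                       # component scores, (n_samples, k)
    # The scores are mutually orthogonal, so least squares decouples per component.
    gamma = (Z.T @ (y - y_mean)) / (s[:k] ** 2)
    weights = V_k @ gamma              # map back to one weight per model
    intercept = y_mean - F_mean @ weights
    return weights, intercept

# Illustrative data: models 1 and 2 are nearly identical (redundant).
rng = np.random.default_rng(0)
truth = rng.normal(size=200)
m1 = truth + 0.3 * rng.normal(size=200)
m2 = m1 + 1e-3 * rng.normal(size=200)
m3 = truth + 0.3 * rng.normal(size=200)
F = np.column_stack([m1, m2, m3])

w_full, b_full = pcr_weights(F, truth, k=3)   # same as ordinary least squares
w_pcr, b_pcr = pcr_weights(F, truth, k=2)     # drops the weakest direction
```

Keeping all components reproduces ordinary least squares, while dropping low-variance components can only shrink the norm of the weight vector. No model is removed; the redundant pair simply shares weight.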
One-unit Learning Rules for Independent Component Analysis
Neural one-unit learning rules for the problem of Independent Component Analysis (ICA) and blind source separation are introduced. In these new algorithms, every ICA neuron develops into a separator that finds one of the independent components. The learning rules use very simple constrained Hebbian/anti-Hebbian learning in which decorrelating feedback may be added. To speed up the convergence of these stochastic gradient descent rules, a novel computationally efficient fixed-point algorithm is introduced. 1 Introduction Independent Component Analysis (ICA) (Comon, 1994; Jutten and Herault, 1991) is a signal processing technique whose goal is to express a set of random variables as linear combinations of statistically independent component variables. The main applications of ICA are in blind source separation, feature extraction, and blind deconvolution.
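As an illustration of what a one-unit fixed-point iteration can look like, the sketch below uses the widely known kurtosis-based update w <- E[z (w^T z)^3] - 3w followed by normalization, applied to whitened data z. The abstract does not spell out its update rule, so treating this as the rule in question is an assumption; the function names `whiten` and `one_unit_fixed_point` are likewise hypothetical.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples, n_features) and give it identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance (assumes full rank).
    W_white = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return Xc @ W_white

def one_unit_fixed_point(Z, n_iter=100, tol=1e-8, seed=0):
    """Estimate one unmixing direction w from whitened data Z.

    Uses the kurtosis-based fixed-point update (an assumption, see lead-in).
    Returns a unit vector w; Z @ w estimates one independent component,
    up to sign and scale.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        u = Z @ w                                   # current projection
        w_new = (Z * (u ** 3)[:, None]).mean(axis=0) - 3.0 * w
        w_new /= np.linalg.norm(w_new)              # keep w on the unit sphere
        # The sign of w may flip between iterations, so compare up to sign.
        converged = 1.0 - abs(w_new @ w) < tol
        w = w_new
        if converged:
            break
    return w

# Illustrative use: two uniform sources mixed by an arbitrary matrix.
rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(5000, 2))          # independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])              # hypothetical mixing matrix
Z = whiten(S @ A.T)
w = one_unit_fixed_point(Z)
component = Z @ w                                   # one recovered source
```

Unlike a stochastic gradient rule, this batch iteration has no learning rate to tune. Further components could be found by repeating the search with decorrelation against the directions already found.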
Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo
Barber, David, Williams, Christopher K. I.
The full Bayesian method for applying neural networks to a prediction problem is to set up the prior/hyperprior structure for the net and then perform the necessary integrals. However, these integrals are not tractable analytically, and Markov Chain Monte Carlo (MCMC) methods are slow, especially if the parameter space is high-dimensional. Using Gaussian processes we can approximate the weight space integral analytically, so that only a small number of hyperparameters need be integrated over by MCMC methods. We have applied this idea to classification problems, obtaining excellent results on the real-world problems investigated so far. 1 INTRODUCTION To make predictions based on a set of training data, fundamentally we need to combine our prior beliefs about possible predictive functions with the data at hand. In the Bayesian approach to neural networks a prior on the weights in the net induces a prior distribution over functions.
Consistent Classification, Firm and Soft
A classifier is called consistent with respect to a given set of class-labeled points if it correctly classifies the set. We consider classifiers defined by unions of local separators and propose algorithms for consistent classifier reduction. The expected complexities of the proposed algorithms are derived along with the expected classifier sizes. In particular, the proposed approach yields a consistent reduction of the nearest neighbor classifier, which performs "firm" classification, assigning each new object to a class, regardless of the data structure. The proposed reduction method suggests a notion of "soft" classification, allowing for indecision with respect to objects which are insufficiently or ambiguously supported by the data. The performances of the proposed classifiers in predicting stock behavior are compared to that achieved by the nearest neighbor method. 1 Introduction Certain classification problems, such as recognizing the digits of a handwritten zip code, require the assignment of each object to a class. Others, involving relatively small amounts of data and high risk, call for indecision until more data become available. Examples in such areas as medical diagnosis, stock trading and radar detection are well known. The training data for the classifier in both cases will correspond to firmly labeled members of the competing classes.
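To make the notion of a consistent reduction concrete, the sketch below builds a subset of the training points whose nearest neighbor classifier still labels every training point correctly. It uses Hart's classical condensed nearest neighbor rule as a stand-in; it is not the local-separator algorithm the abstract proposes, and it covers only "firm" classification. The function names are hypothetical, and the code assumes no two identical points carry different labels.

```python
import numpy as np

def nearest_label(x, points, labels):
    """Label of the point in `points` closest to x (squared Euclidean distance)."""
    d = np.sum((points - x) ** 2, axis=1)
    return labels[np.argmin(d)]

def condensed_subset(X, y):
    """Hart's condensed nearest neighbor rule (a stand-in, see lead-in).

    X : (n_samples, n_features) array, y : (n_samples,) array of labels.
    Returns indices of a subset that is consistent with (X, y): its 1-NN
    classifier classifies every training point correctly.
    """
    keep = [0]                         # start from an arbitrary point
    changed = True
    while changed:                     # repeat until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            if nearest_label(X[i], X[keep], y[keep]) != y[i]:
                keep.append(i)         # misclassified: it must be stored
                changed = True
    return np.array(keep)

# Illustrative data: two Gaussian blobs with some overlap.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.5, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
keep = condensed_subset(X, y)
```

The loop stops only after a complete pass in which no point is misclassified, which is exactly the consistency condition defined above. The resulting subset depends on the order in which points are visited and is not guaranteed to be minimal.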