Accuracy
Supervised Classification of Flow Cytometric Samples via the Joint Clustering and Matching (JCM) Procedure
Lee, Sharon X., McLachlan, Geoffrey J., Pyne, Saumyadipta
We consider the use of the Joint Clustering and Matching (JCM) procedure for the supervised classification of a flow cytometric sample with respect to a number of predefined classes of such samples. The JCM procedure was proposed as a method for the unsupervised classification of the cells within a sample into a number of clusters and, in the case of multiple samples, for matching these clusters across the samples. Within the JCM framework, the clustering and the matching of the clusters are performed simultaneously. In this paper, we consider the case where there are a number of distinct classes of samples whose class of origin is known, and the problem is to classify a new sample of unknown class of origin to one of these predefined classes. For example, the different classes might correspond to the types of a particular disease or to the various health outcomes of a patient following a course of treatment. Using some real datasets, we show how the JCM procedure can be used to carry out this supervised classification task. A mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample, with the components of the mixture model corresponding to the various populations of cells that make up the sample. For each class of samples, a class template is formed by adopting random-effects terms to model the inter-sample variation within the class. A new unclassified sample is then assigned to the class that minimizes the Kullback-Leibler distance between its fitted mixture density and the class density provided by that class's template.
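The final assignment step can be sketched as follows. This is a simplified illustration under stated assumptions, not the JCM implementation: each fitted sample density and each class template is taken to be a plain Gaussian mixture with known parameters (JCM's own templates additionally carry random-effects terms for inter-sample variation), the divergence is taken in the direction KL(sample || template), and it is estimated by Monte Carlo because it has no closed form for mixtures. The helper names (`gmm_logpdf`, `kl_monte_carlo`, `classify_sample`) and the toy two-dimensional templates are hypothetical.

```python
import numpy as np

# Assumption: a mixture is a tuple (weights, means, covs) with shapes (K,), (K, d), (K, d, d).

def gmm_logpdf(X, weights, means, covs):
    """Log-density of a Gaussian mixture evaluated at each row of X (n x d)."""
    d = X.shape[1]
    comp = []
    for w, mu, S in zip(weights, means, covs):
        L = np.linalg.cholesky(S)
        z = np.linalg.solve(L, (X - mu).T)            # whitened residuals, d x n
        maha = np.sum(z ** 2, axis=0)                 # squared Mahalanobis distances
        logdet = 2.0 * np.sum(np.log(np.diag(L)))
        comp.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    comp = np.array(comp)                             # K x n
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))   # log-sum-exp over components

def gmm_sample(rng, n, weights, means, covs):
    """Draw n points from a Gaussian mixture."""
    counts = rng.multinomial(n, weights)
    return np.vstack([rng.multivariate_normal(m, S, size=c)
                      for m, S, c in zip(means, covs, counts)])

def kl_monte_carlo(p, q, rng, n=20000):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(X) - log q(X)]."""
    X = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(X, *p) - gmm_logpdf(X, *q)))

def classify_sample(sample_fit, templates, rng):
    """Assign the sample to the class template with the smallest estimated KL divergence."""
    kls = {label: kl_monte_carlo(sample_fit, t, rng) for label, t in templates.items()}
    return min(kls, key=kls.get), kls

rng = np.random.default_rng(1)
I2 = np.eye(2) * 0.3
templates = {  # hypothetical class templates with two cell populations each
    "A": (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [3.0, 3.0]]), np.array([I2, I2])),
    "B": (np.array([0.5, 0.5]), np.array([[0.0, 3.0], [3.0, 0.0]]), np.array([I2, I2])),
}
new_sample = (np.array([0.6, 0.4]), np.array([[0.2, -0.1], [2.9, 3.1]]), np.array([I2, I2]))
label, kls = classify_sample(new_sample, templates, rng)
print(label, kls)
```

Because the new sample's populations sit close to those of template "A", its estimated divergence from "A" is far smaller than from "B".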
Deterministic Bayesian Information Fusion and the Analysis of its Performance
Sensor networks are ubiquitous across many different domains, including wireless communications, temperature and process control, area surveillance, object tracking and numerous other fields [2, 6]. Large performance gains can be achieved in such networks by performing data fusion between the sensors, or combining information from the individual sensors to reach system-level decisions [9, 16, 24, 26]. The sensors are typically connected by wireless links to either a separate information collector (centralized fusion) or to each other (distributed fusion). Elementary fusion rules based on Boolean logic are used in many contexts due to their simplicity and ease of implementation. On the other hand, in most situations we have some knowledge of the statistical properties of the sensors' outputs, and designing fusion rules that take this into account can provide much better performance [17, 24]. The fusion rule can be built to satisfy any of various statistical optimality criteria, such as achieving the maximum likelihood or the minimum Bayes risk, subject to any other constraints of the problem [17].
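To make the contrast between Boolean and statistically designed rules concrete, the sketch below compares AND, OR and majority fusion with the Chair-Varshney rule, a classic Bayes (MAP) fusion rule for binary local decisions from sensors that are conditionally independent given the hypothesis. The rule choice, the sensor detection and false-alarm probabilities (`p_det`, `p_fa`) and the prior `p1` are illustrative assumptions, not taken from this paper. Error probabilities are computed exactly by enumerating every local decision vector, so the Bayes rule's error can never exceed that of the other rules.

```python
import itertools
import numpy as np

def chair_varshney(u, p_det, p_fa, p1):
    """MAP fusion of binary local decisions u_i, assuming conditionally independent
    sensors with detection probability p_det[i] and false-alarm probability p_fa[i]."""
    u = np.asarray(u)
    llr = np.sum(u * np.log(p_det / p_fa) + (1 - u) * np.log((1 - p_det) / (1 - p_fa)))
    return int(llr > np.log((1 - p1) / p1))          # compare with the prior threshold

def error_probability(rule, p_det, p_fa, p1):
    """Exact probability of a wrong global decision, enumerating all 2^N decision vectors."""
    err = 0.0
    for u in itertools.product([0, 1], repeat=len(p_det)):
        u = np.array(u)
        p_u_h1 = np.prod(np.where(u == 1, p_det, 1 - p_det))   # P(u | target present)
        p_u_h0 = np.prod(np.where(u == 1, p_fa, 1 - p_fa))     # P(u | target absent)
        d = rule(u)
        err += p1 * p_u_h1 * (d == 0) + (1 - p1) * p_u_h0 * (d == 1)
    return float(err)

p_det = np.array([0.9, 0.8, 0.6])    # illustrative detection probabilities
p_fa = np.array([0.05, 0.1, 0.3])    # illustrative false-alarm probabilities
p1 = 0.3                             # illustrative prior probability of a target

rules = {
    "AND": lambda u: int(u.all()),
    "OR": lambda u: int(u.any()),
    "majority": lambda u: int(u.sum() * 2 > len(u)),
    "Bayes": lambda u: chair_varshney(u, p_det, p_fa, p1),
}
errors = {name: error_probability(r, p_det, p_fa, p1) for name, r in rules.items()}
print(errors)
```

The Bayes rule weights each sensor by how informative its decisions are, whereas the Boolean rules treat all sensors alike.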
Understanding Touch Gestures on a Humanoid Robot
Lawson, Wallace E. (Naval Research Lab) | Sullivan, Keith (Excelis) | Trafton, Greg (Naval Research Lab)
Touch can be a powerful means of communication, especially when it is combined with other sensing modalities such as speech. The challenge on a humanoid robot is to sense touch in a way that is sensitive to subtle cues, such as the hand used and the amount of force applied. We propose a novel combination of sensing modalities to extract touch information. We extract hand information using the Leap Motion active sensor and determine force information from force-sensitive resistors. We combine these sensing modalities at the feature level, then train a support vector machine to recognize specific touch gestures. We demonstrate a high level of accuracy in recognizing four different touch gestures from the firefighting domain.
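A minimal sketch of feature-level fusion followed by SVM training is given below. Everything in it is an assumption for illustration: the per-modality z-scoring before concatenation, the synthetic "hand" and "force" features, and the use of a one-vs-rest linear SVM trained with Pegasos-style subgradient steps on the hinge loss (the abstract does not state which SVM variant or kernel was used). Function names are hypothetical.

```python
import numpy as np

def fuse_features(hand_feats, force_feats):
    """Feature-level fusion: z-score each modality separately, then concatenate per sample."""
    def zscore(A):
        A = np.asarray(A, dtype=float)
        return (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-9)
    return np.hstack([zscore(hand_feats), zscore(force_feats)])

def train_linear_svm(X, y, lam=1e-3, epochs=20, seed=0):
    """One-vs-rest linear SVM fitted with Pegasos-style subgradient steps on the hinge loss."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])       # constant column acts as a bias term
    classes = np.unique(y)
    W = np.zeros((len(classes), Xb.shape[1]))
    for ci, c in enumerate(classes):
        t = np.where(y == c, 1.0, -1.0)             # +1 for this gesture, -1 for the rest
        w = W[ci]                                   # view: in-place updates write into W
        step = 0
        for _ in range(epochs):
            for i in rng.permutation(len(Xb)):
                step += 1
                eta = 1.0 / (lam * step)
                violated = t[i] * (w @ Xb[i]) < 1
                w *= 1.0 - eta * lam                # shrinkage from the L2 regularizer
                if violated:
                    w += eta * t[i] * Xb[i]         # hinge-loss subgradient step
    return classes, W

def predict(classes, W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return classes[np.argmax(Xb @ W.T, axis=1)]

rng = np.random.default_rng(42)
centers = np.array([[3, 3], [3, -3], [-3, 3], [-3, -3]])          # hypothetical hand features
y = np.repeat(np.arange(4), 50)                                   # four gesture labels
hand = centers[y] + rng.normal(scale=0.5, size=(200, 2))
force = (y + 1.0)[:, None] + rng.normal(scale=0.2, size=(200, 1))  # hypothetical force readings
X = fuse_features(hand, force)
classes, W = train_linear_svm(X, y)
print("training accuracy:", np.mean(predict(classes, W, X) == y))
```

Z-scoring each modality first keeps one sensor's units from dominating the fused vector; whether the original system normalized its features is not stated.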
Learning Pronunciation and Accent from The Crowd
Liu, Frederick (National Taiwan University) | Yang, Jeremy Chiaming (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)
Learning a second language is an increasingly popular pursuit around the world. However, learning a language far from native speakers is difficult, as there is often no one to correct mistakes and no examples to imitate. Building on the idea of crowdsourcing, we propose an efficient way to learn a second language better.
Estimating the Accuracies of Multiple Classifiers Without Labeled Data
Jaffe, Ariel, Nadler, Boaz, Kluger, Yuval
In various situations, one is given only the predictions of multiple classifiers on a large unlabeled test set. This scenario raises the following questions: without any labeled data and without any a priori knowledge about the reliability of these classifiers, is it possible to estimate their accuracies consistently and in a computationally efficient way? Furthermore, can one construct a more accurate ensemble classifier, again in a completely unsupervised manner? In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to answer these questions. Moreover, under standard classifier independence assumptions, we prove that our methods are consistent and study their asymptotic error. Our approach is spectral, based on the fact that the off-diagonal entries of the classifiers' covariance matrix and 3-D tensor are rank-one. We illustrate the competitive performance of our algorithms via extensive experiments on both artificial and real datasets.
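The covariance part of this idea can be illustrated with a short NumPy sketch. For ±1 classifiers whose errors are independent given the true label, the off-diagonal covariance entries satisfy Q_ij = v_i v_j with v_i = sqrt(1 - b^2) (ψ_i + η_i - 1), where ψ_i and η_i are the sensitivity and specificity and b = P(y = 1) - P(y = -1) is the class imbalance, so the leading factor ranks the classifiers by balanced accuracy without any labels. The sketch fits this factor by re-imputing the unknown diagonal and then takes a v-weighted vote; it omits the 3-D tensor part of the method, and the weighted vote is a simple stand-in rather than the authors' final ensemble. Function names and the synthetic data are hypothetical, and b is used only to compute reference values in the demo.

```python
import numpy as np

def rank_one_offdiagonal(Q, n_iter=500):
    """Fit Q[i, j] ~ v[i] * v[j] using only the off-diagonal entries of Q.
    The diagonal is unknown, so it is re-imputed from the current rank-one fit."""
    R = np.array(Q, dtype=float)
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
        v = np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]
        np.fill_diagonal(R, v ** 2)                 # off-diagonal entries never change
    # Eigenvectors have arbitrary sign; assume most classifiers beat random guessing.
    return v if v.sum() >= 0 else -v

def spectral_ensemble(F):
    """F: (m classifiers) x (n samples) array of +/-1 predictions; no labels are used."""
    v = rank_one_offdiagonal(np.cov(F))
    return v, np.where(v @ F >= 0, 1, -1)

rng = np.random.default_rng(0)
n = 20000
y = np.where(rng.random(n) < 0.6, 1, -1)            # hidden labels, class imbalance b = 0.2
acc = np.array([0.9, 0.8, 0.7, 0.65, 0.6])           # sensitivity = specificity = acc[i]
correct = rng.random((acc.size, n)) < acc[:, None]   # errors independent given y
F = np.where(correct, y, -y)
v_hat, y_hat = spectral_ensemble(F)
v_true = np.sqrt(1 - 0.2 ** 2) * (2 * acc - 1)       # v_i = sqrt(1 - b^2) (psi_i + eta_i - 1)
print(np.round(v_hat, 3), np.round(v_true, 3))
print("ensemble accuracy:", np.mean(y_hat == y))
```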
An ensemble-based system for automatic screening of diabetic retinopathy
In this paper, an ensemble-based method for the screening of diabetic retinopathy (DR) is proposed. The approach is based on features extracted from the output of several retinal image processing algorithms, such as image-level (quality assessment, pre-screening, AM/FM), lesion-specific (microaneurysms, exudates) and anatomical (macula, optic disc) components. The final decision about the presence of the disease is then made by an ensemble of machine learning classifiers. We have tested our approach on the publicly available Messidor database, where it achieves 90% sensitivity, 91% specificity, 90% accuracy and an AUC of 0.989 in a disease/no-disease setting. These results are highly competitive in this field and suggest that retinal image processing is a valid approach for automatic DR screening.
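The reported figures follow standard definitions, sketched below: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N, and AUC as the probability that a randomly chosen diseased image receives a higher score than a randomly chosen healthy one. The abstract does not say how the ensemble members are combined, so the majority vote here (with ties resolved towards disease), the function names and the toy classifier outputs are assumptions.

```python
import numpy as np

def majority_vote(binary_preds):
    """Combine an (m classifiers) x (n images) array of 0/1 decisions.
    Ties are resolved towards 'disease' (a conservative screening choice)."""
    P = np.asarray(binary_preds)
    return (2 * P.sum(axis=0) >= P.shape[0]).astype(int)

def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from 0/1 labels and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / y_true.size}

def auc(y_true, scores):
    """Probability that a random diseased image outscores a random healthy one (ties count 1/2)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = np.array([1, 1, 1, 0, 0, 0])
P = np.array([[1, 1, 0, 0, 0, 1],      # hypothetical outputs of three classifiers
              [1, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 0, 0]])
print(screening_metrics(y, majority_vote(P)))
print(auc(y, P.mean(axis=0)))          # fraction of classifiers voting 'disease' used as a score
```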
An Ensemble-based System for Microaneurysm Detection and Diabetic Retinopathy Grading
Reliable microaneurysm detection in digital fundus images is still an open issue in medical image processing. We propose an ensemble-based framework to improve microaneurysm detection. Unlike the well-known approach of combining the outputs of multiple classifiers, we propose a combination of internal components of microaneurysm detectors, namely preprocessing methods and candidate extractors. We have evaluated our approach for microaneurysm detection in an online competition, where the algorithm is currently ranked first, and on two other databases. Since microaneurysm detection is decisive in diabetic retinopathy grading, we also tested the proposed method for this task on the publicly available Messidor database, where a promising AUC of 0.90 ± 0.01 is achieved in a 'DR/non-DR'-type classification based on the presence or absence of microaneurysms.
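The idea of ensembling internal components rather than final classifier outputs can be sketched as follows: every pairing of a preprocessing method with a candidate extractor proposes microaneurysm locations, and a location is kept when enough pairings agree on it. The abstract does not describe the actual preprocessing methods, extractors or fusion rule, so the voting scheme, the `radius` and `min_votes` parameters, the function name and the toy components below are all hypothetical.

```python
import itertools
import numpy as np

def ensemble_candidates(image, preprocessors, extractors, radius=3.0, min_votes=None):
    """Run every (preprocessor, extractor) pair on the image and keep candidate
    locations that at least `min_votes` pairs report within `radius` pixels."""
    pairs = list(itertools.product(preprocessors, extractors))
    if min_votes is None:
        min_votes = len(pairs) // 2 + 1                   # simple majority of pairs
    per_pair = [np.asarray(ext(pre(image)), dtype=float).reshape(-1, 2)
                for pre, ext in pairs]
    kept = []
    for cand in np.vstack(per_pair):
        votes = sum(bool(np.any(np.linalg.norm(c - cand, axis=1) <= radius))
                    for c in per_pair)
        is_new = all(np.linalg.norm(cand - k) > radius for k in kept)  # suppress duplicates
        if votes >= min_votes and is_new:
            kept.append(cand)
    return np.array(kept).reshape(-1, 2)

image = np.zeros((10, 10))
image[2, 2], image[7, 7] = 1.0, 0.6                      # one strong spot, one faint spot
preprocessors = [lambda im: im, lambda im: im ** 2]      # hypothetical preprocessing methods
extractors = [lambda im: np.argwhere(im > 0.5),          # hypothetical candidate extractors
              lambda im: np.argwhere(im > 0.9)]
result = ensemble_candidates(image, preprocessors, extractors)
print(result)
```

Only the strong spot is reported by a majority of the four pairings, so the faint one is discarded.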
The Falling Factorial Basis and Its Statistical Applications
Wang, Yu-Xiang, Smola, Alex, Tibshirani, Ryan J.
We study a novel spline-like basis, which we name the "falling factorial basis", bearing many similarities to the classic truncated power basis. The advantage of the falling factorial basis is that it enables rapid, linear-time computations in basis matrix multiplication and basis matrix inversion. The falling factorial functions are not actually splines, but are close enough to splines that they provably retain some of the favorable properties of the latter functions. We examine their application in two problems: trend filtering over arbitrary input points, and a higher-order variant of the two-sample Kolmogorov-Smirnov test.
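A sketch of the basis, assuming its standard definition over sorted, distinct inputs x_1 < ... < x_n: h_j(x) = prod_{l=1}^{j-1} (x - x_l) for j = 1, ..., k+1, and h_{k+1+j}(x) = prod_{l=1}^{k} (x - x_{j+l}) · 1{x > x_{j+k}} for j = 1, ..., n-k-1. The function name is hypothetical, and it builds the dense n x n basis matrix naively in O(n^2) time; it does not implement the linear-time multiplication and inversion the abstract refers to. The resulting matrix is lower triangular with a nonzero diagonal, hence invertible, and for k = 1 the knot functions reduce to the truncated power functions (x - x_{j+1})_+.

```python
import numpy as np

def falling_factorial_basis(x, k):
    """n x n matrix H with H[i, c] = h_c(x_i) for the k-th order falling factorial basis.
    Assumes x is sorted, strictly increasing and has more than k entries (0-based indices)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.ones((n, n))
    # Polynomial part, columns c = 0..k: prod over m < c of (x - x[m]).
    for c in range(k + 1):
        for m in range(c):
            H[:, c] *= x - x[m]
    # Knot part, 1-based j = 1..n-k-1 stored in column k + j:
    # prod over m = j..j+k-1 of (x - x[m]), times the indicator x > x[j + k - 1].
    for j in range(1, n - k):
        col = (x > x[j + k - 1]).astype(float)
        for m in range(j, j + k):
            col *= x - x[m]
        H[:, k + j] = col
    return H

x = np.array([0.0, 0.1, 0.35, 0.4, 0.7, 0.75, 0.9, 1.0])   # sorted, unevenly spaced inputs
H = falling_factorial_basis(x, k=2)
print(np.allclose(H, np.tril(H)))   # True: h_c(x_i) = 0 whenever i < c
```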
A General Statistic Framework for Genome-based Disease Risk Prediction
Ma, L., Lin, N., Amos, C. I., Xiong, M. M.
Advances in modern sensing and sequencing technologies generate a deluge of high-dimensional spatio-temporal physiological and next-generation sequencing (NGS) data. Physiological traits are observed either as continuous random functions or on a dense grid, and are referred to as function-valued traits. Both physiological and NGS data are highly correlated and have an inherent order, spacing and functional nature, which are ignored by traditional summary-based univariate and multivariate regression methods designed for the quantitative genetic analysis of scalar traits and common variants. To capture the morphological and dynamic features of the data and exploit their dependence structure, we propose a functional linear model (FLM) in which a trait curve is modeled as a response function, the genetic variation in a genomic region or gene is modeled as a functional predictor, and the genetic effects are modeled as a function of both time and genomic position (FLMF), for the genetic analysis of function-valued traits with both GWAS and NGS data. Through extensive simulations, we demonstrate that the FLMF has correct type 1 error rates and much higher power to detect association than the existing methods. The FLMF is applied to sleep data from the Starr County health studies, in which oxygen saturation was measured over an average of 22,670 seconds for each of 833 individuals. We found 65 genes significantly associated with the oxygen saturation functional trait, with P-values ranging from 2.40E-06 to 2.53E-21. The results clearly demonstrate that the FLMF substantially outperforms the traditional genetic models with scalar traits.
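A generic function-on-function linear model consistent with this description is written below; the notation, the tensor-product basis expansion and the form of the null hypothesis are assumptions for illustration rather than the paper's exact specification. Here y_i(t) is the trait curve of individual i, x_i(s) is a genotype function over genomic position s in a region S, β(t, s) is the genetic effect surface, and θ_j, ψ_k are assumed basis functions in time and genomic position.

```latex
\begin{aligned}
y_i(t) &= \mu(t) + \int_{S} \beta(t, s)\, x_i(s)\, ds + \varepsilon_i(t), \\
\beta(t, s) &= \sum_{j=1}^{J} \sum_{k=1}^{K} b_{jk}\, \theta_j(t)\, \psi_k(s), \\
\int_{S} \beta(t, s)\, x_i(s)\, ds &= \sum_{j=1}^{J} \theta_j(t) \sum_{k=1}^{K} b_{jk}\, c_{ik},
  \qquad c_{ik} = \int_{S} \psi_k(s)\, x_i(s)\, ds, \\
H_0 &: \ b_{jk} = 0 \ \text{for all } j, k \quad (\text{no association}).
\end{aligned}
```

Because the model is linear in the coefficients b_jk, they can be estimated by least squares from the discretized curves, and association can be assessed by testing whether all b_jk vanish.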
Learning-Assisted Automated Reasoning with Flyspeck
Kaliszyk, Cezary, Urban, Josef
The considerable mathematical knowledge encoded by the Flyspeck project is combined with external automated theorem provers (ATPs) and machine-learning premise selection methods trained on the proofs, producing an AI system capable of answering a wide range of mathematical queries automatically. The performance of this architecture is evaluated in a bootstrapping scenario emulating the development of Flyspeck from axioms to the last theorem, each time using only the previous theorems and proofs. It is shown that 39% of the 14185 theorems could be proved in a push-button mode (without any high-level advice and user interaction) in 30 seconds of real time on a fourteen-CPU workstation. The necessary work involves: (i) an implementation of sound translations of the HOL Light logic to ATP formalisms: untyped first-order, polymorphic typed first-order, and typed higher-order, (ii) export of the dependency information from HOL Light and ATP proofs for the machine learners, and (iii) choice of suitable representations and methods for learning from previous proofs, and their integration as advisors with HOL Light. This work is described and discussed here, and an initial analysis of the body of proofs that were found fully automatically is provided.
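Premise selection can be illustrated with a simple k-nearest-neighbour ranker, one common choice in learning-assisted reasoning; whether it matches the learners used in this work is an assumption. Each earlier theorem is described by the set of symbols it contains and the premises its proof used, and, in line with the bootstrapping evaluation, the `library` passed in should contain only theorems that precede the conjecture. The function name, the Jaccard similarity and the toy library of facts are hypothetical.

```python
def knn_premise_selection(conjecture_symbols, library, k=2, top=5):
    """Rank candidate premises for a conjecture.

    library: list of (name, symbols, premises_used) for previously proved theorems,
    in development order. Each of the k most similar theorems votes, with weight
    equal to its similarity, for itself and for the premises its proof used."""
    target = set(conjecture_symbols)

    def jaccard(symbols):
        s = set(symbols)
        union = s | target
        return len(s & target) / len(union) if union else 0.0

    # sorted() is stable, so equally similar theorems keep their development order.
    neighbours = sorted(library, key=lambda thm: jaccard(thm[1]), reverse=True)[:k]
    scores = {}
    for name, symbols, premises in neighbours:
        sim = jaccard(symbols)
        for p in [name, *premises]:
            scores[p] = scores.get(p, 0.0) + sim
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]   # ties broken by name

library = [  # hypothetical facts: (name, symbols, premises used in its proof)
    ("ADD_COMM", {"+", "num"}, []),
    ("ADD_ASSOC", {"+", "num"}, ["ADD_COMM"]),
    ("MUL_COMM", {"*", "num"}, []),
    ("LEFT_DISTRIB", {"*", "+", "num"}, ["ADD_COMM", "MUL_COMM"]),
]
ranking = knn_premise_selection({"+", "*", "num"}, library)
print(ranking)   # ['ADD_COMM', 'LEFT_DISTRIB', 'MUL_COMM']
```

The top-ranked premises would then be handed to an external ATP together with the translated conjecture.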