Performance Analysis
Maximin affinity learning of image segmentation
Briggman, Kevin, Denk, Winfried, Seung, Sebastian, Helmstaedter, Moritz N., Turaga, Srinivas C.
Images can be segmented by first using a classifier to predict an affinity graph that reflects the degree to which image pixels must be grouped together and then partitioning the graph to yield a segmentation. Machine learning has been applied to the affinity classifier to produce affinity graphs that are good in the sense of minimizing edge misclassification rates. However, this error measure is only indirectly related to the quality of segmentations produced by ultimately partitioning the affinity graph. We present the first machine learning algorithm for training a classifier to produce affinity graphs that are good in the sense of producing segmentations that directly minimize the Rand index, a well known segmentation performance measure. The Rand index measures segmentation performance by quantifying the classification of the connectivity of image pixel pairs after segmentation. By using the simple graph partitioning algorithm of finding the connected components of the thresholded affinity graph, we are able to train an affinity classifier to directly minimize the Rand index of segmentations resulting from the graph partitioning. Our learning algorithm corresponds to the learning of maximin affinities between image pixel pairs, which are predictive of the pixel-pair connectivity.
Positive Semidefinite Metric Learning with Boosting
Shen, Chunhua, Kim, Junae, Wang, Lei, Hengel, Anton
The learning of appropriate distance metrics is a critical problem in classification. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. BoostMetric is instead based on a key observation that any positive semidefinite matrix can be decomposed into a linear positive combination of trace-one rank-one matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting method is easy to implement, does not require tuning, and can accommodate various types of constraints. Experiments on various datasets show that the proposed algorithm compares favorably to those state-of-the-art methods in terms of classification accuracy and running time.
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
Kim, Gunhee, Torralba, Antonio
This paper proposes a fast and scalable alternating optimization technique to detect regionsof interest (ROIs) in cluttered Web images without labels. The proposed approachdiscovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of state-of-the-art techniques andcomparable to supervised methods. Also, we test the scalability of our approach with five objects in Flickr dataset consisting of more than 200K images.
Bayesian Sparse Factor Models and DAGs Inference and Comparison
In this paper we present a novel approach to learn directed acyclic graphs (DAG) and factor models within the same framework while also allowing for model comparison between them. For this purpose, we exploit the connection between factor models and DAGs to propose Bayesian hierarchies based on spike and slab priors to promote sparsity, heavy-tailed priors to ensure identifiability and predictive densities to perform the model comparison. We require identifiability to be able to produce variable orderings leading to valid DAGs and sparsity to learn the structures. The effectiveness of our approach is demonstrated through extensive experiments on artificial and biological data showing that our approach outperform a number of state of the art methods.
From PAC-Bayes Bounds to KL Regularization
Germain, Pascal, Lacasse, Alexandre, Marchand, Mario, Shanian, Sara, Laviolette, Franรงois
We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors, that we call quasi uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ell_p-regularized objective functions currently used, such as ridge regression and ell_p-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.
Evaluating multi-class learning strategies in a generative hierarchical framework for object detection
Fidler, Sanja, Boben, Marko, Leonardis, Ales
Multiple object class learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time --- but are complex to train. Conveniently, sequential learning of categories cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the richness of shareability and might depend on ordering in learning. In hierarchical frameworks these issues have been little explored. In this paper, we show how different types of multi-class learning can be done within one generative hierarchical framework and provide a rigorous experimental analysis of various object class learning strategies as the number of classes grows. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned classes on several recognition data sets.
Discriminative Network Models of Schizophrenia
Rish, Irina, Thyreau, Benjamin, Thirion, Bertrand, Plaze, Marion, Paillere-martinot, Marie-laure, Martelli, Catherine, Martinot, Jean-luc, Poline, Jean-baptiste, Cecchi, Guillermo A.
Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, ``emergent working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [Eguiluzet al] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyzes. Our results suggest that schizophrenia is indeed associated with disruption of global, emergent brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows clear gain over linear methods, such as Gaussian Naive Bayes and SVM, allowing to reach 86% accuracy (over 50% baseline - random guess), which is quite remarkable given that it is based on a single fMRI experiment using a simple auditory task.
Data-driven calibration of linear estimators with minimal penalties
Arlot, Sylvain, Bach, Francis R.
This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the variance of the noise, based upon the concept of minimal penalty which was previously introduced in the context of model selection. Then, plugging our variance estimate in Mallows $C_L$ penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly existing calibration procedures such as 10-fold cross-validation or generalized cross-validation.
A Trend Pattern Approach to Forecasting Socio-Political Violence
Rohloff, Kurt (BBN Technologies) | Battle, Rob (BBN Technologies) | Chatigny, Jim (BBN Technologies) | Schantz, Rick (BBN Technologies) | Asal, Victor (SUNY Albany)
We present an approach to identifying concurrent patterns of behavior in in-sample temporal factor training data that precede Events of Interest (EoIs). We also present how to use discovered patterns to forecast EoIs in out-of-sample test data. The forecasting methodology is based on matching entities' observed behaviors to patterns discovered in retrospective data. This pattern concept is a generalization of previous pattern definitions. The new pattern concept, based around patterns observed in trends of factor data is based on a finite-state model where observed, sustained trends in a factor map to pattern states. Discovered patterns can be used as a diagnostic tool to better understand the dynamic conditions leading up to specific Event of Interest occurrences and hint at underlying causal structures leading to onsets and terminations of socio-political violence. We present a computationally efficient data-mining method to discover trend patterns. We give an example of using our pattern forecasting methodology to correctly forecast the advent and cessation of ethnic-religious violence in nation states with a low false-alarm rate.
How to Explain Individual Classification Decisions
Baehrens, David, Schroeter, Timon, Harmeling, Stefan, Kawanabe, Motoaki, Hansen, Katja, Mueller, Klaus-Robert
After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question what is the most likely label of a given unseen data point. However, most methods will provide no answer why the model predicted the particular label for a single instance and what features were most influential for that particular instance. The only method that is currently able to provide such explanations are decision trees. This paper proposes a procedure which (based on a set of assumptions) allows to explain the decisions of any classification method.