Statistical Learning


Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

arXiv.org Machine Learning

We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, which is related to total variation in compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm uses a graph adaptation of the classical Merriman-Bence-Osher (MBO) numerical scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than those of current state-of-the-art multiclass segmentation algorithms.
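To make the diffusion-then-thresholding idea concrete, here is a minimal Python/NumPy sketch of a multiclass graph MBO iteration. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dense Gaussian weights, the symmetric normalized Laplacian, the truncated eigenbasis and the parameter values (dt, mu, n_eig, sigma) are all choices made for this example. Each node carries a point u_i in the Gibbs simplex; diffusion is one semi-implicit step that includes a fidelity term on labeled nodes, and thresholding snaps every node to its nearest simplex vertex, which is the vertex of its largest entry.

```python
import numpy as np


def normalized_laplacian(X, sigma=1.0):
    """Symmetric normalized graph Laplacian built from dense Gaussian weights (assumed choice)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                          # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def mbo_multiclass(X, fid_idx, fid_labels, n_classes, n_eig=10,
                   dt=0.1, mu=30.0, n_iter=20, sigma=1.0):
    """Hypothetical graph MBO sketch: alternate diffusion (with fidelity) and thresholding."""
    n = X.shape[0]
    evals, evecs = np.linalg.eigh(normalized_laplacian(X, sigma))
    evals, evecs = evals[:n_eig], evecs[:, :n_eig]    # eigh sorts ascending: smallest eigenpairs
    vertices = np.eye(n_classes)                      # corners of the Gibbs simplex
    u = np.full((n, n_classes), 1.0 / n_classes)      # unlabeled nodes start at the simplex centre
    u[fid_idx] = vertices[fid_labels]
    u_hat = np.zeros_like(u)
    u_hat[fid_idx] = vertices[fid_labels]
    chi = np.zeros((n, 1))
    chi[fid_idx] = 1.0                                # indicator of labeled (fidelity) nodes
    for _ in range(n_iter):
        # Diffusion: (I + dt*L) u_new = u - dt*mu*chi*(u - u_hat), solved in the truncated eigenbasis.
        rhs = u - dt * mu * chi * (u - u_hat)
        coeffs = evecs.T @ rhs
        u = evecs @ (coeffs / (1.0 + dt * evals)[:, None])
        # Thresholding: the nearest simplex vertex is the one for the largest entry of each row.
        u = vertices[np.argmax(u, axis=1)]
    return np.argmax(u, axis=1)
```

The dense weight matrix and `np.linalg.eigh` are only practical for small graphs; scaling up relies on the sparse graphs and fast eigensolvers the abstract refers to.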


Embedding Graphs under Centrality Constraints for Network Visualization

arXiv.org Machine Learning

In this case, the vertex dissimilarity structure is preserved through pairwise distance metrics between vertices. Principal component analysis (PCA) of the graph adjacency matrix is advocated in [3], leading to a spectral embedding whose vertices correspond to entries of the leading component vectors. The structure preserving embedding algorithm [4] solves a semidefinite program with linear topology constraints so that a nearest neighbor algorithm can recover the graph edges from the embedding. Visual analytics approaches developed in [7] and [12] emphasize community structures, with applications to community browsing in graphs. Concentric graph layouts developed in [39] and [30] capture notions of node hierarchy by placing the highest ranked nodes at the center of the embedding. Although the graph embedding problem has been studied for years, developing fast and optimal visualization algorithms with hierarchical constraints remains challenging, and existing methods typically resort to heuristic approaches. The growing interest in the analysis of very large networks has prioritized effectively capturing hierarchy over aesthetic appeal in visualization. For instance, a hierarchy-aware visual analysis of a global computer network is naturally more useful to security experts trying to protect the most critical nodes from a viral infection. Layouts of metro-transit networks that clearly show the terminals routing the bulk of traffic convey a better picture of the most critical nodes in the event of a terrorist attack.
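As a toy illustration of the concentric idea (most central nodes at the center), the sketch below places each node at a radius that shrinks as its degree centrality grows and spreads nodes evenly in angle. This is a hypothetical heuristic written for this note, not the constrained embedding proposed in the paper: degree is an assumed stand-in for a general centrality measure, no attempt is made to preserve vertex dissimilarities, and the function name `concentric_layout` is illustrative. It assumes the graph has at least one edge.

```python
import numpy as np


def concentric_layout(A):
    """Hypothetical concentric layout: radius = 1 - centrality / max centrality."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                        # degree centrality (assumed stand-in)
    radius = 1.0 - deg / deg.max()             # most central node(s) land at the origin
    order = np.argsort(-deg, kind="stable")    # visit nodes from most to least central
    theta = np.empty(len(A))
    theta[order] = 2.0 * np.pi * np.arange(len(A)) / len(A)   # evenly spaced angles
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
```

For a star graph, the hub sits at the origin and every leaf lies on the same ring, which is the hierarchy-first reading the passage argues for.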


Nonparametric Latent Tree Graphical Models: Inference, Estimation, and Structure Learning

arXiv.org Machine Learning

Modern data acquisition routinely produces massive amounts of high-dimensional data with complex statistical dependency structures. Latent variable graphical models provide a succinct representation of such complex dependency structures by relating the observed variables to a set of latent ones. By defining a joint distribution over observed and latent variables, the marginal distribution of the observed variables can be obtained by integrating out the latent ones. This allows complex distributions over observed variables (e.g., clique models) to be expressed in terms of more tractable joint models (e.g., tree models) over the augmented variable space. Probabilistic graphical models with latent variables have been applied successfully to a diverse range of problems, such as document analysis (Blei et al., 2002), social network modeling (Hoff et al., 2002), speech recognition (Rabiner and Juang, 1986) and bioinformatics (Clark, 1990). In this paper, we focus on latent variable models whose latent structures are trees (which we call "latent trees" for short).
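The marginalization step can be illustrated with a small discrete latent tree: a hidden root H with three observed leaves. The numbers and the helper name below are hypothetical, and discrete probability tables stand in for the nonparametric (continuous) setting the paper addresses. The tree-structured joint over (H, X1, X2, X3) needs only one table per edge, yet summing out H yields a marginal over the observed variables in which the leaves are mutually dependent.

```python
import numpy as np

# Hypothetical discrete latent tree: hidden root H (2 states) with three observed binary leaves.
p_h = np.array([0.6, 0.4])                  # p(H = h)
emission = np.array([[0.9, 0.1],            # p(X_k = x | H = 0)
                     [0.2, 0.8]])           # p(X_k = x | H = 1)


def observed_marginal(p_h, emissions):
    """Integrate out the latent root: p(x1, x2, x3) = sum_h p(h) p(x1|h) p(x2|h) p(x3|h)."""
    e1, e2, e3 = emissions
    return np.einsum("h,ha,hb,hc->abc", p_h, e1, e2, e3)


joint = observed_marginal(p_h, [emission] * 3)
p12 = joint.sum(axis=2)                     # pairwise marginal of X1 and X2
```

Comparing `p12` with the outer product of its own row and column sums shows that X1 and X2 are dependent once H is marginalized out, even though they are conditionally independent given H.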


Co-clustering separately exchangeable network data

arXiv.org Machine Learning

This article establishes the performance of stochastic blockmodels in addressing the co-clustering problem of partitioning a binary array into subsets, assuming only that the data are generated by a nonparametric process satisfying the condition of separate exchangeability. We provide oracle inequalities with rate of convergence $\mathcal{O}_P(n^{-1/4})$ corresponding to profile likelihood maximization and mean-square error minimization, and show that in this setting the blockmodel can be interpreted as an optimal piecewise-constant approximation to the generative nonparametric model. We also show that, for large sample sizes, the detection of co-clusters in such data indicates with high probability the existence of co-clusters of equal size and asymptotically equivalent connectivity in the underlying generative process.
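For fixed row and column cluster assignments, the Bernoulli blockmodel's profile log-likelihood has a closed form: each block's probability of a 1 is estimated by its empirical mean, and the fitted matrix `theta[rows][:, cols]` is exactly a piecewise-constant approximation of the array. The sketch below computes this quantity; the function names are illustrative, and the search over assignments (the actual profile likelihood maximization) is combinatorial and not attempted here.

```python
import numpy as np


def xlogx(p):
    """Elementwise p * log(p) with the convention 0 * log(0) = 0."""
    safe = np.where(p > 0, p, 1.0)
    return np.where(p > 0, p * np.log(safe), 0.0)


def blockmodel_profile_loglik(A, rows, cols, K, L):
    """Block means and Bernoulli log-likelihood for given row/column cluster labels (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    R = np.eye(K)[rows]                          # n x K row-membership indicators
    C = np.eye(L)[cols]                          # m x L column-membership indicators
    ones = R.T @ A @ C                           # number of 1s in each block
    size = np.outer(R.sum(axis=0), C.sum(axis=0))
    theta = np.divide(ones, size, out=np.zeros_like(ones), where=size > 0)
    # With theta at its block MLE, each block contributes size * [theta log theta + (1 - theta) log(1 - theta)].
    loglik = (size * (xlogx(theta) + xlogx(1.0 - theta))).sum()
    return theta, loglik
```

A clustering that matches the true blocks attains the higher profile log-likelihood, which is what a maximizer over assignments would exploit.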


Detection of Anomalous Crowd Behavior Using Spatio-Temporal Multiresolution Model and Kronecker Sum Decompositions

arXiv.org Machine Learning

In this work we consider the problem of detecting anomalous spatio-temporal behavior in videos. We learn the normative multiframe pixel joint distribution and detect deviations from it using a likelihood-based criterion. Because available training samples are extremely scarce relative to the dimension of the distribution, we use a mean and covariance approach and consider methods of learning the spatio-temporal covariance in the low-sample regime. We estimate the covariance using parameter reduction and sparse models. The first method considered is the representation of the covariance as a sum of Kronecker products, as in (Greenewald et al., 2013), which is found to be an accurate approximation in this setting. We propose learning algorithms suited to this problem. We then consider the sparse multiresolution model of (Choi et al., 2010) and apply the Kronecker product methods to it for further parameter reduction, and we introduce modifications for enhanced efficiency and greater applicability to spatio-temporal covariance matrices. We apply our methods to the detection of crowd behavior anomalies in the University of Minnesota crowd anomaly dataset and achieve competitive results.
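A sum of Kronecker products can be fitted with the well-known Van Loan-Pitsianis rearrangement: permuting the entries of a (pq x pq) covariance turns each term kron(A, B) into the rank-one matrix vec(A) vec(B)^T, so a truncated SVD gives the best Frobenius-norm approximation with r terms. The sketch below assumes this plain least-squares variant; it is not necessarily the estimator of Greenewald et al. (2013), which may add regularization for the low-sample regime. The names p, q (sizes of the two factors, for example temporal and spatial) and r (number of terms) are chosen here for illustration.

```python
import numpy as np


def rearrange(S, p, q):
    """Permute a (p*q x p*q) matrix so that kron(A, B) becomes vec(A) vec(B)^T (row-major vec)."""
    return S.reshape(p, q, p, q).transpose(0, 2, 1, 3).reshape(p * p, q * q)


def kron_sum_approx(S, p, q, r):
    """Best Frobenius-norm approximation of S by sum_{k<r} kron(A_k, B_k), A_k p x p, B_k q x q."""
    U, s, Vt = np.linalg.svd(rearrange(S, p, q), full_matrices=False)
    S_hat = np.zeros_like(S, dtype=float)
    for k in range(r):
        # Split each singular value evenly between the two factors.
        A = (np.sqrt(s[k]) * U[:, k]).reshape(p, p)
        B = (np.sqrt(s[k]) * Vt[k]).reshape(q, q)
        S_hat += np.kron(A, B)
    return S_hat
```

Because the rearrangement only permutes entries, it preserves the Frobenius norm, which is why truncating the SVD of the rearranged matrix is optimal for the original one.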


Frequency Recognition in SSVEP-based BCI using Multiset Canonical Correlation Analysis

arXiv.org Machine Learning

Canonical correlation analysis (CCA) has been one of the most popular methods for frequency recognition in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). Despite its efficiency, a potential problem is that the pre-constructed sine-cosine waves used as reference signals in the CCA method often do not yield optimal recognition accuracy, because they lack features of real EEG data. To address this problem, this study proposes a novel method based on multiset canonical correlation analysis (MsetCCA) to optimize the reference signals used in the CCA method for SSVEP frequency recognition. The MsetCCA method learns multiple linear transforms that implement joint spatial filtering to maximize the overall correlation among canonical variates, and hence extracts SSVEP common features from multiple sets of EEG data recorded at the same stimulus frequency. The optimized reference signals are formed by combining these common features and are derived entirely from training data. An experimental study with EEG data from ten healthy subjects demonstrates that the MsetCCA method improves SSVEP frequency recognition accuracy compared with the CCA method and two other competing methods (multiway CCA (MwayCCA) and phase constrained CCA (PCCA)), especially for a small number of channels and a short time window. These results indicate that MsetCCA is a promising new candidate for frequency recognition in SSVEP-based BCIs.
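For reference, the standard CCA recognition rule that MsetCCA builds on can be sketched as follows: for every candidate stimulus frequency, build sine-cosine references with a few harmonics, compute the largest canonical correlation with the multichannel EEG segment, and pick the frequency with the highest score. MsetCCA would replace these artificial references with ones learned from training trials; that step is not implemented here. The function names, the number of harmonics and the QR/SVD route to canonical correlations are choices made for this sketch.

```python
import numpy as np


def max_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y (rows are time samples)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the cosines of the principal angles, i.e. canonical correlations.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]


def sine_cosine_reference(freq, fs, n_samples, n_harmonics=2):
    """Artificial reference signals: sin/cos at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)


def recognize_frequency(eeg, freqs, fs, n_harmonics=2):
    """Standard CCA rule: choose the candidate frequency whose reference correlates best with the EEG."""
    scores = [max_canonical_correlation(eeg, sine_cosine_reference(f, fs, len(eeg), n_harmonics))
              for f in freqs]
    return freqs[int(np.argmax(scores))], scores
```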


An Empirical Evaluation of Similarity Measures for Time Series Classification

arXiv.org Machine Learning

Time series are ubiquitous, and a measure to assess their similarity is a core part of many computational systems. In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems. Because of this importance, countless approaches to estimate time series similarity have been proposed. However, there is a lack of comparative studies using empirical, rigorous, quantitative, and large-scale assessment strategies. In this article, we provide an extensive evaluation of similarity measures for time series classification following these principles. We consider 7 different measures from alternative measure "families" and 45 publicly available time series data sets from a wide variety of scientific domains. We focus on out-of-sample classification accuracy, but in-sample accuracies and parameter choices are also discussed. Our work is based on rigorous evaluation methodologies and includes the use of powerful statistical significance tests to derive meaningful conclusions. The results show that a number of measures are equivalent in terms of accuracy, but a single candidate outperforms the rest. These findings, together with the methodology followed, invite researchers in the field to adopt more consistent evaluation criteria and to make more informed decisions about the baseline measures against which new developments should be compared.
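Evaluations of this kind plug a similarity measure into a classifier, typically a 1-nearest-neighbour rule. The sketch below shows that pattern with dynamic time warping (DTW), used purely as an example of an elastic measure; it does not indicate which measure the study found best. Swapping the `dist` argument is how one would compare measures under the same protocol. The function names are illustrative.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping: absolute-difference local cost, no warping window."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def one_nn_classify(query, train_series, train_labels, dist=dtw_distance):
    """1-nearest-neighbour classification under an arbitrary (dis)similarity measure."""
    dists = [dist(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```

Because DTW can repeat samples along the alignment path, [0, 1, 2] and [0, 1, 1, 2] are at distance zero, whereas a lock-step measure such as Euclidean distance cannot even compare series of different lengths.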


Coordinate Descent with Online Adaptation of Coordinate Frequencies

arXiv.org Machine Learning

Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We consider general CD with non-uniform selection of coordinates. Instead of fixing selection frequencies beforehand, we propose an online adaptation mechanism for this important parameter, called the adaptive coordinate frequencies (ACF) method. This mechanism removes the need to estimate optimal coordinate frequencies beforehand, and it automatically reacts to changing requirements during an optimization run. We demonstrate the usefulness of our ACF-CD approach for a variety of optimization problems arising in machine learning contexts. Our algorithm offers significant speed-ups over state-of-the-art training methods.
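The sketch below shows coordinate descent for the LASSO with non-uniform, adaptively updated coordinate selection. The adaptation rule is a simplified stand-in written for this example: coordinates whose step reduces the objective more than a running average of recent gains have their selection preference increased, the others decreased, within fixed bounds. It illustrates the idea of online frequency adaptation but is not the exact ACF update from the paper, and all parameter names and defaults (c, eta, p_min, p_max) are assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd_adaptive(X, y, lam, n_steps=2000, c=0.5, eta=0.1,
                      p_min=0.05, p_max=20.0, seed=0):
    """Randomized CD for 0.5*||y - Xw||^2 + lam*||w||_1 with adaptive selection preferences.
    The preference update is a hypothetical simplification, not the paper's ACF rule."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = np.zeros(d)
    r = np.asarray(y, dtype=float).copy()        # residual y - Xw
    col_sq = (X ** 2).sum(axis=0)
    pref = np.ones(d)                            # unnormalized selection preferences
    avg_gain = None
    for _ in range(n_steps):
        j = rng.choice(d, p=pref / pref.sum())   # non-uniform coordinate selection
        old = w[j]
        rho = X[:, j] @ r + col_sq[j] * old
        new = soft_threshold(rho, lam) / col_sq[j]   # exact minimizer along coordinate j
        # Objective decrease achieved by this coordinate step (always >= 0).
        gain = (new - old) * rho - 0.5 * col_sq[j] * (new**2 - old**2) + lam * (abs(old) - abs(new))
        r -= X[:, j] * (new - old)
        w[j] = new
        # Coordinates that beat the running-average gain are selected more often, others less.
        avg_gain = gain if avg_gain is None else (1 - eta) * avg_gain + eta * gain
        if avg_gain > 0:
            pref[j] = np.clip(pref[j] * np.exp(c * (gain / avg_gain - 1.0)), p_min, p_max)
    return w
```

The bounds p_min and p_max keep every coordinate selectable, so the method never starves a coordinate whose usefulness changes later in the run.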


On the Estimation of Pointwise Dimension

arXiv.org Machine Learning

Our goal in this paper is to develop an effective estimator of fractal dimension. We survey existing ideas in dimension estimation, with a focus on the currently popular method of Grassberger and Procaccia for estimating correlation dimension. There are two major difficulties in estimation based on this method. The first is the insensitivity of correlation dimension itself to differences in dimensionality across the data, which we term "dimension blindness". The second comes from the reliance of the method on the inference of limiting behavior from finite data. We propose pointwise dimension as an object for estimation in response to the dimension blindness of correlation dimension. Pointwise dimension is a local quantity, and the distribution of pointwise dimensions over the data contains the information to which correlation dimension is blind. We use a "limit-free" description of pointwise dimension to develop a new estimator. We conclude by discussing potential applications of our estimator as well as some challenges it raises.
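For comparison, the classical Grassberger-Procaccia estimate that the paper starts from can be sketched in a few lines: compute the correlation sum C(r), the fraction of point pairs closer than r, and read off the correlation dimension as the slope of log C(r) against log r over a chosen range of radii. The finite range of radii stands in for the limiting behavior the abstract identifies as a difficulty. This is the baseline method, not the paper's pointwise estimator; the function name and radii are illustrative.

```python
import numpy as np


def correlation_dimension(X, radii):
    """Grassberger-Procaccia sketch: slope of log C(r) versus log r over the given radii.
    Assumes C(r) > 0 for every radius; memory is O(n^2) because all pairwise distances are formed."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(X), k=1)]         # each unordered pair once
    C = np.array([(pair_dist < r).mean() for r in radii])  # correlation sum C(r)
    slope, _intercept = np.polyfit(np.log(radii), np.log(C), 1)
    return slope
```

A data set mixing a line and a plane would yield a single averaged slope here, which is the "dimension blindness" that motivates estimating pointwise dimension instead.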


Structured Priors for Sparse-Representation-Based Hyperspectral Image Classification

arXiv.org Machine Learning

Pixel-wise classification, where each pixel is assigned to a predefined class, is one of the most important procedures in hyperspectral image (HSI) analysis. By representing a test pixel as a linear combination of a small subset of labeled pixels, a sparse representation classifier (SRC) yields promising results compared with those of traditional classifiers such as the support vector machine (SVM). Recently, by incorporating additional structured sparsity priors, second-generation SRCs have appeared in the literature and are reported to further improve HSI classification performance. These priors are based on exploiting the spatial dependencies between neighboring pixels, the inherent structure of the dictionary, or both. In this paper, we review and compare several structured priors for sparse-representation-based HSI classification. We also propose a new structured prior called the low rank group prior, which can be considered a modification of the low rank prior. Furthermore, we investigate how different structured priors improve HSI classification results.
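A minimal sketch of the first-generation, pixel-wise SRC described above: the test pixel's spectrum is coded sparsely over a dictionary of labeled pixel spectra (here with orthogonal matching pursuit, one common greedy solver), and the pixel is assigned to the class whose atoms give the smallest reconstruction residual. No structured prior (spatial, group or low rank) is included, and the function names, the sparsity level k and the toy spectra in the test are assumptions made for illustration.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: greedy k-sparse code of x over the columns of D (k >= 1)."""
    residual = x.astype(float).copy()
    support, coef = [], np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef[support] = sol
    return coef


def src_classify(D, atom_labels, x, k=3):
    """Pixel-wise SRC: pick the class whose atoms reconstruct x with the smallest residual."""
    D = D / np.linalg.norm(D, axis=0)                # unit-norm atoms
    a = omp(D, x, k)
    classes = np.unique(atom_labels)
    residuals = [np.linalg.norm(x - D[:, atom_labels == c] @ a[atom_labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

The structured priors surveyed in the paper would change how the sparse code is computed (for example, coding neighboring pixels jointly) while keeping this residual-based decision rule.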