Goto

Collaborating Authors

 Statistical Learning


Reducing statistical time-series problems to binary classification

arXiv.org Machine Learning

We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.


Orbital-free Bond Breaking via Machine Learning

arXiv.org Machine Learning

Machine learning is used to approximate the kinetic energy of one dimensional diatomics as a functional of the electron density. The functional can accurately dissociate a diatomic, and can be systematically improved with training. Highly accurate self-consistent densities and molecular forces are found, indicating the possibility for ab-initio molecular dynamics simulations.


Multiclass Semi-Supervised Learning on Graphs using Ginzburg-Landau Functional Minimization

arXiv.org Machine Learning

We present a graph-based variational algorithm for classification of high-dimensional data, generalizing the binary diffuse interface model to the case of multiple classes. Motivated by total variation techniques, the method involves minimizing an energy functional made up of three terms. The first two terms promote a stepwise continuous classification function with sharp transitions between classes, while preserving symmetry among the class labels. The third term is a data fidelity term, allowing us to incorporate prior information into the model in a semi-supervised framework. The performance of the algorithm on synthetic data, as well as on the COIL and MNIST benchmark datasets, is competitive with state-of-the-art graph-based multiclass segmentation methods.


Verdict Accuracy of Quick Reduct Algorithm using Clustering and Classification Techniques for Gene Expression Data

arXiv.org Machine Learning

In most gene expression data, the number of training samples is very small compared to the large number of genes involved in the experiments. However, among the large amount of genes, only a small fraction is effective for performing a certain task. Furthermore, a small subset of genes is desirable in developing gene expression based diagnostic tools for delivering reliable and understandable results. With the gene selection results, the cost of biological experiment and decision can be greatly reduced by analyzing only the marker genes. An important application of gene expression data in functional genomics is to classify samples according to their gene expression profiles. Feature selection (FS) is a process which attempts to select more informative features. It is one of the important steps in knowledge discovery. Conventional supervised FS methods evaluate various feature subsets using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. This paper studies a feature selection method based on rough set theory. Further K-Means, Fuzzy C-Means (FCM) algorithm have implemented for the reduced feature set without considering class labels. Then the obtained results are compared with the original class labels. Back Propagation Network (BPN) has also been used for classification. Then the performance of K-Means, FCM, and BPN are analyzed through the confusion matrix. It is found that the BPN is performing well comparatively.


Kernel Mean Estimation and Stein's Effect

arXiv.org Machine Learning

A mean function in reproducing kernel Hilbert space, or a kernel mean, is an important part of many applications ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given finite samples, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be improved via a well-known phenomenon in statistics called Stein's phenomenon. After consideration, our theoretical analysis reveals the existence of a wide class of estimators that are better than the standard. Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. Empirical evaluations on several benchmark applications clearly demonstrate that the proposed estimators outperform the standard kernel mean estimator.


Multiclass Total Variation Clustering

arXiv.org Machine Learning

Many clustering models rely on the minimization of an energy over possible partitions of the data set. These discrete optimizations usually pose NPhard problems, however. A natural resolution of this issue involves relaxing the discrete minimization space into a continuous one to obtain an easier minimization procedure. Many current algorithms, such as spectral clustering methods or nonnegative matrix factorization (NMF) methods, follow this relaxation approach. A fundamental problem arises when using this approach, however; in general the solution of the relaxed continuous problem and that of the discrete NPhard problem can differ substantially.


$\propto$SVM for learning with label proportions

arXiv.org Machine Learning

We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known. We propose a new method called proportion-SVM, or $\propto$SVM, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework. Unlike the existing works, our approach avoids making restrictive assumptions about the data. The $\propto$SVM model leads to a non-convex integer programming problem. In order to solve it efficiently, we propose two algorithms: one based on simple alternating optimization and the other based on a convex relaxation. Extensive experiments on standard datasets show that $\propto$SVM outperforms the state-of-the-art, especially for larger group sizes.


The Randomized Dependence Coefficient

arXiv.org Machine Learning

We introduce the Randomized Dependence Coefficient (RDC), a measure of non-linear dependence between random variables of arbitrary dimension based on the Hirschfeld-Gebelein-R\'enyi Maximum Correlation Coefficient. RDC is defined in terms of correlation of random non-linear copula projections; it is invariant with respect to marginal distribution transformations, has low computational cost and is easy to implement: just five lines of R code, included at the end of the paper.


Provable Inductive Matrix Completion

arXiv.org Machine Learning

Consider a movie recommendation system where apart from the ratings information, side information such as user's age or movie's genre is also available. Unlike standard matrix completion, in this setting one should be able to predict inductively on new users/movies. In this paper, we study the problem of inductive matrix completion in the exact recovery setting. That is, we assume that the ratings matrix is generated by applying feature vectors to a low-rank matrix and the goal is to recover back the underlying matrix. Furthermore, we generalize the problem to that of low-rank matrix estimation using rank-1 measurements. We study this generic problem and provide conditions that the set of measurements should satisfy so that the alternating minimization method (which otherwise is a non-convex method with no convergence guarantees) is able to recover back the {\em exact} underlying low-rank matrix. In addition to inductive matrix completion, we show that two other low-rank estimation problems can be studied in our framework: a) general low-rank matrix sensing using rank-1 measurements, and b) multi-label regression with missing labels. For both the problems, we provide novel and interesting bounds on the number of measurements required by alternating minimization to provably converges to the {\em exact} low-rank matrix. In particular, our analysis for the general low rank matrix sensing problem significantly improves the required storage and computational cost than that required by the RIP-based matrix sensing methods \cite{RechtFP2007}. Finally, we provide empirical validation of our approach and demonstrate that alternating minimization is able to recover the true matrix for the above mentioned problems using a small number of measurements.


One-Class Support Measure Machines for Group Anomaly Detection

arXiv.org Machine Learning

We propose one-class support measure machines (OCSMMs) for group anomaly detection which aims at recognizing anomalous aggregate behaviors of data points. The OCSMMs generalize well-known one-class support vector machines (OCSVMs) to a space of probability measures. By formulating the problem as quantile estimation on distributions, we can establish an interesting connection to the OCSVMs and variable kernel density estimators (VKDEs) over the input space on which the distributions are defined, bridging the gap between large-margin methods and kernel density estimators. In particular, we show that various types of VKDEs can be considered as solutions to a class of regularization problems studied in this paper. Experiments on Sloan Digital Sky Survey dataset and High Energy Particle Physics dataset demonstrate the benefits of the proposed framework in real-world applications.