 Statistical Learning


Training Support Vector Machines Using Frank-Wolfe Optimization Methods

arXiv.org Machine Learning

Training a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational cost becomes prohibitive for large-scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of building a Minimal Enclosing Ball (MEB) in a feature space, where data is implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require computing the solutions of a sequence of increasingly complex QPs and are defined using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. Unlike CVMs, however, they also yield effective classifiers with kernels that do not satisfy the condition CVMs require, and can thus be used for a wider set of problems.
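To make the MEB connection concrete, here is a minimal sketch of the plain Frank-Wolfe iteration on the dual of the MEB problem, which maximizes $\sum_i \alpha_i K_{ii} - \alpha^T K \alpha$ over the probability simplex. It is not either of the two methods proposed in the paper, whose details the abstract does not give. The function names, the default step size $2/(k+2)$, and the linear kernel are assumptions chosen for illustration.

```python
def linear_kernel(x, y):
    """Plain inner product; any positive semidefinite kernel could be used instead."""
    return sum(a * b for a, b in zip(x, y))


def frank_wolfe_meb(points, kernel=linear_kernel, n_iters=500):
    """Approximate the minimal enclosing ball of `points` in the kernel feature space.

    Returns (alpha, radius_sq): simplex weights defining the center
    c = sum_i alpha_i * phi(x_i), and the largest squared distance from c
    to any point (an upper bound on the optimal squared radius).
    Illustrative sketch only, with the generic step size 2 / (k + 2).
    """
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    alpha = [0.0] * n
    alpha[0] = 1.0                          # start with the center at the first point
    K_alpha = [K[i][0] for i in range(n)]   # K @ alpha, updated incrementally
    for k in range(n_iters):
        # Gradient of the dual objective is K_ii - 2 (K alpha)_i; its largest
        # entry belongs to the point farthest from the current center.
        far = max(range(n), key=lambda i: K[i][i] - 2.0 * K_alpha[i])
        step = 2.0 / (k + 2.0)
        # Analytic update: move the center toward the farthest point.
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[far] += step
        K_alpha = [(1.0 - step) * K_alpha[i] + step * K[i][far] for i in range(n)]
    quad = sum(alpha[i] * K_alpha[i] for i in range(n))
    radius_sq = max(K[i][i] - 2.0 * K_alpha[i] + quad for i in range(n))
    return alpha, radius_sq
```

Each iteration needs only one pass over the kernel values and a closed-form update, with no inner QP solve. The points with nonzero weight in `alpha` are the ones that determine the ball.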


Low-rank Matrix Completion using Alternating Minimization

arXiv.org Machine Learning

Alternating minimization represents a widely applicable and empirically successful approach for finding low-rank matrices that best fit the given data. For example, for the problem of low-rank matrix completion, this method is believed to be one of the most accurate and efficient, and formed a major component of the winning entry in the Netflix Challenge. In the alternating minimization approach, the low-rank target matrix is written in a bilinear form, i.e. $X = UV^\dag$; the algorithm then alternates between finding the best $U$ and the best $V$. Typically, each alternating step in isolation is convex and tractable. However, the overall problem becomes non-convex, and there has been almost no theoretical understanding of when this approach yields a good result. In this paper we present the first theoretical analysis of the performance of alternating minimization for matrix completion, and for the related problem of matrix sensing. For both these problems, celebrated recent results have shown that they become well-posed and tractable once certain (now standard) conditions are imposed on the problem. We show that alternating minimization also succeeds under similar conditions. Moreover, compared to existing results, our paper shows that alternating minimization guarantees faster (in particular, geometric) convergence to the true matrix, while allowing a simpler analysis.
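The alternation between $U$ and $V$ can be sketched for the simplest case, a rank-one target $X = uv^T$ with only some entries observed. This is a toy illustration rather than the algorithm analyzed in the paper. The function name `altmin_rank1`, the all-ones initialization, and the dictionary input format are assumptions made for the example.

```python
def altmin_rank1(observed, n_rows, n_cols, n_iters=200):
    """Fit X ~ u v^T to the observed entries by alternating least squares.

    `observed` maps (row, col) -> value. With v fixed, each u[i] has a
    closed-form least-squares solution over the observed entries of row i,
    and symmetrically for v[j] with u fixed. Illustrative sketch only.
    """
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    rows = [[] for _ in range(n_rows)]
    cols = [[] for _ in range(n_cols)]
    for (i, j), value in observed.items():
        rows[i].append((j, value))
        cols[j].append((i, value))
    for _ in range(n_iters):
        # Step 1: best u given v (a convex problem, one scalar per row).
        for i in range(n_rows):
            denom = sum(v[j] ** 2 for j, _value in rows[i])
            if denom > 0.0:
                u[i] = sum(value * v[j] for j, value in rows[i]) / denom
        # Step 2: best v given u (a convex problem, one scalar per column).
        for j in range(n_cols):
            denom = sum(u[i] ** 2 for i, _value in cols[j])
            if denom > 0.0:
                v[j] = sum(value * u[i] for i, value in cols[j]) / denom
    return u, v
```

A missing entry $(i, j)$ is then predicted as `u[i] * v[j]`. For rank $r > 1$ each step becomes a small least-squares solve per row or column instead of a scalar division.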


Hypergraph and protein function prediction with gene expression data

arXiv.org Machine Learning

Most network-based protein (or gene) function prediction methods are based on the assumption that the labels of two adjacent proteins in the network are likely to be the same. However, modeling only pairwise relationships between proteins or genes is incomplete: it misses the information carried by groups of genes that show very similar expression patterns and tend to have similar functions (i.e. the functional modules). A natural way to overcome this information loss is to represent the gene expression data as a hypergraph. In this paper, we therefore introduce three semi-supervised learning methods, based on the un-normalized, random walk, and symmetric normalized hypergraph Laplacians, and apply them to a hypergraph constructed from the gene expression data in order to predict the functions of yeast proteins. Experimental results show that the average accuracy of these three hypergraph Laplacian based semi-supervised learning methods is the same. However, their average accuracy is much greater than that of the un-normalized graph Laplacian based semi-supervised learning method (i.e. the baseline method of this paper) applied to the gene co-expression network created from the gene expression data.
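For readers unfamiliar with the three operators, the sketch below builds them for a small hypergraph. It assumes the commonly used definitions $L = D_v - H W D_e^{-1} H^T$, $L_{rw} = I - D_v^{-1} H W D_e^{-1} H^T$ and $L_{sym} = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$, where $H$ is the vertex-by-hyperedge incidence matrix. The abstract does not spell these out, so the formulas, the function name, and the input format are assumptions.

```python
import math


def hypergraph_laplacians(n_vertices, hyperedges, weights=None):
    """Return (L, L_rw, L_sym) as nested lists for a weighted hypergraph.

    `hyperedges` is a list of vertex-index lists without repeated vertices;
    every vertex must belong to at least one hyperedge so its degree is
    positive. Illustrative sketch under the definitions stated above.
    """
    n = n_vertices
    w = list(weights) if weights is not None else [1.0] * len(hyperedges)
    # Vertex degree: total weight of the hyperedges containing the vertex.
    d_v = [0.0] * n
    for e, members in enumerate(hyperedges):
        for vertex in members:
            d_v[vertex] += w[e]
    # A = H W D_e^{-1} H^T: each hyperedge spreads its weight, divided by
    # its size, over all ordered pairs of its members.
    A = [[0.0] * n for _ in range(n)]
    for e, members in enumerate(hyperedges):
        share = w[e] / len(members)
        for a in members:
            for b in members:
                A[a][b] += share
    eye = lambda a, b: 1.0 if a == b else 0.0
    L = [[d_v[a] * eye(a, b) - A[a][b] for b in range(n)] for a in range(n)]
    L_rw = [[eye(a, b) - A[a][b] / d_v[a] for b in range(n)] for a in range(n)]
    L_sym = [[eye(a, b) - A[a][b] / math.sqrt(d_v[a] * d_v[b])
              for b in range(n)] for a in range(n)]
    return L, L_rw, L_sym
```

For example, `hypergraph_laplacians(4, [[0, 1, 2], [2, 3]])` describes four genes grouped into two overlapping modules. A hyperedge with more than two members encodes a group relationship that a set of pairwise edges can only approximate.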


Simulation-based optimal Bayesian experimental design for nonlinear systems

arXiv.org Machine Learning

The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models; in particular, we focus on finding sets of experiments that provide the most information about targeted sets of parameters. Our framework employs a Bayesian statistical setting, which provides a foundation for inference from noisy, indirect, and incomplete data, and a natural mechanism for incorporating heterogeneous sources of information. An objective function is constructed from information theoretic measures, reflecting expected information gain from proposed combinations of experiments. Polynomial chaos approximations and a two-stage Monte Carlo sampling method are used to evaluate the expected information gain. Stochastic approximation algorithms are then used to make optimization feasible in computationally intensive and high-dimensional settings. These algorithms are demonstrated on model problems and on nonlinear parameter estimation problems arising in detailed combustion kinetics.
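The two-stage Monte Carlo evaluation of expected information gain can be illustrated on a toy problem. The sketch below assumes a scalar linear-Gaussian model $y = d\,\theta + \epsilon$ with prior $\theta \sim N(0, 1)$ and noise $\epsilon \sim N(0, \sigma^2)$, which is far simpler than the simulation-based models in the paper and omits the polynomial chaos surrogate. The model, function names, and sample sizes are all assumptions for illustration.

```python
import math
import random


def log_likelihood(y, theta, d, sigma):
    """Log density of y under the assumed toy model y = d * theta + N(0, sigma^2)."""
    resid = (y - d * theta) / sigma
    return -0.5 * resid ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def expected_information_gain(d, sigma=1.0, n_outer=500, n_inner=500, seed=0):
    """Nested Monte Carlo estimate of the expected information gain of design d.

    Outer stage: draw (theta, y) from the joint distribution.
    Inner stage: estimate the evidence p(y | d) by averaging the likelihood
    over fresh prior draws. Illustrative sketch only.
    """
    rng = random.Random(seed)
    inner_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_inner)]
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)
        y = d * theta + rng.gauss(0.0, sigma)
        logs = [log_likelihood(y, t, d, sigma) for t in inner_thetas]
        peak = max(logs)  # log-mean-exp for numerical stability
        log_evidence = peak + math.log(sum(math.exp(v - peak) for v in logs) / n_inner)
        total += log_likelihood(y, theta, d, sigma) - log_evidence
    return total / n_outer
```

For this toy model the exact value is $\tfrac{1}{2}\log(1 + d^2/\sigma^2)$, which gives a reference for checking the estimator. In a design loop, a stochastic optimizer would treat this noisy estimate as its objective over $d$.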


Approximate Rank-Detecting Factorization of Low-Rank Tensors

arXiv.org Machine Learning

We present an algorithm, AROFAC2, which detects the (CP-)rank of a degree-3 tensor and calculates its factorization into rank-one components. We provide generative conditions for the algorithm to work and demonstrate on both synthetic and real-world data that AROFAC2 can potentially outperform the gold standard PARAFAC, over which it has the advantages that it intrinsically detects the true rank, avoids spurious components, and is stable with respect to outliers and non-Gaussian noise.


A recursive divide-and-conquer approach for sparse principal component analysis

arXiv.org Machine Learning

In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems analytically, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm only involves simple computations and is thus easy to implement. The proposed method can also be very easily extended to other sparse PCA problems with certain constraints, such as the nonnegative sparse PCA problem. Furthermore, we show that the proposed algorithm converges to a stationary point of the problem, and that its computational complexity is approximately linear in both data size and dimensionality. The effectiveness of the proposed method is substantiated by extensive experiments on a series of synthetic and real data sets, from both the reconstruction-error-minimization and the data-variance-maximization viewpoints.


Overlapping clustering based on kernel similarity metric

arXiv.org Machine Learning

Producing overlapping schemes is a major issue in clustering. Recently proposed overlapping methods rely on the search for an optimal covering and are based on different metrics, such as Euclidean distance and I-Divergence, used to measure closeness between observations. In this paper, we propose the use of another measure for overlapping clustering, based on a kernel similarity metric. We also estimate the number of overlapped clusters using the Gram matrix. Experiments on both the Iris and EachMovie datasets show the correctness of the estimated number of clusters and show that the measure based on a kernel similarity metric improves the precision, recall and F-measure in overlapping clustering.


Classification Recouvrante Basée sur les Méthodes à Noyau (Overlapping Clustering Based on Kernel Methods)

arXiv.org Machine Learning

The overlapping clustering problem is an important learning issue in which clusters are not mutually exclusive and each object may belong simultaneously to several clusters. This paper presents a kernel-based method that produces overlapping clusters in a high-dimensional feature space, using Mercer kernel techniques to improve the separability of input patterns. The proposed method, called OKM-K (Overlapping $k$-means based kernel method), extends the OKM (Overlapping $k$-means) method to produce overlapping schemes. Experiments are performed on an overlapping dataset, and the empirical results obtained with OKM-K outperform those obtained with OKM.


A recursive procedure for density estimation on the binary hypercube

arXiv.org Machine Learning

This paper describes a recursive estimation procedure for multivariate binary densities (probability distributions of vectors of Bernoulli random variables) using orthogonal expansions. For $d$ covariates, there are $2^d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error for moderate sample sizes, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity.
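To see where the $2^d$ cost comes from, the sketch below implements the exhaustive baseline that the recursive procedure is meant to avoid: it estimates every coefficient of an orthogonal expansion and keeps those above a threshold. It assumes the Walsh basis $\varphi_s(x) = 2^{-d/2}(-1)^{s \cdot x}$, which the abstract does not name. The function names and the hard-threshold rule are likewise assumptions, and this is not the paper's recursive estimator.

```python
from itertools import product


def walsh(s, x):
    """Assumed Walsh basis function 2^(-d/2) * (-1)^(s . x) on the binary hypercube."""
    d = len(x)
    sign = -1.0 if sum(a * b for a, b in zip(s, x)) % 2 else 1.0
    return sign / 2.0 ** (d / 2.0)


def estimate_coefficients(samples, threshold=0.0):
    """Exhaustively estimate all 2^d coefficients theta_s = E[walsh(s, X)].

    Coefficients whose magnitude does not exceed `threshold` are dropped,
    giving a sparse representation. The loop over all s in {0,1}^d is the
    exponential cost mentioned in the text. Illustrative sketch only.
    """
    d = len(samples[0])
    coeffs = {}
    for s in product((0, 1), repeat=d):
        theta = sum(walsh(s, x) for x in samples) / len(samples)
        if abs(theta) > threshold:
            coeffs[s] = theta
    return coeffs


def density(coeffs, x):
    """Evaluate the estimated density f(x) = sum_s theta_s * walsh(s, x)."""
    return sum(theta * walsh(s, x) for s, theta in coeffs.items())
```

Raising `threshold` trades accuracy for a smaller coefficient set. This is the kind of trade-off between mean-squared error and computational complexity that the abstract refers to, though this baseline still visits every coefficient.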


Dynamic Network Cartography

arXiv.org Machine Learning

Communication networks have evolved from specialized, research and tactical transmission systems to large-scale and highly complex interconnections of intelligent devices, increasingly becoming more commercial, consumer-oriented, and heterogeneous. Propelled by emergent social networking services and high-definition streaming platforms, network traffic has grown explosively thanks to the advances in processing speed and storage capacity of state-of-the-art communication technologies. As "netizens" demand a seamless networking experience that entails not only higher speeds, but also resilience and robustness to failures and malicious cyber-attacks, ample opportunities for signal processing (SP) research arise. The vision is for ubiquitous smart network devices to enable data-driven statistical learning algorithms for distributed, robust, and online network operation and management, adaptable to the dynamically-evolving network landscape with minimal need for human intervention. The present paper aims at delineating the analytical background and the relevance of SP tools to dynamic network monitoring, introducing the SP readership to the concept of dynamic network cartography -- a framework to construct maps of the dynamic network state in an efficient and scalable manner tailored to large-scale heterogeneous networks.