Goto

Collaborating Authors

 Country


Causal Discovery in a Binary Exclusive-or Skew Acyclic Model: BExSAM

arXiv.org Machine Learning

Discovering causal relations among observed variables in a given data set is a major objective in studies of statistics and artificial intelligence. Recently, some techniques to discover a unique causal model have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose an efficient new approach to deriving the unique causal model governing a given binary data set under skew distributions of external binary noises. Experimental evaluation shows excellent performance for both artificial and real world data sets.


On the symmetrical Kullback-Leibler Jeffreys centroids

arXiv.org Machine Learning

Due to the success of the bag-of-word modeling paradigm, clustering histograms has become an important ingredient of modern information processing. Clustering histograms can be performed using the celebrated $k$-means centroid-based algorithm. From the viewpoint of applications, it is usually required to deal with symmetric distances. In this letter, we consider the Jeffreys divergence that symmetrizes the Kullback-Leibler divergence, and investigate the computation of Jeffreys centroids. We first prove that the Jeffreys centroid can be expressed analytically using the Lambert $W$ function for positive histograms. We then show how to obtain a fast guaranteed approximation when dealing with frequency histograms. Finally, we conclude with some remarks on the $k$-means histogram clustering.


Collaborative Regression

arXiv.org Machine Learning

We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is ``sparse multiple canonical correlation analysis'' (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem, and illustrate their use on simulated and real data.


On change point detection using the fused lasso method

arXiv.org Machine Learning

In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1 -mean filtering and has many important applications in most fields of science and engineering. We establish the (approximate) sparse consistency properties, including rate of convergence, of the so-called fused lasso signal approximator (FLSA). We show that this only holds if the sign of the corresponding consecutive changes are all different, and that this estimator is otherwise incapable of correctly detecting the underlying sparsity pattern. The key idea is to notice that the optimality conditions for this problem can be analyzed using techniques related to brownian bridge theory.


Consistency of weighted majority votes

arXiv.org Machine Learning

We revisit the classical decision-theoretic problem of weighted expert voting from a statistical learning perspective. In particular, we examine the consistency (both asymptotic and finitary) of the optimal Nitzan-Paroush weighted majority and related rules. In the case of known expert competence levels, we give sharp error estimates for the optimal rule. When the competence levels are unknown, they must be empirically estimated. We provide frequentist and Bayesian analyses for this situation. Some of our proof techniques are non-standard and may be of independent interest. The bounds we derive are nearly optimal, and several challenging open problems are posed. Experimental results are provided to illustrate the theory.


Hilbert Space Methods for Reduced-Rank Gaussian Process Regression

arXiv.org Machine Learning

This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$. On this approximate eigenbasis the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as $\mathcal{O}(nm^2)$ (initial) and $\mathcal{O}(m^3)$ (hyperparameter learning) with $m$ basis functions and $n$ data points. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.


Compositional Operators in Distributional Semantics

arXiv.org Artificial Intelligence

This survey presents in some detail the main advances that have been recently taking place in Computational Linguistics towards the unification of the two prominent semantic paradigms: the compositional formal semantics view and the distributional models of meaning based on vector spaces. After an introduction to these two approaches, I review the most important models that aim to provide compositionality in distributional semantics. Then I proceed and present in more detail a particular framework by Coecke, Sadrzadeh and Clark (2010) based on the abstract mathematical setting of category theory, as a more complete example capable to demonstrate the diversity of techniques and scientific disciplines that this kind of research can draw from. This paper concludes with a discussion about important open issues that need to be addressed by the researchers in the future.


Anomaly detection in reconstructed quantum states using a machine-learning technique

arXiv.org Machine Learning

The accurate detection of small deviations in given density matrices is important for quantum information processing. Here we propose a new method based on the concept of data mining. We demonstrate that the proposed method can more accurately detect small erroneous deviations in reconstructed density matrices, which contain intrinsic fluctuations due to the limited number of samples, than a naive method of checking the trace distance from the average of the given density matrices. This method has the potential to be a key tool in broad areas of physics where the detection of small deviations of quantum states reconstructed using a limited number of samples are essential.


Embedding Graphs under Centrality Constraints for Network Visualization

arXiv.org Machine Learning

Visual rendering of graphs is a key task in the mapping of complex network data. Although most graph drawing algorithms emphasize aesthetic appeal, certain applications such as travel-time maps place more importance on visualization of structural network properties. The present paper advocates two graph embedding approaches with centrality considerations to comply with node hierarchy. The problem is formulated first as one of constrained multi-dimensional scaling (MDS), and it is solved via block coordinate descent iterations with successive approximations and guaranteed convergence to a KKT point. In addition, a regularization term enforcing graph smoothness is incorporated with the goal of reducing edge crossings. A second approach leverages the locally-linear embedding (LLE) algorithm which assumes that the graph encodes data sampled from a low-dimensional manifold. Closed-form solutions to the resulting centrality-constrained optimization problems are determined yielding meaningful embeddings. Experimental results demonstrate the efficacy of both approaches, especially for visualizing large networks on the order of thousands of nodes.


Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

arXiv.org Machine Learning

We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm is a uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms.