Goto

Collaborating Authors

 Genre


Using Latent Binary Variables for Online Reconstruction of Large Scale Systems

arXiv.org Machine Learning

We propose a probabilistic graphical model realizing a minimal encoding of real variables dependencies based on possibly incomplete observation and an empirical cumulative distribution function per variable. The target application is a large scale partially observed system, like e.g. a traffic network, where a small proportion of real valued variables are observed, and the other variables have to be predicted. Our design objective is therefore to have good scalability in a real-time setting. Instead of attempting to encode the dependencies of the system directly in the description space, we propose a way to encode them in a latent space of binary variables, reflecting a rough perception of the observable (congested/non-congested for a traffic road). The method relies in part on message passing algorithms, i.e. belief propagation, but the core of the work concerns the definition of meaningful latent variables associated to the variables of interest and their pairwise dependencies. Numerical experiments demonstrate the applicability of the method in practice.


Covariance Estimation in High Dimensions via Kronecker Product Expansions

arXiv.org Machine Learning

This paper presents a new method for estimating high dimensional covariance matrices. The method, permuted rank-penalized least-squares (PRLS), is based on a Kronecker product series expansion of the true covariance matrix. Assuming an i.i.d. Gaussian random sample, we establish high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. For covariance matrices of low separation rank, our results establish that PRLS has significantly faster convergence than the standard sample covariance matrix (SCM) estimator. The convergence rate captures a fundamental tradeoff between estimation error and approximation error, thus providing a scalable covariance estimation framework in terms of separation rank, similar to low rank approximation of covariance matrices. The MSE convergence rates generalize the high dimensional rates recently obtained for the ML Flip-flop algorithm for Kronecker product covariance estimation. We show that a class of block Toeplitz covariance matrices is approximatable by low separation rank and give bounds on the minimal separation rank $r$ that ensures a given level of bias. Simulations are presented to validate the theoretical bounds. As a real world application, we illustrate the utility of the proposed Kronecker covariance estimator for spatio-temporal linear least squares prediction of multivariate wind speed measurements.


Bounded Rational Decision-Making in Changing Environments

arXiv.org Artificial Intelligence

A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in thermodynamic systems. This similarity has been recently used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and illustrate with simulations its relationship with computational resources.


The SP theory of intelligence: benefits and applications

arXiv.org Artificial Intelligence

Tel.: 44-1248-712962; 44-7746-290775 Received: 26 May 2013; in revised form: 13 December 2013 / Accepted: 13 December 2013 / Published: xx Abstract: This article describes existing and expected benefits of the SP theory of intelligence, and some potential applications. The theory aims to simplify and integrate ideas across artificial intelligence, mainstream computing, and human perception and cognition, with information compression as a unifying theme. It combines conceptual simplicity with descriptive and explanatory power across several areas of computing and cognition. In the SP machine--an expression of the SP theory which is currently realized in the form of a computer model--there is potential for an overall simplification of computing systems, including software. The SP theory promises deeper insights and better solutions in several areas of application including, most notably, unsupervised learning, natural language processing, autonomous robots, computer vision, intelligent databases, software engineering, information compression, medical diagnosis and big data. There is also potential in areas such as the semantic web, bioinformatics, structuring of documents, the detection of computer viruses, data fusion, new kinds of computer, and the development of scientific theories. The theory promises seamless integration of structures and functions within and between different areas of application. The potential value, worldwide, of these benefits and applications is at least $190 billion each year. Further development would be facilitated by the creation of a high-parallel, open-source version of the SP machine, available to researchers everywhere. Keywords: artificial intelligence; information compression; unsupervised learning; natural language processing; pattern recognition Information 2013, xx 2 1. Introduction The SP theory of intelligence aims to simplify and integrate concepts across artificial intelligence, mainstream computing and human perception and cognition, with information compression as a unifying theme. This article describes existing and expected benefits of the SP theory and some of its potential applications. The theory is described most fully in [1] and more briefly in an extended overview [2]. This article should be read in conjunction with either or both of those accounts. In brief, the existing and expected benefits of the theory are: - Conceptual simplicity combined with descriptive and explanatory power.


Parallel architectures for fuzzy triadic similarity learning

arXiv.org Machine Learning

In a context of document co-clustering, we define a new similarity measure which iteratively computes similarity while combining fuzzy sets in a three-partite graph. The fuzzy triadic similarity (FT-Sim) model can deal with uncertainty offers by the fuzzy sets. Moreover, with the development of the Web and the high availability of storage spaces, more and more documents become accessible. Documents can be provided from multiple sites and make similarity computation an expensive processing. This problem motivated us to use parallel computing. In this paper, we introduce parallel architectures which are able to treat large and multi-source data sets by a sequential, a merging or a splitting-based process. Then, we proceed to a local and a central (or global) computing using the basic FT-Sim measure. The idea behind these architectures is to reduce both time and space complexities thanks to parallel computation.


Large-Scale Paralleled Sparse Principal Component Analysis

arXiv.org Machine Learning

Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain since the principal components (PCs) are linear combinations of the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture the maximal variance of original data. In this paper we present an efficient and paralleled method of SPCA using graphics processing units (GPUs), which can process large blocks of data in parallel. Specifically, we construct parallel implementations of the four optimization formulations of the generalized power method of SPCA (GP-SPCA), one of the most efficient and effective SPCA approaches, on a GPU. The parallel GPU implementation of GP-SPCA (using CUBLAS) is up to eleven times faster than the corresponding CPU implementation (using CBLAS), and up to 107 times faster than a MatLab implementation. Extensive comparative experiments in several real-world datasets confirm that SPCA offers a practical advantage.


The Sparse Principal Component of a Constant-rank Matrix

arXiv.org Machine Learning

The computation of the sparse principal component of a matrix is equivalent to the identification of its principal submatrix with the largest maximum eigenvalue. Finding this optimal submatrix is what renders the problem ${\mathcal{NP}}$-hard. In this work, we prove that, if the matrix is positive semidefinite and its rank is constant, then its sparse principal component is polynomially computable. Our proof utilizes the auxiliary unit vector technique that has been recently developed to identify problems that are polynomially solvable. Moreover, we use this technique to design an algorithm which, for any sparsity value, computes the sparse principal component with complexity ${\mathcal O}\left(N^{D+1}\right)$, where $N$ and $D$ are the matrix size and rank, respectively. Our algorithm is fully parallelizable and memory efficient.


Non-parametric Bayesian modeling of complex networks

arXiv.org Machine Learning

Modeling structure in complex networks using Bayesian non-parametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This paper provides a gentle introduction to non-parametric Bayesian modeling of complex networks: Using an infinite mixture model as running example we go through the steps of deriving the model as an infinite limit of a finite parametric model, inferring the model parameters by Markov chain Monte Carlo, and checking the model's fit and predictive performance. We explain how advanced non-parametric models for complex networks can be derived and point out relevant literature.


High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables

arXiv.org Machine Learning

In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic mapping model that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.


Time-varying Learning and Content Analytics via Sparse Factor Analysis

arXiv.org Machine Learning

We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA), that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc, or the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the question intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.