Goto

Collaborating Authors

 Statistical Learning


A Ternary Non-Commutative Latent Factor Model for Scalable Three-Way Real Tensor Completion

arXiv.org Machine Learning

Motivated by large-scale Collaborative-Filtering applications, we present a Non-Commuting Latent Factor (NCLF) tensor-completion approach for modeling three-way arrays, which is diagonal like the standard PARAFAC, but wherein different terms distinguish different kinds of three-way relations of co-clusters, as determined by permutations of latent factors. The first key component of the algebraic representation is the usage of two non-commutative real trilinear operations as the building blocks of the approximation. These operations are the standard three dimensional triple-product and a trilinear product on a two-dimensional real vector space, which is a representation of the real Clifford Algebra Cl(1,1) (a certain Majorana spinor). Both operations are purely ternary in that they cannot be decomposed into two group-operations on the relevant spaces. The second key component of the method is combining these operations using permutation-symmetry preserving linear combinations. We apply the model to the MovieLens and Fannie Mae datasets, and find that it outperforms the PARAFAC model. We propose some future directions, such as unsupervised-learning.


Sparse Fisher's Linear Discriminant Analysis for Partially Labeled Data

arXiv.org Machine Learning

Classification is an important tool with many useful applications. Among the many classification methods, Fisher's Linear Discriminant Analysis (LDA) is a traditional model-based approach which makes use of the covariance information. However, in the high-dimensional, low-sample size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods designed to deal with high-dimensional data, they may not fully use the covariance information as LDA does. Hence in some situations, it is still desirable to use a model-based method such as LDA for classification. This article exploits the potential of LDA in more complicated data settings. In many real applications, it is costly to manually place labels on observations; hence it is often that only a small portion of labeled data is available while a large number of observations are left without a label. It is a great challenge to obtain good classification performance through the labeled data alone, especially when the dimension is greater than the size of the labeled data. In order to overcome this issue, we propose a semi-supervised sparse LDA classifier to take advantage of the seemingly useless unlabeled data. They provide additional information which helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve the sparsity; meanwhile we employ the difference-convex algorithm to handle the non-convex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help to understand when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.


(Blue) Taxi Destination and Trip Time Prediction from Partial Trajectories

arXiv.org Machine Learning

Real-time estimation of destination and travel time for taxis is of great importance for existing electronic dispatch systems. We present an approach based on trip matching and ensemble learning, in which we leverage the patterns observed in a dataset of roughly 1.7 million taxi journeys to predict the corresponding final destination and travel time for ongoing taxi trips, as a solution for the ECML/PKDD Discovery Challenge 2015 competition. The results of our empirical evaluation show that our approach is effective and very robust, which led our team -- BlueTaxi -- to the 3rd and 7th position of the final rankings for the trip time and destination prediction tasks, respectively. Given the fact that the final rankings were computed using a very small test set (with only 320 trips) we believe that our approach is one of the most robust solutions for the challenge based on the consistency of our good results across the test sets.


Decadal climate predictions using sequential learning algorithms

arXiv.org Machine Learning

Ensembles of climate models are commonly used to improve climate predictions and assess the uncertainties associated with them. Weighting the models according to their performances holds the promise of further improving their predictions. Here, we use an ensemble of decadal climate predictions to demonstrate the ability of sequential learning algorithms (SLAs) to reduce the forecast errors and reduce the uncertainties. Three different SLAs are considered, and their performances are compared with those of an equally weighted ensemble, a linear regression and the climatology. Predictions of four different variables--the surface temperature, the zonal and meridional wind, and pressure--are considered. The spatial distributions of the performances are presented, and the statistical significance of the improvements achieved by the SLAs is tested. Based on the performances of the SLAs, we propose one to be highly suitable for the improvement of decadal climate predictions.


The Advantage of Cross Entropy over Entropy in Iterative Information Gathering

arXiv.org Machine Learning

Gathering the most information by picking the least amount of data is a common task in experimental design or when exploring an unknown environment in reinforcement learning and robotics. A widely used measure for quantifying the information contained in some distribution of interest is its entropy. Greedily minimizing the expected entropy is therefore a standard method for choosing samples in order to gain strong beliefs about the underlying random variables. We show that this approach is prone to temporally getting stuck in local optima corresponding to wrongly biased beliefs. We suggest instead maximizing the expected cross entropy between old and new belief, which aims at challenging refutable beliefs and thereby avoids these local optima. We show that both criteria are closely related and that their difference can be traced back to the asymmetry of the Kullback-Leibler divergence. In illustrative examples as well as simulated and real-world experiments we demonstrate the advantage of cross entropy over simple entropy for practical applications.


Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness

arXiv.org Machine Learning

Abstract--Nonnegative T ucker decomposition (NTD) is a powerful tool for the extraction of nonnegative parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of data. However, as the data tensor often has multiple modes and is large-scale, existing NTD algorithms suffer from a very high computational complexity in terms of both storage and computation time, which has been one major obstacle for practical applications of NTD. T o overcome these disadvantages, we show how low (multilinear) rank approximation (LRA) of tensors is able to significantly simplify the computation of the gradients of the cost function, upon which a family of efficient first-order NTD algorithms are developed. Besides dramatically reducing the storage complexity and running time, the new algorithms are quite flexible and robust to noise because any well-established LRA approaches can be applied. We also show how nonnegativity incorporating sparsity substantially improves the uniqueness property and partially alleviates the curse of dimensionality of the T ucker decompositions. Simulation results on synthetic and real-world data justify the validity and high efficiency of the proposed NTD algorithms. INDING information-rich and task-relevant variables hidden behind observation data is a fundamental task in data analysis and has been widely studied in the fields of signal and image processing and machine learning. Although the observation data can be very large, a much lower number of latent variables or components can capture the most significant features of the original data. Manuscript received ...This work was partially supported by the National Natural Science Foundation of China (grants U1201253), the Guangdong Province Natural Science Foundation (2014A030308009), the Guangdong Province Excellent Thesis Foundation (SYBZZXM201316), and the JSPS KAKENHI (26730125, 15K15955). Guoxu Zhou is with the School of Automation at Guangdong University of Technology, Guangzhou, China and the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan. Andrzej Cichocki is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan and with Systems Research Institute, Polish Academy of Science, Warsaw, Poland. Qibin Zhao is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Japan. Shengli Xie is with the Faculty of Automation, Guangdong University of Technology, Guangzhou 510006, China. This important topic has been extensively studied in the last several decades, particularly witnessed by the great success of blind source separation (BSS) techniques [1].


Exponential Family Matrix Completion under Structural Constraints

arXiv.org Machine Learning

We consider the matrix completion problem of recovering a structured matrix from noisy and partial measurements. Recent works have proposed tractable estimators with strong statistical guarantees for the case where the underlying matrix is low--rank, and the measurements consist of a subset, either of the exact individual entries, or of the entries perturbed by additive Gaussian noise, which is thus implicitly suited for thin--tailed continuous data. Arguably, common applications of matrix completion require estimators for (a) heterogeneous data--types, such as skewed--continuous, count, binary, etc., (b) for heterogeneous noise models (beyond Gaussian), which capture varied uncertainty in the measurements, and (c) heterogeneous structural constraints beyond low--rank, such as block--sparsity, or a superposition structure of low--rank plus elementwise sparseness, among others. In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of exponential family distributions; and impose general structural constraints on the underlying matrix, as captured by a general regularizer $\mathcal{R}(.)$. We propose a simple convex regularized $M$--estimator for the generalized framework, and provide a unified and novel statistical analysis for this general class of estimators. We finally corroborate our theoretical results on simulated datasets.


Group Membership Prediction

arXiv.org Machine Learning

The group membership prediction (GMP) problem involves predicting whether or not a collection of instances share a certain semantic property. For instance, in kinship verification given a collection of images, the goal is to predict whether or not they share a {\it familial} relationship. In this context we propose a novel probability model and introduce latent {\em view-specific} and {\em view-shared} random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our model posits that data from each view is independent conditioned on the shared variables. This postulate leads to a parametric probability model that decomposes group membership likelihood into a tensor product of data-independent parameters and data-dependent factors. We propose learning the data-independent parameters in a discriminative way with bilinear classifiers, and test our prediction algorithm on challenging visual recognition tasks such as multi-camera person re-identification and kinship verification. On most benchmark datasets, our method can significantly outperform the current state-of-the-art.


Dirichlet Fragmentation Processes

arXiv.org Machine Learning

Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, first we review the theory of random fragmentation processes [Bertoin, 2006], and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). Then we define a general class of probability distributions over trees: the Dirichlet fragmentation process (DFP) through a novel combination of the theory of Dirichlet processes and random fragmentation processes. This DFP presents a stick-breaking construction, and relates to the nCRP in the same way the Dirichlet process relates to the Chinese restaurant process. Furthermore, we develop a novel hierarchical mixture model with the DFP, and empirically compare the new model to similar models in machine learning. Experiments show the DFP mixture model to be convincingly better than existing state-of-the-art approaches for hierarchical clustering and density modelling. The process of random fragmentation is common to many areas, such as the degradation of large polymer chains in chemistry, or the evolution of phylogenetic trees in biology. An elegant mathematical tool for describing such phenomena is the fragmentation process (FP) [Bertoin, 2006]. As a concrete example of a FP, consider a stick of unit length.


Maximum Correntropy Kalman Filter

arXiv.org Machine Learning

Traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by some heavy-tailed impulsive noises, the performance of KF will deteriorate seriously. To improve the robustness of KF against impulsive noises, we propose in this work a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), which adopts the robust maximum correntropy criterion (MCC) as the optimality criterion, instead of using the MMSE. Similar to the traditional KF, the state mean and covariance matrix propagation equations are used to give prior estimations of the state and covariance matrix in MCKF. A novel fixed-point algorithm is then used to update the posterior estimations. A sufficient condition that guarantees the convergence of the fixed-point algorithm is given. Illustration examples are presented to demonstrate the effectiveness and robustness of the new algorithm.