Tensor Completion Algorithms in Big Data Analytics

arXiv.org Machine Learning

Tensor completion is the problem of filling in the missing or unobserved entries of partially observed tensors. Because tensors naturally capture the multidimensional character of complex datasets, tensor completion algorithms and their applications have received wide attention and achieved notable success in data mining, computer vision, signal processing, neuroscience, and other fields. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data analytics, characterized by diverse variety, large volume, and high velocity. To make the vast body of existing work easier to understand and compare, we summarize and categorize it into four groups: general tensor completion algorithms, tensor completion with auxiliary information (variety), scalable tensor completion algorithms (volume), and dynamic tensor completion algorithms (velocity). In addition, we describe their applications to real-world data-driven problems and present an open-source package covering several widely used tensor decomposition and completion algorithms. Our goal is to summarize these popular methods, introduce them to researchers to advance work in this field, and provide a ready-to-use repository for practitioners. Finally, we discuss some challenges and promising research directions in this community for future exploration.


A New Sampling Technique for Tensors

arXiv.org Machine Learning

In this paper we propose new techniques to sample arbitrary third-order tensors, with the objective of speeding up tensor algorithms that have recently gained popularity in machine learning. Our main contribution is a new way to select, in a biased random way, only $O(n^{1.5}/\epsilon^2)$ of the possible $n^3$ elements while still achieving each of three goals: (a) tensor sparsification: for a tensor that has to be formed from arbitrary samples, compute very few elements to get a good spectral approximation; and, for arbitrary orthogonal tensors, (b) tensor completion: recover an exactly low-rank tensor from a small number of samples via alternating least squares; or (c) tensor factorization: approximate the factors of a low-rank tensor corrupted by noise. Our sampling can be used alongside existing tensor-based algorithms to speed them up, removing the computational bottleneck in these methods.
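As a rough illustration of the idea, the following Python/NumPy sketch draws a biased random sample of $O(n^{1.5}/\epsilon^2)$ entries from a third-order tensor. The bias used here (probability proportional to entry magnitude, as in the sparsification setting) and the function name are illustrative assumptions, not the paper's exact sampling distribution.

import numpy as np

def biased_sample_entries(T, eps):
    """Sample roughly O(n^{1.5} / eps^2) entries of a third-order n x n x n
    tensor T, biased toward large-magnitude entries (an illustrative choice;
    the paper's actual bias distribution may differ)."""
    n = T.shape[0]
    m = int(np.ceil(n ** 1.5 / eps ** 2))   # number of entries to sample
    probs = np.abs(T).ravel()
    probs = probs / probs.sum()             # bias: probability ~ |T_ijk|
    idx = np.random.choice(T.size, size=m, replace=False, p=probs)
    i, j, k = np.unravel_index(idx, T.shape)
    # Return sampled coordinates and values; downstream algorithms
    # (sparsification, completion, factorization) consume these samples.
    return np.stack([i, j, k], axis=1), T[i, j, k]

# Example: sample a random 50 x 50 x 50 tensor.
T = np.random.randn(50, 50, 50)
coords, vals = biased_sample_entries(T, eps=1.0)
print(coords.shape, vals.shape)   # roughly (354, 3) and (354,)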


Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion

arXiv.org Machine Learning

Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.
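To make the setting concrete, here is a minimal Python/NumPy sketch of processing one incoming slice of a streaming CP model with sparse outliers. It uses a simple point-estimate update (least squares plus soft-thresholding) purely for illustration; the paper instead performs variational Bayesian inference over the factors, the outliers, and the tensor rank, and all names below are hypothetical.

import numpy as np

def soft_threshold(X, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def process_slice(Y, A, B, lam=0.1, iters=20):
    """Fit one incoming slice Y (I x J) of a streaming CP model
    Y ~ A @ diag(c) @ B.T + S, with S a sparse outlier matrix.
    A (I x R) and B (J x R) are the current non-temporal factors."""
    I, R = A.shape
    # Khatri-Rao product: column r is kron(B[:, r], A[:, r]), so that
    # vec(A @ diag(c) @ B.T) == KR @ c (column-major vectorization).
    KR = np.stack([np.kron(B[:, r], A[:, r]) for r in range(R)], axis=1)
    S = np.zeros_like(Y)
    for _ in range(iters):
        # Least-squares update of the new temporal weights c ...
        c, *_ = np.linalg.lstsq(KR, (Y - S).ravel(order="F"), rcond=None)
        # ... then re-estimate the sparse outliers by soft-thresholding.
        S = soft_threshold(Y - (A * c) @ B.T, lam)
    return c, S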


Parallel Higher Order Alternating Least Square for Tensor Recommender System

AAAI Conferences

Many modern recommender systems rely on matrix factorization techniques to produce personalized recommendations on the basis of the feedback that users provided on different items in the past. The feedback may take different forms, such as the rating of a movie, or the number of times a user listened to the songs of a given music band. Nonetheless, in some situations the user can perform several actions on each item, and the feedback is multidimensional (e.g., the user of an e-commerce website can either click on a product, add the product to her cart, or buy it). In this case, one can no longer view the recommendation problem as matrix completion, unless the problem is reduced to a series of multiple independent problems, thus losing the correlation between the different actions. Instead, the most suitable approach is to use a tensor approach to learn all dimensions of the feedback simultaneously. In this paper, we propose a specific instance of tensor completion and we show how it can be heavily parallelized both over the dimensions (i.e., items, users, actions) and within each dimension (i.e., each item separately). We validate the proposed method both in terms of prediction accuracy and scalability to large datasets.
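The key to this kind of parallelization is that, in an alternating least squares (ALS) sweep, the factor row of each user (or item, or action) solves an independent regularized least-squares problem. The Python/NumPy sketch below illustrates this with a generic CP/ALS update; the per-row update form and all names are generic assumptions, not necessarily the paper's exact formulation.

import numpy as np
from itertools import repeat
from concurrent.futures import ProcessPoolExecutor

def update_user_row(u, entries, B, C, reg=1e-2):
    """Regularized least squares for one user's factor row in a CP/ALS sweep.
    `entries` is that user's observed (item, action, value) triples.
    Rows are independent, so the sweep parallelizes across users; sweeps
    over the item factor B and the action factor C have the same structure."""
    R = B.shape[1]
    G = reg * np.eye(R)
    v = np.zeros(R)
    for j, k, y in entries:
        f = B[j] * C[k]   # elementwise product of item and action factors
        G += np.outer(f, f)
        v += y * f
    return u, np.linalg.solve(G, v)

if __name__ == "__main__":
    n_users, n_items, n_actions, R = 100, 50, 3, 8
    A = np.random.rand(n_users, R)
    B = np.random.rand(n_items, R)
    C = np.random.rand(n_actions, R)
    # Toy observations: one (item, action, value) triple per user.
    obs = [[(u % n_items, u % n_actions, 1.0)] for u in range(n_users)]
    # One ALS sweep over the user dimension, parallelized row by row.
    with ProcessPoolExecutor() as ex:
        for u, row in ex.map(update_user_row, range(n_users), obs,
                             repeat(B), repeat(C)):
            A[u] = row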


Provable Tensor Factorization with Missing Data

Neural Information Processing Systems

We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode $n \times n \times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \log^4 n)$ randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemerédi et al. on the spectrum of random graphs. Next, we prove global convergence of alternating minimization with a good initialization. Simulations suggest that the dependence of the sample size on the dimensionality $n$ is indeed tight.
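A generic masked alternating-minimization loop of the kind analyzed above can be sketched as follows in Python/NumPy. This is an illustrative baseline only: the paper's method additionally relies on a careful spectral initialization and an orthogonal decomposition, which this sketch omits, and all names are hypothetical.

import numpy as np

def als_complete(Omega, vals, n, R, iters=50, reg=1e-3):
    """Alternating minimization sketch for completing an n x n x n tensor of
    CP rank R. Omega is an (m, 3) array of observed coordinates and `vals`
    the corresponding observed values."""
    U = [np.linalg.qr(np.random.randn(n, R))[0] for _ in range(3)]
    i, j, k = Omega.T
    idx = (i, j, k)
    for _ in range(iters):
        for mode in range(3):
            a, b = (mode + 1) % 3, (mode + 2) % 3
            # Design rows: elementwise product of the two fixed factors,
            # evaluated at each observed coordinate.
            F = U[idx := idx and a][0] if False else U[a][idx[a]] * U[b][idx[b]]
            # Each slice index along the current mode is an independent
            # regularized least-squares problem over its observed entries.
            new = np.zeros((n, R))
            for s in range(n):
                mask = idx[mode] == s
                Fs, ys = F[mask], vals[mask]
                new[s] = np.linalg.solve(Fs.T @ Fs + reg * np.eye(R), Fs.T @ ys)
            U[mode] = new
    return U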