scalable tucker tensor decomposition
Singleshot : a scalable Tucker tensor decomposition
This paper introduces a new approach for the scalable Tucker decomposition problem. Given a tensor X, the method proposed allows to infer the latent factors by processing one subtensor drawn from X at a time. The key principle of our approach is based on the recursive computations of gradient and on cyclic update of factors involving only one single step of gradient descent. We further improve the computational efficiency of this algorithm by proposing an inexact gradient version. These two algorithms are backed with theoretical guarantees of convergence and convergence rate under mild conditions. The scalabilty of the proposed approaches which can be easily extended to handle some common constraints encountered in tensor decomposition (e.g non-negativity), is proven via numerical experiments on both synthetic and real data sets.
Reviews: Singleshot : a scalable Tucker tensor decomposition
I have read the authors authors rebuttal; I still believe that the experiments are not convincing enough and that a paper claiming to beat the state-of-the-art in scalable tensor decompositions should contain more thorough and clearer experiment setup (larger-scale experiments, fewer ambiguities about experiment setup decisions). The authors demonstrate how the memory blow-up occurring during the Tucker decomposition can be avoided, by formulating the derivations in a way that sub-tensor partitions of the original input tensor can be processed in a sequential manner. The authors provide convergence guarantees of their algorithm to a point where the gradient is equal to zero. Although the approach is sensible and it is backed by theoretical analysis, I find that the experiments are not convincing enough for a work in which improved scalability is regarded as the main advancement. I first list my comments regarding the experiments and provide with additional comments below: - The authors choose to "arbitrarily set the size of the Movielens and Enron tensors to M M 200 and M M 610".
Reviews: Singleshot : a scalable Tucker tensor decomposition
The paper proposes efficient methods for computing the Tucker decomposition of higher-order tensors. The problem is a hard, basic problem in numerical linear algebra with reasonably wide applicability. Tensor decompositions have played an important role in a variety of machine learning applications, see for example: Anandkumar et al "Tensor Decompositions for Learning Latent Variable Models" JMLR 2014; Novikov et al "Tensorizing Neural Networks" NeurIPS 2015, which used tensor decompositions to massively compress the dense layers of VGG; Moitra and Wein "Spectral Methods from Tensor Networks"; and Becker and Osman "Low rank Tucker decompositions of large tensors using tensorsketch" NeurIPS 2018. Singleshot is a coordinate descent based algorithm which applies gradient updates to variables in the Tucker decomposition, which it cycles over. The paper carefully considers the memory usage of Singleshot (and its variants) since tensor computations are often extremely memory intensive.
Singleshot : a scalable Tucker tensor decomposition
This paper introduces a new approach for the scalable Tucker decomposition problem. Given a tensor X, the method proposed allows to infer the latent factors by processing one subtensor drawn from X at a time. The key principle of our approach is based on the recursive computations of gradient and on cyclic update of factors involving only one single step of gradient descent. We further improve the computational efficiency of this algorithm by proposing an inexact gradient version. These two algorithms are backed with theoretical guarantees of convergence and convergence rate under mild conditions.
Singleshot : a scalable Tucker tensor decomposition
Traore, Abraham, Berar, Maxime, Rakotomamonjy, Alain
This paper introduces a new approach for the scalable Tucker decomposition problem. Given a tensor X, the method proposed allows to infer the latent factors by processing one subtensor drawn from X at a time. The key principle of our approach is based on the recursive computations of gradient and on cyclic update of factors involving only one single step of gradient descent. We further improve the computational efficiency of this algorithm by proposing an inexact gradient version. These two algorithms are backed with theoretical guarantees of convergence and convergence rate under mild conditions.