factor matrix


A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

Liang, Xiao, Li, Shuang

arXiv.org Machine Learning

Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps can be computationally demanding and can slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.
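The orthogonalization that Muon substitutes for QR-based projection can be sketched in a few lines. The following is a minimal, hypothetical illustration of a Newton-Schulz polar iteration; the cubic coefficients, step count, and function name are our own choices, not the exact updates of LSRTR-M:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=15):
    """Approximate the nearest orthogonal matrix (polar factor) of G with
    the cubic Newton-Schulz iteration X <- 1.5*X - 0.5*X @ X.T @ X.
    Illustrative sketch of a Muon-style orthogonalization step."""
    X = G / np.linalg.norm(G)   # Frobenius scaling puts singular values in (0, 1]
    transposed = X.shape[0] > X.shape[1]
    if transposed:              # iterate on the wide orientation for efficiency
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X.T if transposed else X
```

Unlike a QR-based projection, this iteration uses only matrix multiplications, which is what makes it attractive on modern hardware.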


Adaptive Subspace Modeling With Functional Tucker Decomposition

Steidle, Noah, De Jonghe, Joppe, Ishteva, Mariya

arXiv.org Machine Learning

Tensors provide a structured representation for multidimensional data, yet discretization can obscure important information when such data originates from continuous processes. We address this limitation by introducing a functional Tucker decomposition (FTD) that embeds mode-wise continuity constraints directly into the decomposition. The FTD employs reproducing kernel Hilbert spaces (RKHS) to model continuous modes without requiring an a priori basis, while preserving the multi-linear subspace structure of the Tucker model. Through RKHS-driven representation, the model yields adaptive and expressive factor descriptions that enable targeted modeling of subspaces. The value of this approach is demonstrated in domain-variant tensor classification. In particular, we illustrate its effectiveness with classification tasks in hyperspectral imaging and multivariate time series analysis, highlighting the benefits of combining structural decomposition with functional adaptability.
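A minimal sketch of the core idea, lifting a discrete factor matrix to a continuous function via an RKHS representer expansion, kernel-ridge style. The kernel choice, bandwidth `gamma`, ridge parameter `lam`, and function names are illustrative assumptions, not the paper's FTD solver:

```python
import numpy as np

def rbf_kernel(s, t, gamma=10.0):
    """Gaussian RBF kernel matrix between 1-D sample vectors s and t."""
    return np.exp(-gamma * (s[:, None] - t[None, :]) ** 2)

def continuous_factor(grid, U, lam=1e-6, gamma=10.0):
    """Turn a discrete factor matrix U (sampled at `grid`) into a function
    f(t) whose columns live in an RBF RKHS: f(t) = k(t, grid) @ alpha."""
    K = rbf_kernel(grid, grid, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(grid)), U)  # representer coefficients
    return lambda t: rbf_kernel(np.atleast_1d(np.asarray(t, float)), grid, gamma) @ alpha
```

The returned function can be evaluated off the original sampling grid, which is the practical payoff of modeling a mode as continuous rather than discretized.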






Fused Orthogonal Alternating Least Squares for Tensor Clustering

Neural Information Processing Systems

Our paper adopts the CP decomposition because it handles heterogeneity in each mode, learns the clustering patterns across different modes of data in a more independent way, and provides flexibility for clustering a certain mode of the tensor without being affected by correlation with other modes. Our method is similar to those in a recent series of papers [27, 21] that use the CP decomposition structure. Note that their estimation algorithms use the framework of the tensor power method [1].
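A bare CP-ALS skeleton of the kind such methods build on; the unfolding conventions and helper names are our own, and the paper's algorithm adds fusion and orthogonality structure on top of this plain alternating least squares loop:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row (j*K + k) equals B[j] * C[k]."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Plain CP-ALS for a 3-way tensor (illustrative skeleton only)."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    T1 = T.reshape(I, -1)                      # mode-1 unfolding, (j,k) columns
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)   # mode-2 unfolding, (i,k) columns
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)   # mode-3 unfolding, (i,j) columns
    for _ in range(iters):
        # each update is a linear least-squares solve via the Hadamard-of-Gramians identity
        A = T1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Clustering a single mode then amounts to running a standard clustering method on the rows of that mode's factor matrix alone, which is the per-mode independence the paragraph refers to.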



Generalized Canonical Polyadic Tensor Decompositions with General Symmetry

Mulrooney, Alex, Hong, David

arXiv.org Machine Learning

Canonical Polyadic (CP) tensor decomposition is a workhorse algorithm for discovering underlying low-dimensional structure in tensor data. This is accomplished in conventional CP decomposition by fitting a low-rank tensor to data with respect to the least-squares loss. Generalized CP (GCP) decompositions generalize this approach by allowing general loss functions that can be more appropriate, e.g., to model binary and count data or to improve robustness to outliers. However, GCP decompositions do not explicitly account for any symmetry in the tensors, which commonly arises in modern applications. For example, a tensor formed by stacking the adjacency matrices of a dynamic graph over time will naturally exhibit symmetry along the two modes corresponding to the graph nodes. In this paper, we develop a symmetric GCP (SymGCP) decomposition that allows for general forms of symmetry, i.e., symmetry along any subset of the modes. SymGCP accounts for symmetry by enforcing the corresponding symmetry in the decomposition. We derive gradients for SymGCP that enable its efficient computation via all-at-once optimization with existing tensor kernels. The form of the gradients also leads to various stochastic approximations that enable us to develop stochastic SymGCP algorithms that can scale to large tensors. We demonstrate the utility of the proposed SymGCP algorithms with a variety of experiments on both synthetic and real data.
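The shared-factor construction and the resulting two-term chain-rule gradient can be illustrated on a toy least-squares case. The function name and setup below are hypothetical, and we use the simplest loss; SymGCP itself supports general losses and all-at-once/stochastic solvers:

```python
import numpy as np

def sym_gcp_grad_step(Y, A, C, lr=0.01):
    """One gradient step for a toy symmetric CP model with least-squares loss:
    M[i,j,k] = sum_r A[i,r]*A[j,r]*C[k,r], symmetric in modes 1 and 2.
    Sharing A across the symmetric modes enforces symmetry by construction;
    the chain rule then gives A a two-term gradient, one per appearance."""
    M = np.einsum('ir,jr,kr->ijk', A, A, C)
    R = M - Y                                     # dLoss/dM for least squares
    gA = (np.einsum('ijk,jr,kr->ir', R, A, C)     # A's mode-1 appearance
          + np.einsum('ijk,ir,kr->jr', R, A, C))  # A's mode-2 appearance
    gC = np.einsum('ijk,ir,jr->kr', R, A, A)
    return A - lr * gA, C - lr * gC
```

For a general loss, only the residual term `R` changes (to the elementwise derivative of the loss at `M`), which is why the same gradient structure carries over.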


Sparse Tucker Decomposition and Graph Regularization for High-Dimensional Time Series Forecasting

Xia, Sijia, Ng, Michael K., Zhang, Xiongjun

arXiv.org Machine Learning

Existing vector autoregressive methods for multivariate time series analysis make use of low-rank matrix approximation or Tucker decomposition to reduce the dimension and mitigate the over-parameterization issue. In this paper, we propose a sparse Tucker decomposition method with graph regularization for high-dimensional vector autoregressive time series. By stacking the time-series transition matrices into a third-order tensor, the sparse Tucker decomposition is employed to characterize important interactions within the transition third-order tensor and reduce the number of parameters. Moreover, the graph regularization is employed to measure the local consistency of the response, predictor and temporal factor matrices in the vector autoregressive model. The two proposed regularization techniques can be shown to yield more accurate parameter estimation. A non-asymptotic error bound of the estimator of the proposed method is established, which is lower than those of the existing matrix or tensor based methods. A proximal alternating linearized minimization algorithm is designed to solve the resulting model and its global convergence is established under very mild conditions. Extensive numerical experiments on synthetic data and real-world datasets are carried out to verify the superior performance of the proposed method over existing state-of-the-art methods.
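Two standard building blocks of such a proximal alternating scheme can be sketched under our own naming: the soft-thresholding prox for the sparsity term, and the graph Laplacian behind the local-consistency regularizer. This is a generic illustration, not the paper's full solver:

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau*||X||_1: the sparsity-inducing step inside
    a PALM-style alternating update (applied entrywise)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a weighted adjacency matrix W.
    The regularizer tr(U.T @ L @ U) equals
    0.5 * sum_{ij} W[i,j] * ||U[i] - U[j]||^2,
    penalizing differences between factor rows adjacent in the graph."""
    return np.diag(W.sum(axis=1)) - W
```

In a PALM iteration, each factor block is updated by a gradient step on the smooth (fit plus graph) part followed by the soft-thresholding prox on the sparse part.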