A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

Liang, Xiao, Li, Shuang

arXiv.org Machine Learning

Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps are computationally demanding and can slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.
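The core ingredient the abstract refers to, Newton–Schulz orthogonalization, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the classical Newton–Schulz iteration X ← 1.5X − 0.5XXᵀX (published Muon variants use a tuned higher-order polynomial), and the function names and step count are illustrative assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    """Approximately orthogonalize G with the classical Newton-Schulz
    iteration X <- 1.5*X - 0.5*X @ X.T @ X, which drives every singular
    value toward 1 (i.e., toward the nearest semi-orthogonal matrix)
    provided the starting spectral norm is below sqrt(3)."""
    # Frobenius norm upper-bounds the spectral norm, so this scaling
    # places the iterate inside the convergence region.
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def muon_style_factor_step(W, momentum, lr=0.1):
    """Hypothetical Muon-style update for one factor matrix: descend
    along the orthogonalized momentum instead of QR-projecting W."""
    return W - lr * newton_schulz_orthogonalize(momentum)
```

The design point mirrored here is the one the abstract highlights: orthogonality is maintained through a few cheap matrix multiplications on the update direction rather than a QR factorization of the factor matrix at every iteration.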









We address some of the questions raised by the reviewers as much as time and space allow

Neural Information Processing Systems

First, we thank all the reviewers for their invaluable assessment of our paper in this challenging time. To provide more reliable evidence that AdvFlow's distributional For the sake of completeness, we also add LID [31]. The results are given in Table 1, indicating that the attacker's distributional properties are fooling the detectors. As seen, we get results similar to Table 2 of the paper, outperforming SimBA on defended baselines. Note that some of the current SOTA results in black-box adversarial attacks come from the attacker's knowledge about the However, once the target changes its training procedure (e.g., from vanilla See the official repository of SimBA, where it is clearly indicated that the The results of Tables 1 and 2 (as well as SVHN) will be added to the camera-ready version.