Tensor Normal Training for Deep Learning Models
Despite the predominant use of first-order methods for training deep learning models, second-order methods, and in particular natural gradient methods, remain of interest because of their potential for accelerating training through the use of curvature information. Several methods with non-diagonal preconditioning matrices, including KFAC, Shampoo, and K-BFGS, have been proposed and shown to be effective. Based on the so-called tensor normal (TN) distribution, we propose and analyze a new approximate natural gradient method, Tensor Normal Training (TNT), which, like Shampoo, only requires knowledge of the shape of the training parameters. By approximating the probabilistically based Fisher matrix, as opposed to the empirical Fisher matrix, our method uses the block-wise covariance of the sampling-based gradient as the preconditioning matrix. Moreover, the assumption that the sampling-based (tensor) gradient follows a TN distribution ensures that its covariance has a Kronecker-separable structure, which leads to a tractable approximation of the Fisher matrix. Consequently, TNT's memory requirements and per-iteration computational costs are only slightly higher than those of first-order methods. In our experiments, TNT exhibited superior optimization performance to state-of-the-art first-order methods, and comparable optimization performance to the state-of-the-art second-order methods KFAC and Shampoo. Moreover, TNT demonstrated its ability to generalize as well as first-order methods, while using fewer epochs.
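To make the Kronecker-separable idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of why such structure makes the natural-gradient step tractable: if a layer's Fisher block is approximated as a Kronecker product A ⊗ B of two small covariance factors, its inverse can be applied to the matrix-shaped gradient G as A⁻¹ G B⁻¹, without ever forming the full Kronecker product. The function name, damping value, and factor sizes here are illustrative assumptions.

```python
import numpy as np

def kronecker_preconditioned_step(G, A, B, damping=1e-3):
    """Apply a Kronecker-factored preconditioner to a matrix gradient G.

    If the Fisher block is approximated as A (x) B (Kronecker product),
    its inverse acts on the row-major vectorized gradient as
    (A (x) B)^{-1} vec(G) = vec(A^{-1} G B^{-1}),
    so only the small factors A and B ever need to be inverted.
    """
    m, n = G.shape
    A_damped = A + damping * np.eye(m)   # damping keeps the solves well-posed
    B_damped = B + damping * np.eye(n)
    return np.linalg.solve(A_damped, G) @ np.linalg.inv(B_damped)

# Sanity check against the explicitly formed Kronecker preconditioner.
rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.normal(size=(m, n))
# Symmetric positive semi-definite factors, as covariance estimates would be.
A = rng.normal(size=(m, m)); A = A @ A.T
B = rng.normal(size=(n, n)); B = B @ B.T

fast = kronecker_preconditioned_step(G, A, B, damping=1e-3)
full = np.linalg.solve(np.kron(A + 1e-3 * np.eye(m), B + 1e-3 * np.eye(n)),
                       G.flatten()).reshape(m, n)
assert np.allclose(fast, full)
```

The memory savings come from storing the m×m and n×n factors instead of the mn×mn Fisher block, which is what keeps the per-iteration cost close to that of first-order methods.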
A Some Tensor Definitions and Properties
We present in this section fairly standard notation and definitions regarding tensors (see, e.g., Chapter 3 of [30]) that we use throughout the paper. Note that when A is a matrix, this corresponds to the row-major vectorization of A. Lemma 3 is proved by induction, assuming that (6) holds for 1, 2, ..., k − 1 and then establishing it for k. The proof of Theorem 1 follows from Theorem 2.8 in [44]: Algorithm 2 itself ensures AS.4, and hence, by Theorem 2.8 of [44], the result is guaranteed. In Algorithm 3, we present detailed pseudo-code for our actual implementation of TNT.
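Since the appendix fixes the row-major vectorization convention, the following is a quick numerical check (an illustration added here, not part of the paper) of the standard identity that underlies Kronecker-separable structure under that convention: vec_r(A X Bᵀ) = (A ⊗ B) vec_r(X), where vec_r stacks the rows of a matrix, which is NumPy's default C-order flattening.

```python
import numpy as np

# Numerical check of the row-major vectorization identity
#   vec_r(A X B^T) = (A ⊗ B) vec_r(X),
# where vec_r stacks the rows of a matrix (NumPy's default C order).
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(5, 4))

lhs = (A @ X @ B.T).flatten()     # row-major vectorization of A X B^T
rhs = np.kron(A, B) @ X.flatten()
assert np.allclose(lhs, rhs)
```

Note that under the more common column-major convention the roles of the factors swap, giving vec(A X B) = (Bᵀ ⊗ A) vec(X); the row-major convention is what makes A ⊗ B (in that order) the natural object here.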
Transformer in Transformer Supplemental Material
We can see that for both DeiT-S and TNT-S, more patches become related as the layer goes deeper. An MLP is used to calculate the attention values, and the attention is multiplied with all the embeddings. We extract features from different layers of TNT to construct multi-scale features. The COCO2017 val results are shown in Table 2; TNT achieves much better results. Table 2: Results of Faster R-CNN object detection on COCO minival set with ImageNet pre-training.
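The attention-weighted pooling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact module: the MLP width, activation, and the choice to concatenate pooled features from several layers are all hypothetical here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(embeddings, w1, b1, w2, b2):
    """Pool patch embeddings using MLP-computed attention values.

    embeddings: (num_patches, dim). A small MLP scores each patch,
    the scores are normalized with softmax, and the attention-weighted
    sum over all embeddings gives one pooled feature vector.
    """
    hidden = np.tanh(embeddings @ w1 + b1)   # (num_patches, hidden_dim)
    scores = (hidden @ w2 + b2).squeeze(-1)  # (num_patches,)
    attn = softmax(scores)                   # attention values
    return attn @ embeddings                 # (dim,)

rng = np.random.default_rng(2)
dim, hidden_dim, num_patches = 8, 16, 10
w1 = rng.normal(size=(dim, hidden_dim)); b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, 1));   b2 = np.zeros(1)

# Multi-scale features: pool the embeddings produced at several layers
# and concatenate the results.
layer_outputs = [rng.normal(size=(num_patches, dim)) for _ in range(3)]
multi_scale = np.concatenate([attention_pool(e, w1, b1, w2, b2)
                              for e in layer_outputs])
assert multi_scale.shape == (3 * dim,)
```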
TNT: Improving Chunkwise Training for Test-Time Memorization
Zeman Li, Ali Behrouz, Yuan Deng, Peilin Zhong, Praneeth Kacham, Mahdi Karami, Meisam Razaviyayn, Vahab Mirrokni
Recurrent neural networks (RNNs) with deep test-time memorization modules, such as Titans and TTT, represent a promising, linearly scaling paradigm distinct from Transformers. While these expressive models do not yet match the peak performance of state-of-the-art Transformers, their potential has been largely untapped due to prohibitively slow training and low hardware utilization. Existing parallelization methods force a fundamental conflict governed by the chunksize hyperparameter: large chunks boost speed but degrade performance, necessitating a fixed, suboptimal compromise. To solve this challenge, we introduce TNT, a novel training paradigm that decouples training efficiency from inference performance through a two-stage process. Stage one is an efficiency-focused pre-training phase utilizing a hierarchical memory. A global module processes large, hardware-friendly chunks for long-range context, while multiple parallel local modules handle fine-grained details. Crucially, by periodically resetting local memory states, we break sequential dependencies to enable massive context parallelization. Stage two is a brief fine-tuning phase where only the local memory modules are adapted to a smaller, high-resolution chunksize, maximizing accuracy with minimal overhead. Evaluated on Titans and TTT models, TNT achieves a substantial acceleration in training speed, up to 17 times faster than the most accurate baseline configuration, while simultaneously improving model accuracy. This improvement removes a critical scalability barrier, establishing a practical foundation for developing expressive RNNs and facilitating future work to close the performance gap with Transformers.
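The key parallelization idea, resetting local memory at chunk boundaries, can be sketched with a toy recurrence. This is not the TNT architecture itself: a simple linear associative memory (S ← S + k vᵀ) stands in for a deep test-time memorization module, and all names and sizes are illustrative. The point is structural: because the local state is reset at each chunk boundary, each chunk's computation depends only on its own tokens, so the per-chunk map can run in parallel across chunks (with long-range context left to a separate global module).

```python
import numpy as np

def local_memory_outputs(chunk_kv):
    """Process one chunk with a fresh (reset) local memory state.

    chunk_kv is a list of (key, value) vector pairs. The state S starts
    at zero for every chunk, so this function depends only on its own
    chunk and the map over chunks is embarrassingly parallel.
    """
    dim = chunk_kv[0][0].shape[0]
    S = np.zeros((dim, dim))
    outputs = []
    for k, v in chunk_kv:
        S = S + np.outer(k, v)    # write the (key, value) association
        outputs.append(S.T @ k)   # read the memory with the current key
    return outputs

rng = np.random.default_rng(3)
dim, chunk_size, num_chunks = 4, 5, 3
seq = [(rng.normal(size=dim), rng.normal(size=dim))
       for _ in range(chunk_size * num_chunks)]
chunks = [seq[i * chunk_size:(i + 1) * chunk_size]
          for i in range(num_chunks)]

# Each chunk is independent of the others, so this map could be
# dispatched to separate devices instead of run sequentially.
per_chunk = [local_memory_outputs(c) for c in chunks]
assert len(per_chunk) == num_chunks and len(per_chunk[0]) == chunk_size
```

Without the reset, S would carry over between chunks and every chunk would have to wait for its predecessor, which is exactly the sequential dependency the abstract says TNT breaks.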