anandkumar
Online and Differentially-Private Tensor Decomposition
Tensor decomposition is an important tool for big data analysis. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We present the first guarantees for an online tensor power method, which has a linear memory requirement. Moreover, we present a noise-calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper, which improves upon the existing results significantly.
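For orientation, the basic (batch, noiseless) tensor power iteration that these variants build on can be sketched as follows. This is a minimal illustration assuming a symmetric, orthogonally decomposable third-order tensor; it does not reproduce the paper's online or noise-calibrated variants. The names `tensor_apply` and `tensor_power_iteration`, the dimension, and the weights are hypothetical choices made for the example.

```python
import numpy as np

def tensor_apply(T, u):
    """Contract the last two modes of T with u, i.e. compute T(I, u, u)."""
    return np.einsum("ijk,j,k->i", T, u, u)

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Plain tensor power iteration from a random unit vector (illustrative only)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = tensor_apply(T, u)
        u = v / np.linalg.norm(v)
    # Eigenvalue estimate T(u, u, u) for the recovered unit vector.
    lam = float(np.einsum("ijk,i,j,k->", T, u, u, u))
    return lam, u

# Assumed toy input: T = sum_i w_i * v_i (x) v_i (x) v_i with orthonormal v_i.
d = 4
weights = np.array([3.0, 2.0, 1.5, 1.0])
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((d, d)))
T = sum(w * np.einsum("i,j,k->ijk", Q[:, i], Q[:, i], Q[:, i])
        for i, w in enumerate(weights))

lam, u = tensor_power_iteration(T)
alignment = np.abs(Q.T @ u)  # overlap of u with each true component
```

On such a tensor the iterate converges to one of the components `Q[:, i]`, and `lam` approaches the matching weight; which component is found depends on the random start.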
31784d9fc1fa0d25d04eae50ac9bf787-Paper.pdf
Indeed, in learning applications, where symmetric tensors are formed from statistical moments (higher-order covariances) or multivariate derivatives (higher-order Hessians), CP decomposition has enabled parameter estimation for mixtures of Gaussians [20, 35], generalized linear models [34], shallow neural networks [19, 24, 42], deeper networks [17, 18, 30], hidden Markov models [5], among others.
Robustifying Algorithms of Learning Latent Trees with Vector Variables
We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM).
SAOT: An Enhanced Locality-Aware Spectral Transformer for Solving PDEs
Zhou, Chenhong, Chen, Jie, Yang, Zaifeng
Neural operators have shown great potential in solving a family of Partial Differential Equations (PDEs) by modeling the mappings between input and output functions. Fourier Neural Operator (FNO) implements global convolutions via parameterizing the integral operators in Fourier space. However, it often results in over-smoothing solutions and fails to capture local details and high-frequency components. To address these limitations, we investigate incorporating the spatial-frequency localization property of Wavelet transforms into the Transformer architecture. We propose a novel Wavelet Attention (WA) module with linear computational complexity to efficiently learn locality-aware features. Building upon WA, we further develop the Spectral Attention Operator Transformer (SAOT), a hybrid spectral Transformer framework that integrates WA's localized focus with the global receptive field of Fourier-based Attention (FA) through a gated fusion block. Experimental results demonstrate that WA significantly mitigates the limitations of FA and outperforms existing Wavelet-based neural operators by a large margin. By integrating the locality-aware and global spectral representations, SAOT achieves state-of-the-art performance on six operator learning benchmarks and exhibits strong discretization-invariant ability.
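The Fourier-space convolution attributed to FNO above can be illustrated with a single-channel, one-dimensional sketch. It is an assumption-laden toy, not the SAOT or FNO implementation: the function name `spectral_conv_1d`, the fixed (non-learned) weights, and the test signal are all hypothetical. Actual FNO layers use learned complex weights per channel pair alongside a pointwise linear path.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Multiply the lowest Fourier modes of x by complex weights; drop the rest.

    x: real array of shape (n,); weights: complex array of shape (modes,).
    """
    modes = weights.shape[0]
    x_hat = np.fft.rfft(x)                       # real FFT: n // 2 + 1 coefficients
    out_hat = np.zeros_like(x_hat)
    out_hat[:modes] = x_hat[:modes] * weights    # keep and reweight low modes only
    return np.fft.irfft(out_hat, n=x.shape[0])   # back to physical space

# Assumed toy signal: one low-frequency and one high-frequency sine.
n = 64
t = np.arange(n) / n
low = np.sin(2 * np.pi * 1 * t)
high = np.sin(2 * np.pi * 10 * t)
x = low + high

# Identity weights on the first 4 modes: the frequency-10 component is discarded.
y = spectral_conv_1d(x, np.ones(4, dtype=complex))
```

With only four retained modes, `y` matches the low-frequency sine and the frequency-10 term vanishes, which is a small-scale view of the over-smoothing behaviour the abstract describes.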
85b42dd8aae56e01379be5736db5b496-AuthorFeedback.pdf
We would like to thank all the reviewers for their comprehensive reviews. We clarify the major comments below. As noted in Sec.6 (and suggested by As discussed in Sec.1, 1.1, 2-4, and Figure 1, TensorNOODL accomplishes Therefore, it seems that leveraging tensor structure may increase the computational complexity. Thank you for this insight. Further, TensorNOODL requires the initial dictionary estimate to follow A.2. for exact recovery at a linear Initializations which do not meet these conditions may still converge, albeit not at a linear rate.
Anima Anandkumar Highlights AI's Potential to Solve 'Hard Scientific Challenges'
Anima Anandkumar is using AI to help solve the world's challenges faster. She has used the technology to speed up prediction models in an effort to get ahead of extreme weather, and to work on sustainable nuclear fusion simulations so as to one day safely harness the energy source. Accepting a TIME100 AI Impact Award in Dubai on Monday, Anandkumar--a professor at California Institute of Technology who was previously the senior director of AI research at Nvidia--credited her engineer parents with setting an example for her. "Having a mom who is an engineer was just such a great role model right at home." Her parents, who brought computerized manufacturing to her hometown in India, opened up her world, she said.
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
Fourier Neural Operators for Learning Dynamics in Quantum Spin Systems
Shah, Freya, Patti, Taylor L., Berner, Julius, Tolooshams, Bahareh, Kossaifi, Jean, Anandkumar, Anima
Fourier Neural Operators (FNOs) excel on tasks using functional data, such as those originating from partial differential equations. Such characteristics render them an effective approach for simulating the time evolution of quantum wavefunctions, which is a computationally challenging, yet coveted task for understanding quantum systems. In this manuscript, we use FNOs to model the evolution of random quantum spin systems, so chosen due to their representative quantum dynamics and minimal symmetry. We explore two distinct FNO architectures and examine their performance for learning and predicting time evolution using both random and low-energy input states. Additionally, we apply FNOs to a compact set of Hamiltonian observables ($\sim\text{poly}(n)$) instead of the entire $2^n$ quantum wavefunction, which greatly reduces the size of our inputs and outputs and, consequently, the requisite dimensions of the resulting FNOs. Moreover, this Hamiltonian observable-based method demonstrates that FNOs can effectively distill information from high-dimensional spaces into lower-dimensional spaces. The extrapolation of Hamiltonian observables to times later than those used in training is of particular interest, as this stands to fundamentally increase the simulatability of quantum systems past both the coherence times of contemporary quantum architectures and the circuit-depths of tractable tensor networks.