anandkumar
31784d9fc1fa0d25d04eae50ac9bf787-Paper.pdf
Indeed, in learning applications, where symmetric tensors are formed from statistical moments (higher-order covariances) or multivariate derivatives (higher-order Hessians), CP decomposition has enabled parameter estimation for mixtures of Gaussians [20, 35], generalized linear models [34], shallow neural networks [19, 24, 42], deeper networks [17, 18, 30], and hidden Markov models [5], among others.
Robustifying Algorithms of Learning Latent Trees with Vector Variables
We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM).
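As a rough illustration of what CP decomposition of a symmetric tensor involves, the sketch below runs tensor power iteration with deflation on a toy third-order tensor. It assumes the tensor is exactly orthogonally decomposable with positive weights, which real moment tensors only satisfy after whitening; the function names `tensor_power_iteration` and `symmetric_cp` and the toy data are hypothetical and not taken from the cited works.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Find one (weight, component) pair of a symmetric 3-way tensor T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Contract T with v along two modes: u_i = sum_{j,k} T[i,j,k] v_j v_k
        u = np.einsum("ijk,j,k->i", T, v, v)
        v = u / np.linalg.norm(u)
    # The weight is the full contraction T(v, v, v)
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)
    return lam, v

def symmetric_cp(T, rank):
    """Greedy CP decomposition: extract one component, deflate, repeat."""
    T = T.copy()
    weights, factors = [], []
    for r in range(rank):
        lam, v = tensor_power_iteration(T, seed=r)
        weights.append(lam)
        factors.append(v)
        # Deflation: subtract the recovered rank-one term
        T -= lam * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(weights), np.stack(factors, axis=1)

# Toy tensor (assumption): three orthonormal components in R^4
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q[:, :3]
true_w = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", true_w, A, A, A)

w, V = symmetric_cp(T, rank=3)
T_hat = np.einsum("r,ir,jr,kr->ijk", w, V, V, V)
recon_error = np.linalg.norm(T - T_hat)
```

In the moment-based setting, `T` would be an empirical third-order moment tensor, and the recovered weights and components would map back to mixing proportions and component parameters.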
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
85b42dd8aae56e01379be5736db5b496-AuthorFeedback.pdf
We would like to thank all the reviewers for their comprehensive reviews. We clarify the major comments below. As noted in Sec.6 (and suggested by As discussed in Sec.1, 1.1, 2-4, and Figure 1, TensorNOODL accomplishes Therefore, it seems that leveraging tensor structure may increase the computational complexity. Thank you for this insight. Further, TensorNOODL requires the initial dictionary estimate to follow A.2. for exact recovery at a linear Initializations which do not meet these conditions may still converge, albeit not at a linear rate.
Anima Anandkumar Highlights AI's Potential to Solve 'Hard Scientific Challenges'
Anima Anandkumar is using AI to help solve the world's challenges faster. She has used the technology to speed up prediction models in an effort to get ahead of extreme weather, and to work on sustainable nuclear fusion simulations so as to one day safely harness the energy source. Accepting a TIME100 AI Impact Award in Dubai on Monday, Anandkumar--a professor at California Institute of Technology who was previously the senior director of AI research at Nvidia--credited her engineer parents with setting an example for her. "Having a mom who is an engineer was just such a great role model right at home." Her parents, who brought computerized manufacturing to her hometown in India, opened up her world, she said.
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.29)
- North America > United States > California (0.27)
- Asia > India (0.27)
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Workflow (0.48)
- Research Report (0.46)
- Instructional Material (0.34)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Fourier Neural Operators for Learning Dynamics in Quantum Spin Systems
Shah, Freya, Patti, Taylor L., Berner, Julius, Tolooshams, Bahareh, Kossaifi, Jean, Anandkumar, Anima
Fourier Neural Operators (FNOs) excel on tasks using functional data, such as those originating from partial differential equations. Such characteristics render them an effective approach for simulating the time evolution of quantum wavefunctions, which is a computationally challenging, yet coveted task for understanding quantum systems. In this manuscript, we use FNOs to model the evolution of random quantum spin systems, so chosen due to their representative quantum dynamics and minimal symmetry. We explore two distinct FNO architectures and examine their performance for learning and predicting time evolution using both random and low-energy input states. Additionally, we apply FNOs to a compact set of Hamiltonian observables ($\sim\text{poly}(n)$) instead of the entire $2^n$ quantum wavefunction, which greatly reduces the size of our inputs and outputs and, consequently, the requisite dimensions of the resulting FNOs. Moreover, this Hamiltonian observable-based method demonstrates that FNOs can effectively distill information from high-dimensional spaces into lower-dimensional spaces. The extrapolation of Hamiltonian observables to times later than those used in training is of particular interest, as this stands to fundamentally increase the simulatability of quantum systems past both the coherence times of contemporary quantum architectures and the circuit-depths of tractable tensor networks.
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- Asia > India > Gujarat (0.04)
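The Fourier Neural Operator abstract above relies on a spectral convolution: transform the input to Fourier space, mix channels on a truncated set of low-frequency modes, and transform back. The following is a minimal one-dimensional NumPy sketch of that generic layer, not either of the two architectures studied in the paper; `spectral_conv_1d`, the weight shapes, and the test signal are assumptions made for illustration.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Generic FNO-style spectral convolution.

    x:       real array of shape (channels_in, n_points)
    weights: complex array of shape (channels_in, channels_out, modes)
    """
    c_in, n = x.shape
    _, c_out, modes = weights.shape
    x_hat = np.fft.rfft(x, axis=-1)  # shape (c_in, n // 2 + 1)
    out_hat = np.zeros((c_out, n // 2 + 1), dtype=complex)
    # Mix channels independently on each retained low-frequency mode
    out_hat[:, :modes] = np.einsum("im,iom->om", x_hat[:, :modes], weights)
    # Higher modes stay zero; return to physical space on the same grid
    return np.fft.irfft(out_hat, n=n, axis=-1)

def sample_signal(n):
    """Band-limited single-channel test signal on a uniform grid of n points."""
    t = np.arange(n) / n
    return (np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t))[None, :]

rng = np.random.default_rng(0)
# Hypothetical weights: 1 input channel, 2 output channels, 4 Fourier modes
weights = rng.standard_normal((1, 2, 4)) + 1j * rng.standard_normal((1, 2, 4))

y_coarse = spectral_conv_1d(sample_signal(32), weights)
y_fine = spectral_conv_1d(sample_signal(64), weights)
```

Because the weights act on Fourier modes rather than grid points, the same `weights` apply at both resolutions, and for this band-limited input the fine-grid output sampled at every other point agrees with the coarse-grid output.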
Reduced-Order Neural Operators: Learning Lagrangian Dynamics on Highly Sparse Graphs
Viswanath, Hrishikesh, Chang, Yue, Berner, Julius, Chen, Peter Yichen, Bera, Aniket
We present a neural operator architecture to simulate Lagrangian dynamics, such as fluid flow, granular flows, and elastoplasticity. Traditional numerical methods, such as the finite element method (FEM), suffer from long run times and large memory consumption. On the other hand, approaches based on graph neural networks are faster but still suffer from long computation times on dense graphs, which are often required for high-fidelity simulations. Our model, GIOROM or Graph Interaction Operator for Reduced-Order Modeling, learns temporal dynamics within a reduced-order setting, capturing spatial features from a highly sparse graph representation of the input and generalizing to arbitrary spatial locations during inference. The model is geometry-aware and discretization-agnostic and can generalize to different initial conditions, velocities, and geometries after training. We show that point clouds of the order of 100,000 points can be inferred from sparse graphs with $\sim$1000 points, with negligible change in computation time. We empirically evaluate our model on elastic solids, Newtonian fluids, Non-Newtonian fluids, Drucker-Prager granular flows, and von Mises elastoplasticity. On these benchmarks, our approach results in a 25$\times$ speedup compared to other neural network-based physics simulators while delivering high-fidelity predictions of complex physical systems and showing better performance on most benchmarks. The code and the demos are provided at https://github.com/HrishikeshVish/GIOROM.
- Europe (0.67)
- North America > United States (0.67)
Estimating Mixture Models via Mixtures of Polynomials
Mixture modeling is a general technique for making any simple model more expressive through weighted combination. This generality and simplicity in part explains the success of the Expectation Maximization (EM) algorithm, in which updates are easy to derive for a wide class of mixture models. However, the likelihood of a mixture model is non-convex, so EM has no known global convergence guarantees. Recently, method-of-moments approaches have offered global guarantees for some mixture models, but they do not extend easily to the range of mixture models that exist. In this work, we present Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM. Polymom is applicable when the moments of a single mixture component are polynomials of the parameters. Our key observation is that the moments of the mixture model are a mixture of these polynomials, which allows us to cast estimation as a Generalized Moment Problem. We solve its relaxations using semidefinite optimization, and then extract parameters using ideas from computer algebra. This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation.
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
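To make the key observation in the Polymom abstract concrete, here is a small worked example for a one-dimensional Gaussian mixture. The notation ($\pi_k$, $\mu_k$, $\sigma_k^2$, $f_n$, $\rho$) is chosen for illustration and is not necessarily the paper's.

```latex
% Moments of a single Gaussian component are polynomials in (\mu, \sigma^2):
\begin{align*}
f_1(\mu, \sigma^2) &= \mathbb{E}[x]   = \mu, \\
f_2(\mu, \sigma^2) &= \mathbb{E}[x^2] = \mu^2 + \sigma^2, \\
f_3(\mu, \sigma^2) &= \mathbb{E}[x^3] = \mu^3 + 3\mu\sigma^2.
\end{align*}
% For a K-component mixture with weights \pi_k, each moment is a
% weighted combination of these polynomials:
\begin{equation*}
\mathbb{E}[x^n] = \sum_{k=1}^{K} \pi_k \, f_n(\mu_k, \sigma_k^2)
                = \int f_n(\theta) \, d\rho(\theta),
\qquad
\rho = \sum_{k=1}^{K} \pi_k \, \delta_{(\mu_k, \sigma_k^2)}.
\end{equation*}
```

Estimation then amounts to finding a measure $\rho$ over parameters whose polynomial integrals match the observed moments, which is the shape of a Generalized Moment Problem.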