
Collaborating Authors: Finzi, Marc


PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

arXiv.org Artificial Intelligence

While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tasks, including transfer learning. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization, for generalization in deep learning. Notably, we find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor. We also argue for data-independent bounds in explaining generalization.
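
The sketch below is a toy illustration of the subspace-plus-quantization bookkeeping described above, under stated assumptions: a fixed random projection stands in for the learned subspace, random coefficients stand in for trained ones, and the codelength is a crude entropy estimate rather than the paper's actual bound.

```python
import numpy as np

# Toy sketch: train only d coefficients c in a random linear subspace of the
# full D-dimensional parameter space, quantize c to a small shared codebook,
# and count the bits needed to transmit it. A compression-style PAC-Bayes
# bound penalizes exactly this kind of codelength.
rng = np.random.default_rng(0)
D, d, levels = 10_000, 50, 7                     # full dim, subspace dim, codebook size
P = rng.standard_normal((D, d)) / np.sqrt(d)     # fixed random projection
c = rng.standard_normal(d)                       # stand-in for trained subspace coefficients

# Quantize each coefficient to the nearest of `levels` shared centers.
centers = np.quantile(c, np.linspace(0, 1, levels))
codes = np.abs(c[:, None] - centers[None, :]).argmin(axis=1)
theta = P @ centers[codes]                       # decompressed full parameter vector

# Crude codelength: entropy-coded cluster assignments plus the centers themselves.
counts = np.bincount(codes, minlength=levels)
probs = counts[counts > 0] / d
code_bits = -(counts[counts > 0] * np.log2(probs)).sum() + 32 * levels
print(f"compressed size ~= {code_bits:.0f} bits for {D} parameters")
```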


Deconstructing the Inductive Biases of Hamiltonian Neural Networks

arXiv.org Machine Learning

Physics-inspired neural networks (NNs), such as Hamiltonian NNs (HNNs) or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real-world systems, such as those that do not conserve energy or that contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the inductive biases that make physics-inspired models successful in practice. We show that, contrary to conventional wisdom, the improved generalization of HNNs is the result of modeling acceleration directly and avoiding artificial complexity from the coordinate system, rather than of symplectic structure or energy conservation. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common Mujoco environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control.
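
A minimal sketch of the second-order inductive bias referred to above, assuming a generic MLP for the acceleration and a plain Euler step; this is not the authors' exact architecture or integrator.

```python
import torch
import torch.nn as nn

# Instead of a network that predicts d(q, v)/dt freely, the model outputs
# only an acceleration a = f(q, v); dq/dt = v is hard-coded.
class SecondOrderDynamics(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.accel = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),              # predicts acceleration only
        )

    def forward(self, t, state):
        q, v = state.chunk(2, dim=-1)
        a = self.accel(torch.cat([q, v], dim=-1))
        return torch.cat([v, a], dim=-1)         # dq/dt = v by construction

# One explicit Euler step on a toy 2D system, just to show the call pattern.
model = SecondOrderDynamics(dim=2)
state = torch.randn(8, 4)                        # batch of (q, v) pairs
next_state = state + 0.01 * model(torch.tensor(0.0), state)
print(next_state.shape)                          # torch.Size([8, 4])
```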


Residual Pathway Priors for Soft Equivariance Constraints

arXiv.org Machine Learning

There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of reality and having the right inductive biases for efficient learning. We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors, guiding models towards structured solutions while retaining the ability to capture additional complexity. Using RPPs, we construct neural network priors with inductive biases for equivariances, but without limiting flexibility. We show that RPPs are resilient to approximate or misspecified symmetries, and are as effective as fully constrained models even when symmetries are exact. We showcase the broad applicability of RPPs with dynamical systems, tabular data, and reinforcement learning. In Mujoco locomotion tasks, where contact forces and directional rewards violate strict equivariance assumptions, the RPP outperforms baseline model-free RL agents, and also improves the learned transition models for model-based RL.
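
A minimal sketch of a residual-pathway layer in this spirit, with a toy permutation-equivariant branch standing in for a real equivariant layer and hand-picked Gaussian prior variances; the module and parameter names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class ResidualPathwayLinear(nn.Module):
    """Sum of a constrained (equivariant) pathway and a free linear pathway."""

    def __init__(self, dim, prior_var_equiv=1.0, prior_var_free=1e-2):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))     # permutation-equivariant branch:
        self.mix = nn.Parameter(torch.zeros(1))      # scaled identity plus channel mean
        self.free = nn.Linear(dim, dim, bias=False)  # unconstrained residual branch
        self.prior_var_equiv = prior_var_equiv
        self.prior_var_free = prior_var_free

    def forward(self, x):
        equiv = self.scale * x + self.mix * x.mean(-1, keepdim=True)
        return equiv + self.free(x)

    def prior_penalty(self):
        # The soft prior: the flexible pathway gets a much smaller prior
        # variance, so solutions are pulled toward the equivariant part
        # unless the data demands otherwise.
        return (self.scale ** 2 + self.mix ** 2).sum() / (2 * self.prior_var_equiv) \
             + (self.free.weight ** 2).sum() / (2 * self.prior_var_free)

layer = ResidualPathwayLinear(dim=16)
x = torch.randn(4, 16)
loss = layer(x).pow(2).mean() + layer.prior_penalty()   # task loss + prior term
loss.backward()
```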


SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

arXiv.org Machine Learning

State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix-vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice used for high-dimensional fast bilateral filtering. Using a sparse simplicial grid instead of a dense rectangular one, we can perform GP inference exponentially faster in the dimension than SKI. Our approach, Simplex-GP, enables scaling SKI to high dimensions while maintaining strong predictive performance. We additionally provide a CUDA implementation of Simplex-GP, which enables significant GPU acceleration of MVM-based inference.
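
A minimal sketch of the SKI-style matrix-vector multiply being accelerated, using a toy 1D rectangular grid with linear interpolation and an assumed RBF kernel; the permutohedral-lattice and CUDA pieces are what the paper adds on top of this structure.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, m = 2000, 64                          # data points, inducing grid points
x = np.sort(rng.uniform(0, 1, n))
grid = np.linspace(0, 1, m)

# Sparse interpolation weights W: each data point splits its mass between
# its two neighboring grid points (linear interpolation).
idx = np.clip(np.searchsorted(grid, x) - 1, 0, m - 2)
frac = (x - grid[idx]) / (grid[idx + 1] - grid[idx])
rows = np.repeat(np.arange(n), 2)
cols = np.stack([idx, idx + 1], axis=1).ravel()
vals = np.stack([1 - frac, frac], axis=1).ravel()
W = sparse.csr_matrix((vals, (rows, cols)), shape=(n, m))

# RBF kernel evaluated only on the small grid.
K_grid = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / 0.1 ** 2)

def mvm(v):
    """Approximate K @ v as W (K_grid (W^T v)) without forming the n x n kernel."""
    return W @ (K_grid @ (W.T @ v))

v = rng.standard_normal(n)
print(mvm(v).shape)                      # (2000,)
```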


A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups

arXiv.org Machine Learning

Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering solutions from other works as special cases, we construct multilayer perceptrons equivariant to multiple groups that have never been tackled before, including $\mathrm{O}(1,3)$, $\mathrm{O}(5)$, $\mathrm{Sp}(n)$, and the Rubik's cube group. Our approach outperforms non-equivariant baselines, with applications to particle physics and dynamical systems. We release our software library to enable researchers to construct equivariant layers for arbitrary matrix groups.
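
A minimal numpy sketch of the underlying idea for one small case, the standard representation of SO(2): stack the equivariance constraints on vec(W) for a few group elements and take the null space. The helper names here are illustrative and not the released library's API, which handles arbitrary matrix groups and representations.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariant_basis(gens_in, gens_out):
    """Null space of (rho_out(g) kron rho_in(g)^{-T} - I) vec(W) = 0 for each
    generator g, i.e. all W with rho_out(g) W = W rho_in(g)."""
    constraints = []
    for g_in, g_out in zip(gens_in, gens_out):
        k = np.kron(g_out, np.linalg.inv(g_in).T)
        constraints.append(k - np.eye(k.shape[0]))
    A = np.concatenate(constraints, axis=0)
    _, s, Vt = np.linalg.svd(A)
    return Vt[np.sum(s > 1e-8):]         # rows spanning the null space

# A few rotations standing in for generators of SO(2).
gs = [rot(t) for t in (0.3, 1.1, 2.5)]
basis = equivariant_basis(gs, gs)
print(basis.shape)   # (2, 4): each row reshapes to a 2x2 equivariant map,
                     # spanned by the identity and the 90-degree rotation
```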


Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

arXiv.org Machine Learning

Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show that embedding the system into Cartesian coordinates and enforcing the constraints explicitly with Lagrange multipliers dramatically simplifies the learning problem. We introduce a series of challenging chaotic and extended-body systems, including systems with N-pendulums, spring coupling, magnetic fields, rigid rotors, and gyroscopes, to push the limits of current approaches. Our experiments show that Cartesian coordinates with explicit constraints lead to a 100x improvement in accuracy and data efficiency.
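
A minimal worked example of the Cartesian-coordinates-plus-explicit-constraints formulation for a single pendulum, using the known physics rather than a learned model; the mass, length, step size, and integrator are arbitrary choices for illustration.

```python
import numpy as np

# State: Cartesian position x and velocity v of the bob. The rigid rod is the
# holonomic constraint phi(x) = |x|^2 - L^2 = 0, enforced via a Lagrange
# multiplier rather than by switching to the angle coordinate.
m, L, g = 1.0, 1.0, 9.81

def acceleration(x, v):
    """Solve the KKT system [[m I, -J^T], [J, 0]] [a, lam] = [F, -2 v.v]."""
    F = np.array([0.0, -m * g])          # gravity
    J = 2.0 * x[None, :]                 # d(phi)/dx
    rhs_c = -2.0 * (v @ v)               # from d^2(phi)/dt^2 = 2(v.v + x.a) = 0
    A = np.block([[m * np.eye(2), -J.T],
                  [J,             np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.concatenate([F, [rhs_c]]))
    return sol[:2]                       # Cartesian acceleration; sol[2] is lambda

# Semi-implicit Euler rollout from the horizontal position.
x, v, dt = np.array([L, 0.0]), np.zeros(2), 1e-3
for _ in range(1000):
    v += dt * acceleration(x, v)
    x += dt * v
print(np.linalg.norm(x))                 # stays close to L = 1 (small constraint drift)
```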


Learning Invariances in Neural Networks

arXiv.org Machine Learning

Invariances to translations have imbued convolutional neural networks with powerful generalization properties. However, we often do not know a priori what invariances are present in the data, or to what extent a model should be invariant to a given symmetry group. We show how to \emph{learn} invariances and equivariances by parameterizing a distribution over augmentations and optimizing the training loss simultaneously with respect to the network parameters and augmentation parameters. With this simple procedure we can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations, on training data alone.
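
A minimal sketch of learning the extent of a single invariance, assuming toy 2D inputs, one learnable rotation range, and a simple penalty standing in for the paper's regularizer; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentedModel(nn.Module):
    """Averages predictions over rotations sampled from a learnable range."""

    def __init__(self, net):
        super().__init__()
        self.net = net
        self.log_width = nn.Parameter(torch.tensor(0.0))   # learnable augmentation range

    def rotate(self, x, angle):
        # x: (batch, 2) points in the plane; rotation is differentiable in `angle`.
        c, s = torch.cos(angle), torch.sin(angle)
        R = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
        return x @ R.T

    def forward(self, x, n_samples=4):
        width = self.log_width.exp()
        preds = []
        for _ in range(n_samples):                          # average over sampled augmentations
            u = torch.rand(()) * 2 - 1                      # reparameterized: angle = width * u
            preds.append(self.net(self.rotate(x, width * u)))
        return torch.stack(preds).mean(0)

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
model = AugmentedModel(net)
x, y = torch.randn(16, 2), torch.randint(0, 2, (16,))
# The penalty on -log_width nudges the model toward wider invariance unless
# the training loss pushes back; both the network and log_width get gradients.
loss = F.cross_entropy(model(x), y) - 0.01 * model.log_width
loss.backward()
```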


Improving Consistency-Based Semi-Supervised Learning with Weight Averaging

arXiv.org Artificial Intelligence

Recent advances in deep unsupervised learning have renewed interest in semi-supervised methods, which can learn from both labeled and unlabeled data. Presently, the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. We show that consistency regularization leads to flatter but narrower optima. We also show that the test error surface for these methods is approximately convex in regions of weight space traversed by SGD. Inspired by these observations, we propose to train consistency-based semi-supervised models with stochastic weight averaging (SWA), a recent method that averages weights along the trajectory of SGD. We also develop fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With fast-SWA we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100 over many different numbers of observed training labels. For example, we achieve 95.0% accuracy on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 93.7%. We also improve the best known accuracy for domain adaptation from CIFAR-10 to STL from 80% to 83%. Finally, we show that with fast-SWA the simple $\Pi$ model becomes state-of-the-art for large labeled settings.
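
A minimal sketch of the fast-SWA averaging schedule, with a placeholder model, synthetic data, and a hand-rolled cyclical learning rate; this is not the paper's semi-supervised training setup, only the weight-averaging bookkeeping.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
avg_model = copy.deepcopy(model)                 # holds the running average
opt = torch.optim.SGD(model.parameters(), lr=0.1)
n_avg = 0

cycle_len, snaps_per_cycle = 30, 3
for step in range(300):
    # Simple decreasing-then-restarting (cyclical) learning rate.
    phase = (step % cycle_len) / cycle_len
    for group in opt.param_groups:
        group["lr"] = 0.1 * (1 - phase) + 0.001 * phase

    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # fast-SWA: average several points within each cycle, not only its end.
    if step > cycle_len and (step % cycle_len) % (cycle_len // snaps_per_cycle) == 0:
        n_avg += 1
        with torch.no_grad():
            for p_avg, p in zip(avg_model.parameters(), model.parameters()):
                p_avg.mul_((n_avg - 1) / n_avg).add_(p / n_avg)
```

With batch normalization in the network, one would also recompute the BN statistics under the averaged weights before evaluating avg_model.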