AITopics | Surya Ganguli

Collaborating Authors

Surya Ganguli

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exponential expressivity in deep neural networks through transient chaos

Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

Neural Information Processing SystemsJan-20-2025, 06:40:54 GMT

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this generic class of deep random functions cannot be efficiently computed by any shallow network, going beyond prior work restricted to the analysis of single functions. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.

artificial intelligence, curvature, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An equivalence between high dimensional Bayes optimal inference and M-estimation

Madhu Advani, Surya Ganguli

Neural Information Processing SystemsJan-20-2025, 05:56:06 GMT

When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the number of measurements. In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure. This procedure involves minimizing convex loss and regularizer functions that are nonlinearly smoothed versions of the widely applied MAP optimization problem. Our findings provide a new heuristic derivation and interpretation for recent optimal M-estimators found in the setting of linear measurements and additive noise, and further extend these results to nonlinear measurements with non-additive noise. We numerically demonstrate superior performance of our optimal M-estimators relative to MAP. Overall, at the heart of our work is the revelation of a remarkable equivalence between two seemingly very different computational problems: namely that of high dimensional Bayesian integration underlying MMSE inference, and high dimensional convex optimization underlying M-estimation. In essence we show that the former difficult integral may be computed by solving the latter, simpler optimization problem.

artificial intelligence, inference, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli

Neural Information Processing SystemsOct-8-2024, 10:18:39 GMT

It is well known that weight initialization in deep networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is O(1) is essential for avoiding exponentially vanishing or exploding gradients. Moreover, in deep linear networks, ensuring that all singular values of the Jacobian are concentrated near 1 can yield a dramatic additional speed-up in learning; this is a property known as dynamical isometry. However, it is unclear how to achieve dynamical isometry in nonlinear deep networks. We address this question by employing powerful tools from free probability theory to analytically compute the entire singular value distribution of a deep network's input-output Jacobian.

artificial intelligence, dynamical isometry, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The emergence of multiple retinal cell types through efficient coding of natural movies

Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny

Neural Information Processing SystemsOct-8-2024, 06:11:28 GMT

One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field. Existing theories of efficient coding have been unable to account for the functional advantages of such cell-type diversity in encoding natural scenes. Here we go beyond previous theories to analyze how a simple linear retinal encoding model with different convolutional cell types efficiently encodes naturalistic spatiotemporal movies given a fixed firing rate budget. We find that optimizing the receptive fields and cell densities of two cell types makes them match the properties of the two main cell types in the primate retina, midget and parasol cells, in terms of spatial and temporal sensitivity, cell spacing, and their relative ratio. Moreover, our theory gives a precise account of how the ratio of midget to parasol cells decreases with retinal eccentricity. Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types. Thus our work provides a theoretical justification, based on the efficient coding of natural movies, for the existence of the four most dominant cell types in the primate retina that together comprise 70% of all ganglion cells.

artificial intelligence, cell type, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.94)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Statistical mechanics of low-rank tensor decomposition

Jonathan Kadmon, Surya Ganguli

Neural Information Processing SystemsOct-8-2024, 00:51:45 GMT

Often, large, high dimensional datasets collected across multiple modalities can be organized as a higher order tensor. Low-rank tensor decomposition then arises as a powerful and widely used tool to discover simple low dimensional structures underlying such data. However, we currently lack a theoretical understanding of the algorithmic behavior of low-rank tensor decompositions. We derive Bayesian approximate message passing (AMP) algorithms for recovering arbitrarily shaped low-rank tensors buried within noise, and we employ dynamic mean field theory to precisely characterize their performance. Our theory reveals the existence of phase transitions between easy, hard and impossible inference regimes, and displays an excellent match with simulations. Moreover it reveals several qualitative surprises compared to the behavior of symmetric, cubic tensor decomposition. Finally, we compare our AMP algorithm to the most commonly used algorithm, alternating least squares (ALS), and demonstrate that AMP significantly outperforms ALS in the presence of noise.

artificial intelligence, machine learning, tensor, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

Anirudh Goyal ALIAS PARTH GOYAL, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

Neural Information Processing SystemsOct-7-2024, 20:28:45 GMT

We propose a novel method to directly learn a stochastic transition operator whose repeated application provides generated samples. Traditional undirected graphical models approach this problem indirectly by learning a Markov chain model whose stationary distribution obeys detailed balance with respect to a parameterized energy function. The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge. Moreover, the detailed balance condition is highly restrictive: energy based models corresponding to neural networks must have symmetric weights, unlike biological neural circuits. In contrast, we develop a method for directly learning arbitrarily parameterized transition operators capable of expressing nonequilibrium stationary distributions that violate detailed balance, thereby enabling us to learn more biologically plausible asymmetric neural networks and more general non-energy based dynamical systems. The proposed training objective, which we derive via principled variational methods, encourages the transition operator to "walk back" (prefer to revert its steps) in multi-step trajectories that start at datapoints, as quickly as possible back to the original data points. We present a series of experimental results illustrating the soundness of the proposed approach, Variational Walkback (VW), on the MNIST, CIFAR-10, SVHN and CelebA datasets, demonstrating superior samples compared to earlier attempts to learn a transition operator. We also show that although each rapid training trajectory is limited to a finite but variable number of steps, our transition operator continues to generate good samples well past the length of such trajectories, thereby demonstrating the match of its non-equilibrium stationary distribution to the data distribution.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Add feedback

Task-Driven Convolutional Recurrent Models of the Visual System

Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. Yamins

Neural Information Processing SystemsOct-7-2024, 12:34:34 GMT

Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas.

artificial intelligence, convrnn, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology: