Ingrosso, Alessandro
On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks
Redman, William T., Wang, Zhangyang, Ingrosso, Alessandro, Goldt, Sebastian
Iterative magnitude pruning (IMP) [1] has emerged as a powerful tool for identifying sparse subnetworks ("winning tickets") that can be trained to perform as well as the dense model they are extracted from [2, 3]. That IMP, despite its simplicity, is more robust at discovering such winning tickets than other, more complex pruning schemes [4] suggests that its iterative coarse-graining [5] is especially capable of extracting and maintaining strong inductive biases. This perspective is strengthened by observations that winning tickets discovered by IMP: 1) have properties that make them transferable across related tasks [6-13] and architectures [14]; 2) can outperform dense models on classes with limited data [15]; 3) make less overconfident predictions [16]. The first direct evidence for IMP discovering good inductive biases came from studying the winning tickets extracted by IMP in fully connected neural networks (FCNs) [17]. Pellegrini and Biroli (2022) [17] found that the sparse subnetworks identified by IMP had local receptive field (RF) structure (Figure 1A), an architectural feature found in visual cortex [18] and convolutional neural networks (CNNs) [19]. Comparing IMP-derived winning tickets with the sparse subnetworks found by one-shot pruning (Figure 1B), Pellegrini and Biroli (2022) [17] argued that the iterative nature of IMP was essential for refining the local RF structure. However, to date, how IMP, a pruning method based purely on the magnitude of the network parameters, is able to "sift out" non-localized weights has remained unknown. Resolving this will not only shed light on the effect of IMP on FCNs, but also provide new insight into the success of IMP more broadly.
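The pruning loop at issue can be summarized with a minimal, self-contained sketch on a toy linear regression task (hypothetical hyperparameters; not the code used in the paper): train with the current mask, prune a fraction of the smallest-magnitude surviving weights, rewind to the initial weights, and repeat.

# Illustrative sketch of iterative magnitude pruning on a toy linear
# regression task (hypothetical hyperparameters; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:5] = 1.0              # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(200)

def train(w0, mask, lr=0.01, steps=500):
    """Gradient descent on the masked weights only."""
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w -= lr * grad * mask
    return w

w_init = rng.standard_normal(50) * 0.1
mask = np.ones(50)
for round_ in range(5):                               # IMP: train, prune, rewind
    w = train(w_init, mask)
    surviving = np.abs(w[mask == 1])
    thresh = np.quantile(surviving, 0.3)              # prune 30% of surviving weights
    mask[np.abs(w) < thresh] = 0.0                    # smallest magnitudes are removed
    # "rewind": the next round restarts from w_init with the updated mask

print("weights kept by the winning ticket:", int(mask.sum()), "out of", mask.size)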
Feature learning in finite-width Bayesian deep linear networks with multiple outputs and convolutional layers
Bassetti, Federico, Gherardi, Marco, Ingrosso, Alessandro, Pastore, Mauro, Rotondo, Pietro
Deep linear networks have been extensively studied, as they provide simplified models of deep learning. However, little is known in the case of finite-width architectures with multiple outputs and convolutional layers. In this manuscript, we provide rigorous results for the statistics of functions implemented by the aforementioned class of networks, thus moving closer to a complete characterization of feature learning in the Bayesian setting. Our results include: (i) an exact and elementary non-asymptotic integral representation for the joint prior distribution over the outputs, given in terms of a mixture of Gaussians; (ii) an analytical formula for the posterior distribution in the case of squared error loss function (Gaussian likelihood); (iii) a quantitative description of the feature learning infinite-width regime, using large deviation theory. From a physical perspective, deep architectures with multiple outputs or convolutional layers represent different manifestations of kernel shape renormalization, and our work provides a dictionary that translates this physics intuition and terminology into rigorous Bayesian statistics.
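As a purely illustrative companion to point (i), the following Monte Carlo sketch (assumed widths and 1/sqrt(width) scaling; not the exact integral representation derived in the paper) samples the prior over the outputs of a finite-width deep linear network and checks that the joint output statistics deviate from a single Gaussian.

# Monte Carlo sampling of the prior over outputs of a finite-width deep
# linear network with multiple outputs (assumed widths and 1/sqrt(width)
# scaling; not the exact integral representation of the paper).
import numpy as np

rng = np.random.default_rng(1)
d, n1, n2, k = 20, 30, 30, 3            # input dim, hidden widths, number of outputs
x = rng.standard_normal(d)              # a single test input
samples = []
for _ in range(20000):
    W1 = rng.standard_normal((n1, d)) / np.sqrt(d)
    W2 = rng.standard_normal((n2, n1)) / np.sqrt(n1)
    W3 = rng.standard_normal((k, n2)) / np.sqrt(n2)
    samples.append(W3 @ W2 @ W1 @ x)    # network output, shape (k,)
samples = np.array(samples)

# At finite width the joint prior is not a single Gaussian: the outputs are
# uncorrelated but not independent, and each marginal has excess kurtosis.
cov = np.cov(samples.T)
kurt = np.mean(samples**4, axis=0) / np.mean(samples**2, axis=0) ** 2 - 3
print("output covariance:\n", np.round(cov, 3))
print("excess kurtosis per output:", np.round(kurt, 3))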
Machine learning at the mesoscale: a computation-dissipation bottleneck
Ingrosso, Alessandro, Panizon, Emanuele
The cost of information processing in physical systems calls for a trade-off between performance and energetic expenditure. Here we formulate and study a computation-dissipation bottleneck in mesoscopic systems used as input-output devices. Using both real datasets and synthetic tasks, we show how non-equilibrium leads to enhanced performance. Our framework sheds light on a crucial compromise between information compression, input-output computation and dynamic irreversibility induced by non-reciprocal interactions.
Neural networks trained with SGD learn distributions of increasing complexity
Refinetti, Maria, Ingrosso, Alessandro, Goldt, Sebastian
The ability of deep neural networks to generalise well even when they interpolate their training data has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first learning simple functions, say a linear classifier, before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a neural network trained on synthetic data. We empirically demonstrate DSB in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
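One way to probe this distributional simplicity bias numerically is sketched below: a small two-layer network is trained with SGD on a synthetic binary task whose classes differ both in their means and in their higher-order statistics (but share the same covariance), while its accuracy is tracked on the real test set and on a "Gaussian clone" test set that keeps only the per-class means and covariances. The data model, architecture and hyperparameters are illustrative assumptions, not the setup of the paper.

# Probe of the distributional simplicity bias on synthetic data (data model,
# architecture and hyperparameters are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(2)
d, n_hidden, n_train, n_test = 50, 100, 4000, 2000
mu = 0.5 * rng.standard_normal(d) / np.sqrt(d)        # weak mean separation

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    z = np.where(y[:, None] > 0,
                 rng.laplace(scale=1 / np.sqrt(2), size=(n, d)),    # heavy-tailed class
                 rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, d))) # light-tailed class
    return y[:, None] * mu + z, y          # unit covariance for both classes

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)
# Gaussian clone of the test set: keeps per-class mean and covariance only.
Xcl = np.where(yte[:, None] > 0,
               rng.multivariate_normal(mu, np.eye(d), n_test),
               rng.multivariate_normal(-mu, np.eye(d), n_test))

W1 = rng.standard_normal((n_hidden, d)) / np.sqrt(d)
w2 = rng.standard_normal(n_hidden) / np.sqrt(n_hidden)

def forward(X):
    h = np.maximum(X @ W1.T, 0.0)          # ReLU hidden layer
    return h, h @ w2

lr = 0.05
for step in range(2001):
    batch = rng.integers(0, n_train, 64)   # SGD minibatch
    h, out = forward(Xtr[batch])
    err = out - ytr[batch]                 # squared-error gradient
    gw2 = h.T @ err / len(batch)
    gW1 = ((err[:, None] * w2) * (h > 0)).T @ Xtr[batch] / len(batch)
    w2 -= lr * gw2
    W1 -= lr * gW1
    if step % 500 == 0:
        acc = np.mean(np.sign(forward(Xte)[1]) == yte)
        acc_clone = np.mean(np.sign(forward(Xcl)[1]) == yte)
        print(f"step {step:5d}  real acc {acc:.3f}  Gaussian-clone acc {acc_clone:.3f}")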
Data-driven emergence of convolutional structure in neural networks
Ingrosso, Alessandro, Goldt, Sebastian
Exploiting data invariances is crucial for efficient learning in both artificial and biological neural circuits. Understanding how neural networks can discover appropriate representations capable of harnessing the underlying symmetries of their inputs is thus crucial in machine learning and neuroscience. Convolutional neural networks, for example, were designed to exploit translation symmetry and their capabilities triggered the first wave of deep learning successes. However, learning convolutions directly from translation-invariant data with a fully-connected network has so far proven elusive. Here, we show how initially fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs, resulting in localised, space-tiling receptive fields. These receptive fields match the filters of a convolutional network trained on the same task. By carefully designing data models for the visual scene, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs, which has long been recognised as the hallmark of natural images. We provide an analytical and numerical characterisation of the pattern-formation mechanism responsible for this phenomenon in a simple model, which results in an unexpected link between receptive field formation and the tensor decomposition of higher-order input correlations. These results provide a new perspective on the development of low-level feature detectors in various sensory modalities, and pave the way for studying the impact of higher-order statistics on learning in neural networks.
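The mechanism described here can be explored with a small numerical sketch: inputs on a ring are drawn from a smooth Gaussian process (one class) or from the same process passed through a saturating nonlinearity (second class, with non-Gaussian higher-order local structure), a small fully connected network is trained to discriminate them, and the localization of its first-layer weights is monitored with an inverse participation ratio. Kernel width, gain and training details are assumptions, and the covariances of the two classes are only approximately matched in this toy version.

# Toy exploration of receptive-field localization driven by higher-order
# input statistics (kernel width, gain and training details are assumptions;
# covariances of the two classes are only approximately matched here).
import numpy as np

rng = np.random.default_rng(3)
d, n_hidden, xi, gain = 64, 20, 4.0, 3.0
idx = np.arange(d)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  d - np.abs(idx[:, None] - idx[None, :]))     # distance on a ring
C = np.exp(-dist**2 / (2 * xi**2)) + 1e-6 * np.eye(d)          # smooth GP covariance
L = np.linalg.cholesky(C)

def sample(n):
    z = (L @ rng.standard_normal((d, n))).T       # smooth Gaussian inputs
    x_nl = np.tanh(gain * z)                      # non-Gaussian, locally structured
    x_nl /= x_nl.std()
    y = rng.choice([-1.0, 1.0], size=n)
    return np.where(y[:, None] > 0, x_nl, z), y   # class +1: non-Gaussian, class -1: Gaussian

W1 = rng.standard_normal((n_hidden, d)) * 0.1 / np.sqrt(d)
w2 = rng.standard_normal(n_hidden) / np.sqrt(n_hidden)
lr = 0.02
for step in range(10001):
    X, y = sample(128)
    h = np.maximum(X @ W1.T, 0.0)
    err = h @ w2 - y
    gw2 = h.T @ err / len(y)
    gW1 = ((err[:, None] * w2) * (h > 0)).T @ X / len(y)
    w2 -= lr * gw2
    W1 -= lr * gW1
    if step % 2500 == 0:
        ipr = np.sum(W1**4, axis=1) / np.sum(W1**2, axis=1) ** 2   # localization measure
        print(f"step {step:6d}  mean IPR {ipr.mean():.3f}  (1/d = {1/d:.3f} = fully delocalized)")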
Input correlations impede suppression of chaos and learning in balanced rate networks
Engelken, Rainer, Ingrosso, Alessandro, Khajeh, Ramin, Goedeke, Sven, Abbott, L. F.
Information encoding and learning in neural circuits depend on how well time-varying stimuli can control spontaneous network activity. We show that in firing-rate networks in the balanced state, external control of recurrent dynamics, i.e., the suppression of internally-generated chaotic variability, strongly depends on correlations in the input. A unique feature of balanced networks is that, because common external input is dynamically canceled by recurrent feedback, it is far easier to suppress chaos with independent inputs into each neuron than through common input. To study this phenomenon we develop a non-stationary dynamic mean-field theory that determines how the activity statistics and largest Lyapunov exponent depend on frequency and amplitude of the input, recurrent coupling strength, and network size, for both common and independent input. We also show that uncorrelated inputs facilitate learning in balanced networks.
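A minimal simulation in the spirit of this analysis is sketched below: a random firing-rate network is driven by sinusoidal input that is either common to all neurons or has independent random phases per neuron, and the largest Lyapunov exponent is estimated from the tangent dynamics. The sketch uses a generic g*J coupling rather than the excitatory-inhibitory balanced architecture studied in the paper, so it illustrates the measurement rather than the balance-specific cancellation of common input; all parameters are assumptions.

# Driven random rate network with the largest Lyapunov exponent estimated
# from the tangent dynamics (generic g*J coupling, not the balanced E-I
# architecture of the paper; all parameters are assumptions).
import numpy as np

rng = np.random.default_rng(4)
N, g, dt, T = 400, 2.0, 0.05, 4000
J = g * rng.standard_normal((N, N)) / np.sqrt(N)

def largest_lyapunov(amplitude, freq=0.5, common=True):
    phases = np.zeros(N) if common else rng.uniform(0, 2 * np.pi, N)
    x = rng.standard_normal(N)
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)
    log_growth, n_kept = 0.0, 0
    for t in range(T):
        I = amplitude * np.sin(freq * t * dt + phases)   # common or independent drive
        phi = np.tanh(x)
        x = x + dt * (-x + J @ phi + I)                  # rate dynamics (Euler step)
        v = v + dt * (-v + J @ ((1 - phi**2) * v))       # tangent dynamics
        norm = np.linalg.norm(v)
        if t > T // 4:                                   # discard the transient
            log_growth += np.log(norm)
            n_kept += 1
        v /= norm
    return log_growth / (n_kept * dt)

for amp in [0.0, 0.5, 1.0, 2.0]:
    print(f"amplitude {amp:3.1f}"
          f"  lambda_max common {largest_lyapunov(amp, common=True):+.3f}"
          f"  independent {largest_lyapunov(amp, common=False):+.3f}")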
Epidemic mitigation by statistical inference from contact tracing data
Baker, Antoine, Biazzo, Indaco, Braunstein, Alfredo, Catania, Giovanni, Dall'Asta, Luca, Ingrosso, Alessandro, Krzakala, Florent, Mazza, Fabio, Mézard, Marc, Muntoni, Anna Paola, Refinetti, Maria, Mannelli, Stefano Sarao, Zdeborová, Lenka
Contact tracing is an essential tool for mitigating the impact of a pandemic such as COVID-19. Digital devices can play an important role in achieving efficient and scalable contact tracing in real time. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of their recent contacts and their own risk levels, as well as personal information such as results of tests or presence of symptoms. We propose to use probabilistic risk estimation in order to optimize testing and quarantining strategies for the control of an epidemic. Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible, but before the fraction of infected people reaches the scale where a lockdown becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized and thus compatible with privacy-preserving standards. We conclude that probabilistic risk estimation can enhance the performance of digital contact tracing and should be considered in the mobile applications currently under development. Identifying, calling, testing, and, if needed, quarantining the recent contacts of an individual who has just tested positive is the standard route for limiting the transmission of a highly contagious virus.
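The flavor of a distributed risk estimator can be conveyed with a deliberately simplified sketch: each individual's infection risk is updated from the risks of their recent contacts and from observed test results, under a fixed per-contact transmission probability. This mean-field style iteration is only a stand-in for the Bayesian and belief-propagation methods developed in the paper; the network, parameters and update rule are assumptions.

# Simplified distributed risk propagation on a random contact network (a
# mean-field style stand-in, not the paper's Bayesian inference; the network
# and all probabilities are assumptions).
import numpy as np

rng = np.random.default_rng(5)
n, p_contact, p_transmit, prior = 200, 0.03, 0.15, 0.02
contacts = [[j for j in range(n) if j != i and rng.random() < p_contact]
            for i in range(n)]
tested_positive = set(rng.choice(n, size=5, replace=False).tolist())

risk = np.full(n, prior)
for _ in range(20):                                     # iterate until roughly stable
    new_risk = np.empty(n)
    for i in range(n):
        if i in tested_positive:
            new_risk[i] = 1.0                           # clamp observed positives
            continue
        # probability of escaping infection from every recent contact
        p_escape = np.prod([1.0 - p_transmit * risk[j] for j in contacts[i]])
        new_risk[i] = 1.0 - (1.0 - prior) * p_escape
    risk = new_risk

ranked = [i for i in np.argsort(-risk) if i not in tested_positive]
print("highest-risk untested individuals:", ranked[:5])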
Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes
Baldassi, Carlo, Borgs, Christian, Chayes, Jennifer, Ingrosso, Alessandro, Lucibello, Carlo, Saglietti, Luca, Zecchina, Riccardo
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
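The replicated scheme described in the last part of the abstract can be sketched in a few lines: several gradient-descent replicas run on the same loss while being elastically coupled to the ensemble center, with the coupling slowly increased, so that the dynamics is biased toward wide, dense minima. The toy non-convex loss and all hyperparameters below are assumptions for illustration; this is not the paper's implementation.

# Replicated gradient descent with an elastic coupling to the ensemble center
# (toy non-convex loss and hyperparameters are assumptions for illustration).
import numpy as np

rng = np.random.default_rng(6)
d, n_replicas, lr, steps = 20, 8, 0.05, 2000

A = rng.standard_normal((d, d))
A = A @ A.T / d                                       # convex part of a rough landscape
def loss(w):
    return 0.5 * w @ A @ w + 0.3 * np.sum(np.cos(8 * w))
def grad(w):
    return A @ w - 2.4 * np.sin(8 * w)

W = rng.standard_normal((n_replicas, d)) * 2.0        # one row per replica
gamma = 0.0
for t in range(steps):
    center = W.mean(axis=0)
    for a in range(n_replicas):
        # gradient of the replicated objective: own loss + coupling to the center
        W[a] -= lr * (grad(W[a]) + gamma * (W[a] - center))
    gamma = min(2.0, gamma + 2.0 / steps)             # slowly increase the coupling

center = W.mean(axis=0)
print("loss at ensemble center:", round(float(loss(center)), 3),
      " mean replica loss:", round(float(np.mean([loss(w) for w in W])), 3))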
Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model
Huang, Furong, Anandkumar, Animashree, Borgs, Christian, Chayes, Jennifer, Fraenkel, Ernest, Hawrylycz, Michael, Lein, Ed, Ingrosso, Alessandro, Turaga, Srinivas
Cataloging the neuronal cell types that comprise circuitry of individual brain regions is a major goal of modern neuroscience and the BRAIN initiative. Single-cell RNA sequencing can now be used to measure the gene expression profiles of individual neurons and to categorize neurons based on their gene expression profiles. While the single-cell techniques are extremely powerful and hold great promise, they are currently still labor intensive, have a high cost per cell, and, most importantly, do not provide information on the spatial distribution of cell types in specific regions of the brain. We propose a complementary approach that uses computational methods to infer the cell types and their gene expression profiles through analysis of brain-wide single-cell resolution in situ hybridization (ISH) imagery contained in the Allen Brain Atlas (ABA). We measure the spatial distribution of neurons labeled in the ISH image for each gene and model it as a spatial point process mixture, whose mixture weights are given by the cell types which express that gene. By fitting a point process mixture model jointly to the ISH images, we infer both the spatial point process distribution for each cell type and their gene expression profile. We validate our predictions of cell type-specific gene expression profiles using single cell RNA sequencing data, recently published for the mouse somatosensory cortex. Jointly with the gene expression profiles, cell features such as cell size, orientation, intensity and local density level are inferred per cell type.
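A schematic numerical stand-in for the joint inference is sketched below: per-gene spatial count maps are simulated as nonnegative mixtures of a few latent cell-type density maps, and both the cell-type densities and the gene-expression (mixture-weight) profiles are recovered with nonnegative matrix factorization via multiplicative updates. This replaces the spatial point process mixture model of the paper with a much cruder factorization; dimensions and the Poisson simulation are assumptions.

# NMF-based stand-in for jointly inferring cell-type spatial densities and
# gene-expression profiles from per-gene count maps (dimensions and the
# Poisson simulation are assumptions; the paper uses a point process mixture).
import numpy as np

rng = np.random.default_rng(7)
n_voxels, n_genes, n_types = 500, 40, 3
D_true = rng.gamma(2.0, 1.0, size=(n_voxels, n_types))   # cell-type density maps
E_true = rng.gamma(2.0, 1.0, size=(n_types, n_genes))    # expression profiles
counts = rng.poisson(D_true @ E_true).astype(float)      # per-gene ISH-like counts

# Nonnegative matrix factorization with multiplicative updates (squared error)
D = rng.random((n_voxels, n_types)) + 0.1
E = rng.random((n_types, n_genes)) + 0.1
for _ in range(500):
    E *= (D.T @ counts) / (D.T @ D @ E + 1e-9)
    D *= (counts @ E.T) / (D @ E @ E.T + 1e-9)

rel_err = np.linalg.norm(counts - D @ E) / np.linalg.norm(counts)
print("relative reconstruction error of the count maps:", round(float(rel_err), 3))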
Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems
Baldassi, Carlo, Ingrosso, Alessandro, Lucibello, Carlo, Saglietti, Luca, Zecchina, Riccardo
We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP) contains regions of very high density of solutions. Despite being sub-dominant, these regions can be found by optimizing a local entropy measure. Building on these results, we construct a fast solver that relies exclusively on a local entropy estimate, and can be applied to general CSPs. We describe its performance not only for the Perceptron Learning Problem but also for the random $K$-Satisfiability Problem (another prototypical CSP with a radically different structure), and show numerically that a simple zero-temperature Metropolis search in the smooth local entropy landscape can reach sub-dominant clusters of optimal solutions in a small number of steps, while standard Simulated Annealing either requires extremely long cooling procedures or just fails. We also discuss how EdMC can heuristically be made even more efficient for the cases we studied.
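A crude illustration of searching a smoothed, entropy-like landscape for the binary perceptron is sketched below: a zero-temperature Metropolis move on single spins is accepted according to a score obtained by averaging the number of violated patterns over random configurations at a small Hamming distance, a rough stand-in for the local entropy. Sizes and parameters are assumptions, and this is far simpler than the EdMC algorithm introduced in the paper.

# Zero-temperature search of a smoothed ("local entropy"-like) score for a
# small binary perceptron (a crude surrogate of EdMC; sizes, radius and all
# parameters are assumptions).
import numpy as np

rng = np.random.default_rng(8)
N, P, radius, n_samples = 101, 40, 5, 60
X = rng.choice([-1, 1], size=(P, N))
y = rng.choice([-1, 1], size=P)

def energy(w):
    return int(np.sum((X @ w) * y <= 0))              # number of violated patterns

def smoothed_energy(w):
    """Average energy over random configurations at small Hamming distance."""
    total = 0.0
    for _ in range(n_samples):
        flip = rng.choice(N, size=radius, replace=False)
        w2 = w.copy()
        w2[flip] *= -1
        total += energy(w2)
    return total / n_samples

w = rng.choice([-1, 1], size=N)
score = smoothed_energy(w)
for step in range(2000):
    i = rng.integers(N)
    w[i] *= -1                                        # propose a single-spin flip
    new_score = smoothed_energy(w)
    if new_score <= score:
        score = new_score                             # zero-temperature acceptance
    else:
        w[i] *= -1                                    # reject the move
    if energy(w) == 0:
        print("zero-energy configuration found at step", step)
        break

print("violated patterns at the end:", energy(w), " smoothed score:", round(score, 2))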