AITopics | typical solution

typical solution

High-dimensional manifold of solutions in neural networks: insights from statistical physics

Malatesta, Enrico M.

arXiv.org Artificial Intelligence, Sep-17-2023

In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights, in the classification setting. I review Gardner's approach based on the replica method and the derivation of the SAT/UNSAT transition in the storage setting. Then, I discuss some recent works that unveiled how the zero training error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.
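As a rough illustration of the storage setting mentioned in this abstract, the sketch below draws random patterns with random labels and counts how many of them a given weight vector misclassifies. The function names, the margin parameter, and the normalization by sqrt(N) are assumptions chosen for this example and are not taken from the notes themselves.

```python
import math
import random


def training_errors(weights, patterns, labels, margin=0.0):
    """Count patterns whose stability y * (w . x) / sqrt(N) falls below `margin`."""
    n = len(weights)
    errors = 0
    for x, y in zip(patterns, labels):
        stability = y * sum(w * xi for w, xi in zip(weights, x)) / math.sqrt(n)
        if stability < margin:
            errors += 1
    return errors


def random_storage_problem(n, alpha, rng):
    """Draw P = round(alpha * N) random +/-1 patterns with random +/-1 labels."""
    p = round(alpha * n)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    labels = [rng.choice((-1, 1)) for _ in range(p)]
    return patterns, labels


# Hypothetical usage: N is odd so that w . x is never exactly zero for +/-1 entries.
rng = random.Random(0)
patterns, labels = random_storage_problem(n=21, alpha=0.5, rng=rng)
binary_w = [rng.choice((-1, 1)) for _ in range(21)]
print(training_errors(binary_w, patterns, labels))
```

A weight vector with zero errors is a solution (a SAT configuration) for that sample. Here `alpha` plays the role of the constraint density P/N.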

entropy, local entropy, perceptron, (17 more...)

arXiv.org Artificial Intelligence

2309.0924

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

The star-shaped space of solutions of the spherical negative perceptron

Annesi, Brandon Livio, Lauditi, Clarissa, Lucibello, Carlo, Malatesta, Enrico M., Perugini, Gabriele, Pittorino, Fabrizio, Saglietti, Luca

arXiv.org Artificial Intelligence, Sep-5-2023

Empirical studies on the landscape of neural networks have shown that low-energy configurations are often found in complex connected structures, where zero-energy paths between pairs of distant solutions can be constructed. Here we consider the spherical negative perceptron, a prototypical non-convex neural network model framed as a continuous constraint satisfaction problem. We introduce a general analytical method for computing energy barriers in the simplex with vertex configurations sampled from the equilibrium. We find that in the over-parameterized regime the solution manifold displays simple connectivity properties. There exists a large geodesically convex component that is attractive for a wide range of optimization dynamics. Inside this region we identify a subset of atypical high-margin solutions that are geodesically connected with most other solutions, giving rise to a star-shaped geometry. We analytically characterize the organization of the connected space of solutions and show numerical evidence of a transition, at larger constraint densities, where the aforementioned simple geodesic connectivity breaks down.
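To make the notion of an energy barrier along a geodesic concrete, here is a minimal numerical sketch. It assumes that the energy is the number of patterns whose stability falls below a (possibly negative) margin, and that both weight vectors have the same norm and are not antipodal. All names are hypothetical. It is a brute-force check on a discretized path, not the analytical method the abstract introduces.

```python
import math


def energy(weights, patterns, labels, margin):
    """Number of violated constraints y * (w . x) / sqrt(N) >= margin."""
    n = len(weights)
    return sum(
        1
        for x, y in zip(patterns, labels)
        if y * sum(w * xi for w, xi in zip(weights, x)) / math.sqrt(n) < margin
    )


def geodesic_point(w1, w2, t):
    """Point at fraction t of the great-circle arc from w1 to w2 (equal norms assumed)."""
    norm_sq = sum(a * a for a in w1)
    cos_theta = sum(a * b for a, b in zip(w1, w2)) / norm_sq
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-12:  # the two points coincide
        return list(w1)
    c1 = math.sin((1 - t) * theta) / math.sin(theta)
    c2 = math.sin(t * theta) / math.sin(theta)
    return [c1 * a + c2 * b for a, b in zip(w1, w2)]


def geodesic_barrier(w1, w2, patterns, labels, margin, steps=50):
    """Largest energy found on a discretized geodesic between w1 and w2."""
    return max(
        energy(geodesic_point(w1, w2, k / steps), patterns, labels, margin)
        for k in range(steps + 1)
    )
```

A barrier of zero between two solutions means the sampled geodesic stays inside the solution set. With a negative margin a single constraint can already exclude the middle of an arc whose endpoints both satisfy it, which is one way to see why the model is non-convex.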

energy barrier, training error, typical solution, (16 more...)

arXiv.org Artificial Intelligence

2305.10623

Country:

Europe > Italy > Lombardy > Milan (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.71)

Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

Baldassi, Carlo, Malatesta, Enrico M., Perugini, Gabriele, Zecchina, Riccardo

arXiv.org Artificial Intelligence, Jul-24-2023

We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different, since even beyond the disappearance of the wide flat minima the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers even when trained in the highly underconstrained regime of very negative margins.

configuration, entropy, transition, (15 more...)

arXiv.org Artificial Intelligence

2304.13871

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.36)

Learning through atypical "phase transitions" in overparameterized neural networks

Baldassi, Carlo, Lauditi, Clarissa, Malatesta, Enrico M., Pacelli, Rosalba, Perugini, Gabriele, Zecchina, Riccardo

arXiv.org Machine Learning, Oct-1-2021

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that escape the bias-variance predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex neural network models. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalisation performance. We find that there exists a gap between the SAT/UNSAT interpolation transition where solutions begin to exist and the point where algorithms start to find solutions, i.e., where accessible solutions appear. This second phase transition coincides with the discontinuous appearance of atypical solutions that are locally extremely entropic, i.e., flat regions of the weight space that are particularly solution-dense and have good generalization properties. Although exponentially rare compared to typical solutions (which are narrower and extremely difficult to sample), entropic solutions are accessible to the algorithms used in learning. We can characterize the generalization error of different solutions and optimize the Bayesian prediction, for data generated from a structurally different network. Numerical tests on observables suggested by the theory confirm that the scenario extends to realistic deep networks.

algorithm, generalization error, neural network, (17 more...)

arXiv.org Machine Learning

2110.00683

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.64)

Industry:

Energy (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)

On the geometry of solutions and on the capacity of multi-layer neural networks with ReLU activations

Baldassi, Carlo, Malatesta, Enrico M., Zecchina, Riccardo

arXiv.org Machine Learning, Jul-17-2019

Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem, which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1907.07578

Country:

North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

The backtracking survey propagation algorithm for solving random K-SAT problems

Marino, Raffaele, Parisi, Giorgio, Ricci-Tersenghi, Federico

arXiv.org Artificial Intelligence, Oct-6-2016

Discrete combinatorial optimization plays a central role in many scientific disciplines; however, for hard problems we lack linear time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what the key features are that make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K = 3, 4, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the Backtracking Survey Propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables. Optimization problems with discrete variables are widespread among scientific disciplines and often among the hardest to solve.

algorithm, artificial intelligence, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/ncomms12996

1508.05117

Country:

Europe (0.68)
North America > United States (0.68)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

Baldassi, Carlo, Ingrosso, Alessandro, Lucibello, Carlo, Saglietti, Luca, Zecchina, Riccardo

arXiv.org Machine Learning, Sep-18-2015

We show that discrete synaptic weights can be efficiently used for learning in large-scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single-layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.
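One crude way to probe how dense the solutions are around a given binary configuration is to flip a fixed number of weights at random and record how often the result still classifies every pattern correctly. The sketch below does this by naive sampling. The function names and the sampling scheme are assumptions made for illustration, and this is not the large deviation method the abstract refers to.

```python
import random


def is_solution(weights, patterns, labels):
    """True if every pattern is classified correctly, i.e. y * (w . x) > 0."""
    return all(
        y * sum(w * xi for w, xi in zip(weights, x)) > 0
        for x, y in zip(patterns, labels)
    )


def local_solution_fraction(weights, patterns, labels, flips, samples, rng):
    """Fraction of sampled configurations at Hamming distance `flips` that are solutions."""
    n = len(weights)
    hits = 0
    for _ in range(samples):
        neighbor = list(weights)
        # Flip `flips` distinct, randomly chosen binary weights.
        for i in rng.sample(range(n), flips):
            neighbor[i] = -neighbor[i]
        if is_solution(neighbor, patterns, labels):
            hits += 1
    return hits / samples
```

Multiplying this fraction by the number of configurations at that distance, `math.comb(n, flips)`, gives an estimate of the number of nearby solutions. The logarithm of that count divided by N is a finite-size proxy for a local entropy. An isolated solution would give a fraction near zero already at small distances, while a solution inside a dense region would not.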

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevLett.115.128101

1509.05753

Country:

Europe > United Kingdom > England (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Training Multilayer Perceptrons with the Extended Kalman Algorithm

Singhal, Sharad, Wu, Lance

Neural Information Processing Systems, Dec-31-1989

A large fraction of recent work in artificial neural nets uses multilayer perceptrons trained with the back-propagation algorithm described by Rumelhart et al.

artificial intelligence, kalman algorithm, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Training Multilayer Perceptrons with the Extended Kalman Algorithm

Singhal, Sharad, Wu, Lance

Neural Information Processing Systems, Dec-31-1989

A large fraction of recent work in artificial neural nets uses multilayer perceptrons trained with the back-propagation algorithm described by Rumelhart et al.

algorithm, iteration, kalman algorithm, (12 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
