Baldassi, Carlo
Sampling through Algorithmic Diffusion in non-convex Perceptron problems
Demyanenko, Elizaveta, Straziota, Davide, Baldassi, Carlo, Lucibello, Carlo
We analyze the problem of sampling from the solution space of simple yet non-convex neural network models by employing a denoising diffusion process known as Algorithmic Stochastic Localization, where the score function is provided by Approximate Message Passing. We introduce a formalism based on the replica method to characterize the process in the infinite-size limit in terms of a few order parameters, and, in particular, we provide criteria for the feasibility of sampling. We show that, in the case of the spherical perceptron problem with negative stability, approximate uniform sampling is achievable across the entire replica symmetric region of the phase diagram. In contrast, for the binary perceptron, uniform sampling via diffusion invariably fails due to the overlap gap property exhibited by the typical set of solutions. We discuss the first steps in defining alternative measures that can be efficiently sampled.
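To make the procedure concrete, here is a minimal sketch of an algorithmic stochastic localization sampler, assuming a black-box posterior_mean(y, t) routine standing in for the AMP estimate of the posterior mean; the function names, step sizes, and Euler-Maruyama discretization are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def stochastic_localization_sample(posterior_mean, n, T=10.0, n_steps=1000, rng=None):
    """Minimal sketch of an algorithmic stochastic localization sampler.

    posterior_mean(y, t) should return an estimate of E[x | t*x + W_t = y];
    in the paper this role is played by Approximate Message Passing (AMP),
    here it is treated as a black box.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    y = np.zeros(n)  # observation process y_t = t*x + W_t, started at t = 0
    for step in range(n_steps):
        t = step * dt
        m = posterior_mean(y, t)                            # estimated posterior mean
        y += m * dt + np.sqrt(dt) * rng.standard_normal(n)  # Euler-Maruyama step
    # for large T the posterior localizes: its mean approaches a sample x
    return posterior_mean(y, T)
```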
Typical and atypical solutions in non-convex neural networks with discrete and continuous weights
Baldassi, Carlo, Malatesta, Enrico M., Perugini, Gabriele, Zecchina, Riccardo
We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters in the binary case (the frozen 1-RSB phase) or of a hierarchical structure of clusters of different sizes in the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
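As a reference for the model definition, here is a small sketch of the margin-constraint check for the negative-margin perceptron; the pattern distribution, sizes, and variable names are illustrative assumptions:

```python
import numpy as np

def is_solution(w, X, kappa):
    """Check the margin constraints of the perceptron with stability kappa.

    A pattern xi (a row of X) is stored when w . xi / sqrt(N) >= kappa;
    kappa < 0 gives the negative-margin (negative-stability) model.
    """
    N = w.shape[0]
    stabilities = X @ w / np.sqrt(N)
    return bool(np.all(stabilities >= kappa))

rng = np.random.default_rng(0)
N, alpha, kappa = 1000, 0.5, -0.5          # size, constraint density, margin
X = rng.choice([-1.0, 1.0], size=(int(alpha * N), N))  # random patterns
w_bin = rng.choice([-1.0, 1.0], size=N)    # binary candidate: w in {-1,+1}^N
w_sph = rng.standard_normal(N)             # spherical candidate ...
w_sph *= np.sqrt(N) / np.linalg.norm(w_sph)  # ... with |w|^2 = N
print(is_solution(w_bin, X, kappa), is_solution(w_sph, X, kappa))
```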
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry
Pittorino, Fabrizio, Ferraro, Antonio, Perugini, Gabriele, Feinauer, Christoph, Baldassi, Carlo, Zecchina, Riccardo
We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness, we study their mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space flatter minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e. polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning connectivity in this setting. Our results hinge on symmetry removal and are in remarkable agreement with the rich phenomenology described by some recent analytical studies performed on simple shallow models.
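As an illustration of the symmetry-removal idea, here is a minimal sketch of the positive-rescaling part for a single ReLU hidden layer; the paper's full procedure also removes permutation symmetries and handles normalization layers, and the function name here is hypothetical:

```python
import numpy as np

def remove_scale_symmetry(W_in, a_out):
    """Project each ReLU unit's incoming weights onto the unit sphere.

    For g > 0, a * relu(w . x) == (a * g) * relu((w / g) . x), so rescaling
    the incoming weights and compensating on the outgoing ones leaves the
    implemented function unchanged; after normalization the configuration
    space of the layer becomes a product of spheres.
    """
    norms = np.linalg.norm(W_in, axis=1, keepdims=True)  # per-unit norms
    return W_in / norms, a_out * norms.ravel()           # same function
```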
Systematically improving existing k-means initialization algorithms at nearly no cost, by pairwise-nearest-neighbor smoothing
Baldassi, Carlo
We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists of splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that any seeding algorithm can be used when clustering the individual subsets. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear for an appropriate choice of $J$, and in practice is at most a few percent slower in most cases. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure systematically results in better costs. It can even be applied recursively, and is easily parallelized. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl
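A minimal Python sketch of the procedure, assuming a naive O(m^2) PNN merge and scikit-learn's KMeans as the subset seeder; the authors' actual implementation is the Julia package linked above and uses more efficient data structures:

```python
import numpy as np
from sklearn.cluster import KMeans

def pnn_merge(centroids, counts, k):
    """Greedily merge centroids pairwise-nearest-neighbor style down to k."""
    C, n = list(centroids), list(counts)
    while len(C) > k:
        best, pair = np.inf, None
        for i in range(len(C)):
            for j in range(i + 1, len(C)):
                # Ward-style cost of merging clusters i and j
                d = n[i] * n[j] / (n[i] + n[j]) * np.sum((C[i] - C[j]) ** 2)
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        C[i] = (n[i] * C[i] + n[j] * C[j]) / (n[i] + n[j])  # weighted mean
        n[i] += n[j]
        del C[j], n[j]
    return np.array(C)

def pnn_smoothing_seed(X, k, J=10, seed=0):
    """Sketch of PNN-smoothing: cluster J random subsets, merge with PNN."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cents, counts = [], []
    for part in np.array_split(idx, J):   # each subset must hold >= k points
        km = KMeans(n_clusters=k, n_init=1).fit(X[part])  # any seeder works
        cents.append(km.cluster_centers_)
        counts.append(np.bincount(km.labels_, minlength=k))
    return pnn_merge(np.vstack(cents), np.concatenate(counts), k)
```

The returned centroids are then used to initialize a final $k$-means run on the full dataset.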
Learning through atypical "phase transitions" in overparameterized neural networks
Baldassi, Carlo, Lauditi, Clarissa, Malatesta, Enrico M., Pacelli, Rosalba, Perugini, Gabriele, Zecchina, Riccardo
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that escape the bias-variance predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from the statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex neural network models. As the number of connection weights increases, we follow the changes in the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. We find that there exists a gap between the SAT/UNSAT interpolation transition, where solutions begin to exist, and the point where algorithms start to find solutions, i.e. where accessible solutions appear. This second phase transition coincides with the discontinuous appearance of atypical solutions that are locally extremely entropic, i.e., flat regions of the weight space that are particularly solution-dense and have good generalization properties. Although exponentially rare compared to typical solutions (which are narrower and extremely difficult to sample), entropic solutions are accessible to the algorithms used in learning. We characterize the generalization error of different solutions and optimize the Bayesian prediction for data generated from a structurally different network. Numerical tests on observables suggested by the theory confirm that the scenario extends to realistic deep networks.
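A simple numerical probe of the "locally entropic" property described above can be sketched as follows, assuming a hypothetical error_fn that returns the training error of a weight configuration; distances and sample counts are illustrative:

```python
import numpy as np

def flatness_profile(w, error_fn, distances, n_samples=100, rng=None):
    """Probe how solution-dense the neighborhood of a solution w is.

    For each normalized distance d, sample random perturbations of w at
    that distance and record the average error; flat, entropic minimizers
    keep the error low over a wide range of d, while isolated narrow
    solutions degrade almost immediately.
    """
    rng = np.random.default_rng() if rng is None else rng
    profile = []
    for d in distances:
        errs = []
        for _ in range(n_samples):
            u = rng.standard_normal(w.shape)
            u *= d * np.linalg.norm(w) / np.linalg.norm(u)  # |u| = d * |w|
            errs.append(error_fn(w + u))
        profile.append(np.mean(errs))
    return profile
```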
Entropic gradient descent algorithms and wide flat minima
Pittorino, Fabrizio, Lucibello, Carlo, Feinauer, Christoph, Perugini, Gabriele, Baldassi, Carlo, Demyanenko, Elizaveta, Zecchina, Riccardo
The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities than sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum-flatness algorithms either directly on the classifier (which is norm independent) or on the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario by extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy-to-compute flatness measure shows a clear correlation with test accuracy.
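For reference, here is a minimal sketch of the Entropy-SGD scheme (Chaudhari et al.) used in the paper, with a generic grad_fn standing in for the stochastic loss gradient; the step sizes, inner-loop length, and averaging coefficient are illustrative assumptions:

```python
import numpy as np

def entropy_sgd(x, grad_fn, n_outer=100, L=20, eta=0.1, gamma=0.03,
                eps=1e-4, rng=None):
    """Minimal sketch of Entropy-SGD.

    grad_fn(x) returns a (stochastic) gradient of the training loss. The
    inner Langevin (SGLD) loop estimates the gradient of the local entropy,
    which biases the search toward wide flat regions of the landscape.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_outer):
        xp, mu = x.copy(), x.copy()
        for _ in range(L):  # inner SGLD steps around the current x
            g = grad_fn(xp) - gamma * (x - xp)
            xp -= eta * g + np.sqrt(eta) * eps * rng.standard_normal(x.shape)
            mu = 0.75 * mu + 0.25 * xp      # running average of the iterates
        x -= eta * gamma * (x - mu)         # ascend the local entropy
    return x
```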
Ergodic Annealing
Baldassi, Carlo, Maccheroni, Fabio, Marinacci, Massimo, Pirazzini, Marco
Recent years and events have led to a massive development of content-oriented cloud services. The most popular and voluminous content offered in today's networks are videos, which must be efficiently delivered to end customers. The objective of the service provider (the root) is to optimize the delivery of content to its customers (the terminals). In this optimization problem the cost is usually assumed to be known. Yet, in reality it is often unknown, because it depends on many stochastic factors, such as the traffic on the network, the level of demand, and so on.
[Figure 1: Graphical representation of networks where information travels from a root to a set of terminals over channels with known or unknown cost.]
Natural representation of composite data with replicated autoencoders
Negri, Matteo, Bergamini, Davide, Baldassi, Carlo, Zecchina, Riccardo, Feinauer, Christoph
Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the `local entropy' rather than the standard loss, resulting in more robust inference and considerably enhancing performance on this type of data. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.
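A minimal sketch of one way to train such an interacting replicated system, assuming PyTorch, an MSE reconstruction loss, and an elastic coupling of each replica to the replica average; the coupling form, strength, and schedule are illustrative assumptions, not the authors' exact scheme:

```python
import torch

def replicated_step(replicas, batch, optimizers, gamma=0.1):
    """One training step for a system of replicated autoencoders.

    Each replica minimizes its own reconstruction loss plus an attractive
    coupling to the average of all replicas; annealing gamma upward drives
    the system toward wide, high-local-entropy regions of the landscape.
    """
    # average the weights over replicas (detached: acts as a fixed target)
    with torch.no_grad():
        center = [torch.stack(ws).mean(0)
                  for ws in zip(*[list(r.parameters()) for r in replicas])]
    for model, opt in zip(replicas, optimizers):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(batch), batch)
        for p, c in zip(model.parameters(), center):
            loss = loss + 0.5 * gamma * (p - c).pow(2).sum()  # elastic term
        loss.backward()
        opt.step()
```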
On the geometry of solutions and on the capacity of multi-layer neural networks with ReLU activations
Baldassi, Carlo, Malatesta, Enrico M., Zecchina, Riccardo
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem, which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units, in which the capacity diverges. Possibly more importantly, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much denser than the analogous ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
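For concreteness, a minimal sketch of the storage condition for a two-layer ReLU network; the output weights, threshold, and sign readout here are illustrative assumptions, while the paper analyzes specific (binary or real-valued) weight ensembles:

```python
import numpy as np

def stores_patterns(W, a, theta, X, y):
    """Check storage of random patterns by a two-layer ReLU network.

    The network output is sign(sum_k a_k * relu(w_k . x) - theta); a
    pattern (x, y) with label y in {-1, +1} is stored when the output
    matches the label.
    """
    h = np.maximum(X @ W.T, 0.0)   # hidden ReLU activations, shape (P, K)
    out = np.sign(h @ a - theta)   # readout over the K hidden units
    return bool(np.all(out == y))
```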