Laio, Alessandro
Optimal transfer protocol by incremental layer defrosting
Gerace, Federica, Doimo, Diego, Mannelli, Stefano Sarao, Saglietti, Luca, Laio, Alessandro
Transfer learning is a powerful tool enabling model training with limited amounts of data. This technique is particularly useful in real-world problems where data availability is often a serious limitation. The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a data-poor target task. This workflow is based on the assumption that the feature maps of the pre-trained model are qualitatively similar to the ones that would have been learned with enough data on the target task. In this work, we show that this protocol is often sub-optimal, and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen. In particular, we make use of a controlled framework to identify the optimal transfer depth, which turns out to depend non-trivially on the amount of available training data and on the degree of source-target task correlation. We then characterize transfer optimality by analyzing the internal representations of two networks trained from scratch on the source and the target task through multiple established similarity measures.
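The freezing protocol described in this abstract can be made concrete with a short sketch. The following is an illustrative PyTorch example, not the authors' code: it assumes a torchvision ResNet-18 pre-trained on the source task, freezes the first freeze_depth child blocks, replaces the classification head, and sweeps the transfer depth; the model choice, depth granularity, and training loop are assumptions for illustration only.

```python
# Minimal sketch of depth-controlled layer freezing for transfer learning.
# Assumes a torchvision ResNet-18 pre-trained on the source task; "freeze_depth"
# (how many leading blocks stay frozen) is the quantity to sweep.
import torch
import torch.nn as nn
from torchvision import models

def build_partially_frozen_model(freeze_depth: int, num_target_classes: int) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    blocks = list(model.children())      # conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc
    for block in blocks[:freeze_depth]:  # freeze the first `freeze_depth` blocks
        for p in block.parameters():
            p.requires_grad = False
    # New head for the data-poor target task (always trainable).
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)
    return model

# Sweep the transfer depth and fine-tune only the unfrozen parameters.
for depth in range(0, 9):
    model = build_partially_frozen_model(freeze_depth=depth, num_target_classes=10)
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=1e-2, momentum=0.9)
    # ... train on the target task and record validation accuracy for this depth ...
```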
Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks
Muratore, Paolo, Tafazoli, Sina, Piasini, Eugenio, Laio, Alessandro, Zoccolan, Davide
Visual object recognition has been extensively studied in both neuroscience and computer vision. Recently, the most popular class of artificial systems for this task, deep convolutional neural networks (CNNs), has been shown to provide excellent models for its functional analogue in the brain, the ventral stream in visual cortex. This has prompted questions on what, if any, are the common principles underlying the reformatting of visual information as it flows through a CNN or the ventral stream. Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex and look for them in the other system. We show that the intrinsic dimensionality (ID) of object representations along the rat homologue of the ventral stream presents two distinct expansion-contraction phases, as previously shown for CNNs. Conversely, in CNNs, we show that training results in both distillation and active pruning (mirroring the increase in ID) of low- to middle-level image information in single units, as representations gain the ability to support invariant discrimination, in agreement with previous observations in rat visual cortex. Taken together, our findings suggest that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
Representation mitosis in wide neural networks
Doimo, Diego, Glielmo, Aldo, Goldt, Sebastian, Laio, Alessandro
Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that exactly interpolates its training data will typically improve its generalisation performance. Explaining the mechanism behind the benefit of such over-parameterisation is an outstanding challenge for deep learning theory. Here, we study the last layer representation of various deep architectures such as Wide-ResNets for image classification and find evidence for an underlying mechanism that we call "representation mitosis": if the last hidden representation is wide enough, its neurons tend to split into groups which carry identical information, and differ from each other only by a statistically independent noise. Like in a mitosis process, the number of such groups, or "clones", increases linearly with the width of the layer, but only if the width is above a critical value. We show that a key ingredient to activate mitosis is continuing the training process until the training error is zero. Finally, we show that in one of the learning tasks we considered, a wide model with several automatically developed clones performs significantly better than a deep ensemble based on architectures in which the last layer has the same size as the clones.
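As an illustration of what "clones" would look like operationally, the sketch below groups last-hidden-layer neurons whose activations are almost perfectly correlated across inputs. This is a hypothetical stand-in for the paper's analysis, using hierarchical clustering on a correlation-based distance and synthetic data.

```python
# Illustrative sketch: neurons that carry identical information up to independent
# noise should be almost perfectly correlated across inputs, so they can be
# grouped by clustering a correlation-based distance between neurons.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def find_clone_groups(activations: np.ndarray, corr_threshold: float = 0.95) -> np.ndarray:
    """activations: (n_samples, n_neurons) last-hidden-layer activations."""
    corr = np.corrcoef(activations, rowvar=False)              # neuron-neuron correlation
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)              # correlation -> distance
    np.fill_diagonal(dist, 0.0)
    condensed = dist[np.triu_indices_from(dist, k=1)]          # condensed form for linkage
    Z = linkage(condensed, method="average")
    labels = fcluster(Z, t=1.0 - corr_threshold, criterion="distance")
    return labels                                               # neurons sharing a label form a candidate clone

# Synthetic check: 3 "true" signals, each replicated 4 times with small noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(1000, 3))
acts = np.repeat(signal, 4, axis=1) + 0.05 * rng.normal(size=(1000, 12))
print(find_clone_groups(acts))                                  # 3 groups of 4 neurons
```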
Ranking the information content of distance measures
Glielmo, Aldo, Zeni, Claudio, Cheng, Bingqing, Csanyi, Gabor, Laio, Alessandro
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide-ranging in many branches of science.
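One way to realise such a comparison with a nearest-neighbour rank statistic is sketched below: for every point, take its nearest neighbour according to distance A and record the rank that distance B assigns to that same point; a small average rank means A retains the information carried by B. The specific statistic and its normalisation are an illustrative reading of the abstract, not the paper's exact definition.

```python
# Illustrative neighbour-rank statistic for comparing two distance measures on
# the same N points. Values near 0: A predicts B's neighbourhoods; near 1: it does not.
import numpy as np
from scipy.spatial.distance import cdist

def information_imbalance(X_A: np.ndarray, X_B: np.ndarray) -> float:
    """X_A, X_B: the same N points described by two different feature subsets."""
    N = X_A.shape[0]
    d_A = cdist(X_A, X_A)
    d_B = cdist(X_B, X_B)
    np.fill_diagonal(d_A, np.inf)                    # exclude self-matches under A
    nn_A = np.argmin(d_A, axis=1)                    # nearest neighbour under distance A
    ranks_B = d_B.argsort(axis=1).argsort(axis=1)    # rank of every point under B (self has rank 0)
    r = ranks_B[np.arange(N), nn_A]                  # rank, under B, of A's nearest neighbour
    return 2.0 * r.mean() / N

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
print(information_imbalance(X[:, :3], X))            # feature subset -> full space
print(information_imbalance(X, X[:, :3]))            # full space -> feature subset
```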
Dynamical Landscape and Multistability of the Earth's Climate
Margazoglou, Georgios, Grafke, Tobias, Laio, Alessandro, Lucarini, Valerio
We apply two independent data analysis methodologies to locate stable climate states in an intermediate-complexity climate model. First, drawing from the theory of quasipotentials, and viewing the state space as an energy landscape with valleys and mountain ridges, we infer the relative likelihood of the identified multistable climate states, and investigate the most likely transition trajectories as well as the expected transition times between them. Second, harnessing techniques from data science, specifically manifold learning, we characterize the data landscape of the simulation data to find climate states and basin boundaries within a fully agnostic and unsupervised framework. Both approaches show remarkable agreement, and reveal, apart from the well-known warm and snowball Earth states, a third intermediate stable state in one of the two climate models we consider. The combination of our approaches allows us to identify how the negative feedback of ocean heat transport and entropy production via the hydrological cycle drastically change the topography of the dynamical landscape of Earth's climate.
Hierarchical nucleation in deep neural networks
Doimo, Diego, Glielmo, Aldo, Ansuini, Alessio, Laio, Alessandro
Deep convolutional networks (DCNs) learn meaningful representations in which data that share the same abstract characteristics are positioned closer and closer. Understanding these representations and how they are generated is of unquestioned practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density, getting rid of any structure irrelevant for classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output, via a very sharp transition that resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows reconstructing the semantic relationships of the categories.
Intrinsic dimension of data representations in deep neural networks
Ansuini, Alessio, Laio, Alessandro, Macke, Jakob H., Zoccolan, Davide
Deep neural networks progressively transform their inputs across multiple processing layers. What are the geometrical properties of the representations learned by these networks? Here we study the intrinsic dimensionality (ID) of data representations, i.e., the minimal number of parameters needed to describe a representation. We find that, in a trained network, the ID is orders of magnitude smaller than the number of units in each layer. Across layers, the ID first increases and then progressively decreases in the final layers. Remarkably, the ID of the last hidden layer predicts classification accuracy on the test set. These results are found neither with linear dimensionality estimates (e.g., principal component analysis) nor in representations that have been artificially linearized; they are also absent in untrained networks and in networks trained on randomized labels. This suggests that neural networks that can generalize are those that transform the data into low-dimensional, but not necessarily flat, manifolds.
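A minimal way to trace such an ID profile is to hook the intermediate layers of a trained network and apply a nearest-neighbour ID estimator to each layer's activations. The sketch below uses a torchvision ResNet-18, random tensors as a stand-in for a batch of real images, and a two-nearest-neighbour estimate; all of these are illustrative choices rather than the paper's pipeline.

```python
# Illustrative sketch: per-layer ID profile of a trained network via forward hooks.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from torchvision import models

def estimate_id(X: np.ndarray) -> float:
    """Two-nearest-neighbour ID estimate (maximum-likelihood form)."""
    d, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = d[:, 2] / d[:, 1]                      # column 0 is the point itself
    return X.shape[0] / np.sum(np.log(mu))

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
layers = {"layer1": model.layer1, "layer2": model.layer2,
          "layer3": model.layer3, "layer4": model.layer4, "avgpool": model.avgpool}
acts = {}
for name, module in layers.items():
    module.register_forward_hook(
        lambda m, i, o, name=name: acts.__setitem__(name, o.flatten(1).detach().numpy()))

with torch.no_grad():
    # Random tensors stand in for real images; with real data the profile shows
    # the rise-then-fall ("hunchback") shape described in the abstract.
    model(torch.randn(256, 3, 64, 64))

for name in layers:
    print(name, estimate_id(acts[name]))
```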
Clustering by the local intrinsic dimension: the hidden structure of real-world data
Allegra, Michele, Facco, Elena, Laio, Alessandro, Mira, Antonietta
It is well known that a small number of variables is often sufficient to effectively describe high-dimensional data. This number is called the intrinsic dimension (ID) of the data. What is not so commonly known is that the ID can vary within the same dataset. This fact has been highlighted in technical discussions, but seldom exploited to gain practical insight into the data structure. Here we develop a simple and robust approach to cluster regions with the same local ID in a given data landscape. Surprisingly, we find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded vs unfolded configurations in a protein molecular dynamics trajectory, active vs non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. Our results show that a simple topological feature, the local ID, is sufficient to uncover a rich structure in high-dimensional data landscapes.
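A deliberately simplified sketch of the idea is given below: each point gets a local ID estimated from the two-nearest-neighbour ratios in its neighbourhood, and points are then grouped by similar local ID with a plain 1D clustering. The paper's actual method is a full probabilistic model; this is only meant to illustrate the concept on synthetic data.

```python
# Simplified sketch: segment a dataset by local intrinsic dimension.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def local_id(X: np.ndarray, k: int = 50) -> np.ndarray:
    """Per-point ID from 2NN ratios pooled over each point's k-neighbourhood."""
    d, idx = NearestNeighbors(n_neighbors=max(k, 3)).fit(X).kneighbors(X)
    log_mu = np.log(d[:, 2] / d[:, 1])                   # per-point 2NN ratio
    return 1.0 / np.mean(log_mu[idx[:, :k]], axis=1)     # neighbourhood-averaged MLE

# Synthetic example: a 1D curve and a 5D blob living in the same 5D space.
rng = np.random.default_rng(0)
t = rng.uniform(0, 10, size=(1000, 1))
curve = np.hstack([np.cos(t), np.sin(t), t, 0 * t, 0 * t]) + 0.001 * rng.normal(size=(1000, 5))
blob = 5 + rng.normal(size=(1000, 5))
X = np.vstack([curve, blob])
ids = local_id(X)
regions = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(ids.reshape(-1, 1))
print([ids[regions == r].mean() for r in (0, 1)])        # roughly 1 and 5
```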
Estimating the intrinsic dimension of datasets by a minimal neighborhood information
Facco, Elena, d'Errico, Maria, Rodriguez, Alex, Laio, Alessandro
Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the large number of raw coordinates. Such a manifold is generally twisted and curved, and points on it are typically non-uniformly distributed: two factors that make the identification of the ID and its exploitation genuinely hard. Here we propose a new ID estimator that uses only the distance of the first and the second nearest neighbor of each point in the sample. This extreme minimality reduces the effects of curvature and density variation, as well as the computational cost. The ID estimator is theoretically exact on uniformly distributed datasets, and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size, so that the ID can be estimated even when the data lie on a manifold perturbed by high-dimensional noise, a situation often encountered in real-world data sets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.
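A minimal sketch of an estimator built on this principle is given below: the ratio mu = r2/r1 of the second to the first nearest-neighbour distance is turned into an ID estimate via a maximum-likelihood fit of a Pareto law. The closed-form estimate used here is one standard choice and is meant as an illustration, not as the authors' reference implementation.

```python
# Illustrative 2NN-style ID estimator: for each point, take the distances r1 and r2
# to its first and second nearest neighbours; the ratio mu = r2/r1 is fitted by
# maximum likelihood to recover the intrinsic dimension.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(X: np.ndarray) -> float:
    """X: (n_points, n_features) data matrix."""
    n = X.shape[0]
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]          # column 0 is the point itself
    mu = r2 / r1
    return n / np.sum(np.log(mu))              # maximum-likelihood Pareto exponent

# Sanity check: points on a 2D plane embedded in 10 dimensions.
rng = np.random.default_rng(0)
plane = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))
print(two_nn_id(plane))                        # should be close to 2
```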
Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering
d'Errico, Maria, Facco, Elena, Laio, Alessandro, Rodriguez, Alex
Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. Here we introduce an approach for charting data spaces, providing a topography of the probability distribution from which the data are harvested. This topography includes information on the number and the height of the probability peaks, the depth of the "valleys" separating them, the relative location of the peaks and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering that exploits a non-parametric density estimator, which automatically measures the density in the manifold containing the data. Importantly, the density estimator provides an estimate of the error. This is a key feature that allows distinguishing genuine probability peaks from density fluctuations due to finite sampling.
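For orientation, the sketch below implements the basic Density Peak mechanics that this approach builds on: a local density for every point, the distance to the nearest denser point, peak selection, and downhill label propagation. The paper's non-parametric density estimator with error bars is not reproduced here; the k-NN density proxy and the fixed number of peaks are simplifying assumptions.

```python
# Illustrative Density Peak clustering: density maxima become cluster centres and
# every other point inherits the label of its nearest denser neighbour.
import numpy as np
from scipy.spatial.distance import cdist

def density_peak_clustering(X: np.ndarray, k: int = 10, n_clusters: int = 3) -> np.ndarray:
    n = X.shape[0]
    d = cdist(X, X)
    knn_d = np.sort(d, axis=1)[:, k]                 # distance to the k-th neighbour
    rho = 1.0 / (knn_d + 1e-12)                      # simple k-NN density proxy
    order = np.argsort(-rho)                         # points from densest to sparsest
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:                                # densest point: no denser neighbour
            delta[i] = d[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(d[i, denser])]
        delta[i], nearest_denser[i] = d[i, j], j
    peaks = np.argsort(-(rho * delta))[:n_clusters]  # candidate cluster centres
    labels = np.full(n, -1)
    labels[peaks] = np.arange(n_clusters)
    for i in order:                                  # propagate labels downhill in density
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels

# Example: three well-separated 2D Gaussian blobs.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(200, 2)) for c in (0, 3, 6)])
print(np.bincount(density_peak_clustering(X, n_clusters=3)))
```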