Tudisco, Francesco
Approximation properties of neural ODEs
De Marinis, Arturo, Murari, Davide, Celledoni, Elena, Guglielmi, Nicola, Owren, Brynjulf, Tudisco, Francesco
We study the approximation properties of shallow neural networks whose activation function is defined as the flow of a neural ordinary differential equation (neural ODE) at the final time of the integration interval. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters are required to satisfy some constraints. In particular, we constrain the Lipschitz constant of the flow of the neural ODE to increase the stability of the shallow neural network, and we restrict the norm of the weight matrices of the linear layers to one, so that the restricted expansivity of the flow is not compensated by increased expansivity of the linear layers. For this constrained setting, we prove approximation bounds that quantify the accuracy to which a continuous function can be approximated by such a shallow neural network. Finally, we prove that the UAP still holds if only one of the two constraints is imposed: either the bound on the Lipschitz constant of the flow or the unit-norm constraint on the weight matrices of the linear layers.
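To fix ideas, here is a minimal numerical sketch of the architecture the abstract describes: a shallow network whose activation is the final-time flow map of a neural ODE, with the linear layers normalized to unit spectral norm. The forward-Euler integrator, the tanh vector field, and all names are illustrative assumptions, not the paper's construction.

    # Illustrative sketch (assumptions, not the paper's code): a shallow network
    # whose activation is the time-T flow of a neural ODE, integrated by forward Euler.
    import numpy as np

    def neural_ode_flow(z, A, b, T=1.0, steps=10):
        """Approximate the flow z(T) of z' = tanh(A z + b) starting from z(0) = z."""
        h = T / steps
        for _ in range(steps):
            z = z + h * np.tanh(A @ z + b)
        return z

    def shallow_net(x, W1, W2, A, b):
        """Outer linear layers W1, W2 wrap the neural ODE flow used as activation."""
        return W2 @ neural_ode_flow(W1 @ x, A, b)

    # Example with unit-spectral-norm linear layers, as in the constrained setting.
    d = 4
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((d, d)); W1 /= np.linalg.norm(W1, 2)
    W2 = rng.standard_normal((1, d)); W2 /= np.linalg.norm(W2, 2)
    A = 0.1 * rng.standard_normal((d, d)); b = np.zeros(d)
    print(shallow_net(rng.standard_normal(d), W1, W2, A, b))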
Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective
Zhang, Kaicheng, Deidda, Piero, Higham, Desmond, Tudisco, Francesco
Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. While these metrics are related to oversmoothing, we argue they have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks and under somewhat strict conditions on the norm of network weights and feature representations. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide theoretical support for this approach, demonstrating that the numerical rank of feature representations converges to one for a broad family of nonlinear activation functions under the assumption of nonnegative trained weights. To the best of our knowledge, this is the first result that proves the occurrence of oversmoothing without assumptions on the boundedness of the weight matrices. Along with the theoretical findings, we provide extensive numerical evaluation across diverse graph architectures. Our results show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.
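The rank-based and energy-based quantities contrasted in the abstract can be monitored with a few lines of standard linear algebra. The formulas below (numerical rank, entropy-based effective rank, Dirichlet energy) are common definitions and are given only as a hedged illustration of what is being measured, not as the paper's code.

    # Illustrative metrics for oversmoothing of node features X in R^{n x d}.
    import numpy as np

    def numerical_rank(X, tol=None):
        """Numerical rank: number of singular values above a tolerance."""
        s = np.linalg.svd(X, compute_uv=False)
        tol = tol if tol is not None else max(X.shape) * np.finfo(X.dtype).eps * s[0]
        return int((s > tol).sum())

    def effective_rank(X, eps=1e-12):
        """Effective rank: exponential of the entropy of the normalized singular values."""
        s = np.linalg.svd(X, compute_uv=False)
        p = s / (s.sum() + eps)
        return float(np.exp(-(p * np.log(p + eps)).sum()))

    def dirichlet_energy(X, L):
        """Energy-based metric: trace(X^T L X) with graph Laplacian L."""
        return float(np.trace(X.T @ L @ X))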
Efficient Sparsification of Simplicial Complexes via Local Densities of States
Savostianov, Anton, Schaub, Michael T., Guglielmi, Nicola, Tudisco, Francesco
Simplicial complexes (SCs), a generalization of graph models for relational data that account for higher-order relations between data items, have become a popular abstraction for analyzing complex data using tools from topological data analysis or topological signal processing. However, the analysis of many real-world datasets leads to dense SCs with a large number of higher-order interactions. Unfortunately, analyzing such large SCs often has a prohibitive cost in terms of computation time and memory consumption. The sparsification of such complexes, i.e., the approximation of an original SC with a sparser simplicial complex with only a log-linear number of higher-order simplices while maintaining a spectrum close to the original SC, is of broad interest. In this work, we develop a novel method for a probabilistic sparsification of SCs. At its core lies the efficient computation of sparsifying sampling probabilities through local densities of states as functional descriptors of the spectral information. To avoid pathological structures in the spectrum of the corresponding Hodge Laplacian operators, we suggest a "kernel-ignoring" decomposition for approximating the sampling probabilities; additionally, we exploit error estimates to show the asymptotically prevailing algorithmic complexity of the developed method. The performance of the framework is demonstrated on the family of Vietoris--Rips filtered simplicial complexes.
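As a rough illustration of what a probabilistic sparsifier does once sampling probabilities are available, the sketch below performs generic importance sampling with reweighting. The paper's actual contribution, computing these probabilities via local densities of states and the kernel-ignoring decomposition, is not reproduced here; everything below is an assumed, simplified illustration.

    # Generic probabilistic sparsification step (sampling probabilities p assumed given).
    import numpy as np

    def sparsify(weights, p, n_samples, seed=0):
        """Sample simplices i.i.d. with probabilities p (summing to one) and reweight
        kept simplices by w[i] / (n_samples * p[i]) so the sparsifier is unbiased."""
        rng = np.random.default_rng(seed)
        m = len(weights)
        new_w = np.zeros(m)
        idx = rng.choice(m, size=n_samples, p=p)
        for i in idx:
            new_w[i] += weights[i] / (n_samples * p[i])
        return new_w  # nonzero entries define the sparsified complex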
Solaris: A Foundation Model of the Sun
Majid, Harris Abdul, Sittoni, Pietro, Tudisco, Francesco
Foundation models have demonstrated remarkable success across various scientific domains, motivating our exploration of their potential in solar physics. In this paper, we present Solaris, the first foundation model for forecasting the Sun's atmosphere. We leverage 13 years of full-disk, multi-wavelength solar imagery from the Solar Dynamics Observatory, spanning a complete solar cycle, to pre-train Solaris for 12-hour interval forecasting. Solaris is built on a large-scale 3D Swin Transformer architecture with 109 million parameters. We demonstrate Solaris' ability to generalize by fine-tuning in a low-data regime on a single wavelength (1700 Å) that was not included in pre-training, outperforming models trained from scratch on this specific wavelength. Our results indicate that Solaris can effectively capture the complex dynamics of the solar atmosphere and transform solar forecasting.
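As a hedged sketch of how 12-hour-interval forecasting can be framed as a supervised prediction task over multi-wavelength image stacks: the cadence, tensor layout, and helper name below are assumptions for illustration and do not describe the released pipeline.

    # Illustrative framing: pair each multi-wavelength snapshot with the one 12 hours later.
    def make_pairs(frames, horizon_hours=12, cadence_hours=1):
        """frames: time-ordered list of (channels, height, width) snapshots,
        one per cadence step; returns (input, 12-hour-ahead target) pairs."""
        step = horizon_hours // cadence_hours
        return [(frames[t], frames[t + step]) for t in range(len(frames) - step)]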
GeoLoRA: Geometric integration for parameter efficient fine-tuning
Schotthöfer, Steffen, Zangrando, Emanuele, Ceruti, Gianluca, Tudisco, Francesco, Kusch, Jonas
Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost as compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters compared to heuristic methods like AdaLoRA and LoRA, while maintaining critical convergence, descent, and error-bound theoretical guarantees. The resulting method is not only more efficient but also more robust to varying hyperparameter settings. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.
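For context, the sketch below shows the shared structure of LoRA-style adapters: a frozen pre-trained layer plus a trainable low-rank correction. GeoLoRA's geometric (dynamical low-rank) update and rank-adaptive budget allocation are not reproduced here, and all names are illustrative assumptions.

    # Minimal LoRA-style adapter sketch (illustrative; not GeoLoRA's update rule).
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base                      # frozen pre-trained layer
            for p in self.base.parameters():
                p.requires_grad_(False)
            d_out, d_in = base.weight.shape
            # Trainable low-rank correction W ~ W0 + U V^T, initialized to zero.
            self.U = nn.Parameter(torch.zeros(d_out, rank))
            self.V = nn.Parameter(torch.randn(d_in, rank) / d_in**0.5)

        def forward(self, x):
            return self.base(x) + x @ self.V @ self.U.T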
Low-Rank Adversarial PGD Attack
Savostianova, Dayana, Zangrando, Emanuele, Tudisco, Francesco
Adversarial attacks, characterized by subtle data perturbations that destabilize neural network predictions, have been a topic of significant interest for over a decade [48, 16, 32, 5]. These attacks have evolved into various forms, depending on the knowledge of the model's architecture (white-box, gray-box, black-box) [49], the type of data being targeted (graphs, images, text, etc.) [12, 47, 16, 57], and the specific adversarial objectives (targeted, untargeted, defense-oriented) [55, 29]. While numerous defense strategies aim to broadly stabilize models against adversarial attacks, independent of the specific attack mechanism [7, 14, 15, 41], the most effective and widely-used defenses focus on adversarial training, where the model is trained to withstand particular attacks [29, 50]. Adversarial training is known for producing robust models efficiently, but its effectiveness hinges on the availability of adversarial attacks that are both potent in degrading model accuracy and efficient in terms of computational resources. However, the most aggressive attacks often require significant computational resources, making them less practical for adversarial training. The projected gradient descent (PGD) attack [29] is popular in adversarial training due to its balance between aggressiveness and computational efficiency. In this work, we observe that in many cases the perturbations generated by PGD predominantly affect the lower part of the singular value spectrum of input images, indicating that these perturbations are approximately low-rank. Additionally, we find that the size of PGD-generated attacks differs significantly between standard and adversarially trained models when measured by their nuclear norm, which sums the singular values of the attack. This metric provides insight into the frequency profile of the attack when analyzed using the singular value decomposition (SVD) transform, aligning with known frequency profiles observed under discrete Fourier transform (DFT) and discrete cosine transform (DCT) analyses of PGD attacks [54, 31].
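The two diagnostics mentioned above, the nuclear norm of a perturbation and how concentrated its energy is in the leading singular values, can be computed directly from an SVD. The snippet below is an assumed illustration of those measurements, not the paper's code.

    # Illustrative diagnostics for how low-rank a PGD perturbation delta (H x W) is.
    import numpy as np

    def nuclear_norm(delta):
        """Sum of the singular values of the perturbation."""
        return float(np.linalg.svd(delta, compute_uv=False).sum())

    def spectral_energy_profile(delta, k):
        """Fraction of the perturbation's energy carried by its top-k singular values."""
        s = np.linalg.svd(delta, compute_uv=False)
        return float((s[:k] ** 2).sum() / (s ** 2).sum())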
Subhomogeneous Deep Equilibrium Models
Sittoni, Pietro, Tudisco, Francesco
Implicit-depth neural networks have emerged as powerful alternatives to traditional networks in various applications in recent years. However, these models often lack guarantees of existence and uniqueness of their fixed points, raising stability, performance, and reproducibility issues. In this paper, we present a new analysis of the existence and uniqueness of fixed points for implicit-depth neural networks based on the concept of subhomogeneous operators and the nonlinear Perron-Frobenius theory. Compared to previous similar analyses, our theory allows for weaker assumptions on the parameter matrices, thus yielding a more flexible framework for well-defined implicit networks. We illustrate the performance of the resulting subhomogeneous networks on feedforward, convolutional, and graph neural network examples.
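For readers unfamiliar with implicit-depth models, the sketch below shows the deep-equilibrium layer the analysis concerns: the layer output is a fixed point of a parameterized map, here found by plain fixed-point iteration. The solver choice and names are assumptions; the paper's contribution is the existence and uniqueness theory, not this loop.

    # Illustrative deep-equilibrium layer: output is a fixed point z* = f(z*, x).
    import numpy as np

    def deq_layer(f, x, z0, tol=1e-8, max_iter=500):
        z = z0
        for _ in range(max_iter):
            z_next = f(z, x)
            if np.linalg.norm(z_next - z) < tol * (1 + np.linalg.norm(z)):
                return z_next
            z = z_next
        return z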
Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias
Zangrando, Emanuele, Deidda, Piero, Brugiapaglia, Simone, Guglielmi, Nicola, Tudisco, Francesco
Recent work in deep learning has shown strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank, and removing relatively small singular values during training or from available trained models may significantly reduce model size while maintaining or even improving model performance. However, the majority of the theoretical investigations around low-rank bias in neural networks deal with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations trained with weight decay, and we show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with the networks' neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by a range of experimental evaluations illustrating the phenomenon.
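The two quantities the result relates, the rank of a layer and the within-class variability of the embeddings feeding into it, can be estimated as follows. The stable-rank proxy and the variability formula are standard choices used here only for illustration, not the paper's exact definitions.

    # Illustrative estimates of layer rank and within-class variability.
    import numpy as np

    def within_class_variability(H, labels):
        """Mean squared distance of embeddings H (n x d) from their class means;
        labels is an (n,) integer array."""
        total, n = 0.0, len(labels)
        for c in np.unique(labels):
            Hc = H[labels == c]
            total += ((Hc - Hc.mean(axis=0)) ** 2).sum()
        return total / n

    def stable_rank(W):
        """Frobenius-over-spectral 'stable rank', a soft proxy for numerical rank."""
        s = np.linalg.svd(W, compute_uv=False)
        return float((s ** 2).sum() / s[0] ** 2)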
Laplacian-based Semi-Supervised Learning in Multilayer Hypergraphs by Coordinate Descent
Venturini, Sara, Cristofari, Andrea, Rinaldi, Francesco, Tudisco, Francesco
Graph semi-supervised learning is an important data analysis tool: given a graph and a set of labeled nodes, the aim is to infer the labels of the remaining unlabeled nodes. In this paper, we start by considering an optimization-based formulation of the problem for an undirected graph, and then we extend this formulation to multilayer hypergraphs. We solve the problem using different coordinate descent approaches and compare the results with those obtained by the classic gradient descent method. Experiments on synthetic and real-world datasets show the potential of using coordinate descent methods with suitable selection rules.
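To make the optimization-based formulation concrete, the sketch below applies cyclic coordinate descent to a standard single-graph Laplacian-regularized objective, min_X sum over labeled i of ||X_i - Y_i||^2 + lam * trace(X^T L X). The paper's multilayer-hypergraph model and its coordinate selection rules are more general, so treat this only as an assumed, simplified illustration (it also assumes every node has positive degree).

    # Cyclic coordinate descent on a Laplacian-regularized label-propagation objective.
    import numpy as np

    def coordinate_descent(L, Y, labeled, lam=1.0, sweeps=50):
        """L: (n, n) graph Laplacian; Y: (n, k) one-hot labels (zero rows for
        unlabeled nodes); labeled: (n,) boolean mask."""
        X = Y.astype(float).copy()
        n = L.shape[0]
        for _ in range(sweeps):
            for i in range(n):                    # one coordinate block = one node
                off = L[i] @ X - L[i, i] * X[i]   # coupling with the other nodes
                denom = lam * L[i, i] + (1.0 if labeled[i] else 0.0)
                X[i] = ((Y[i] if labeled[i] else 0.0) - lam * off) / denom
        return X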
Robust low-rank training via approximate orthonormal constraints
Savostianova, Dayana, Zangrando, Emanuele, Ceruti, Gianluca, Tudisco, Francesco
With the growth of model and data sizes, a broad effort has been made to design pruning techniques that reduce the resource demand of deep learning pipelines, while retaining model performance. In order to reduce both inference and training costs, a prominent line of work uses low-rank matrix factorizations to represent the network weights. Although able to retain accuracy, we observe that low-rank methods tend to compromise model robustness against adversarial perturbations. By modeling robustness in terms of the condition number of the neural network, we argue that this loss of robustness is due to the exploding singular values of the low-rank weight matrices. Thus, we introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold while simultaneously enforcing approximate orthonormal constraints. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy. This is shown by extensive numerical evidence and by our main approximation theorem, which guarantees that the computed robust low-rank network closely approximates the ideal full model, provided a highly performing low-rank sub-network exists.
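A hedged sketch of the kind of factorized parameterization the abstract alludes to: a low-rank weight W = U S V^T whose conditioning is governed by the small core S once U and V are kept approximately orthonormal. The QR-based re-orthonormalization below is an assumption for illustration, not the paper's algorithm.

    # Keep W = U S V^T with orthonormal factors; conditioning then depends on S only.
    import numpy as np

    def reorthonormalize(U, S, V):
        """Push orthonormality back into the factors after a training step (QR-based):
        U S V^T = Qu (Ru S Rv^T) Qv^T, so the product is unchanged."""
        Qu, Ru = np.linalg.qr(U)
        Qv, Rv = np.linalg.qr(V)
        return Qu, Ru @ S @ Rv.T, Qv

    def condition_number(S):
        """Condition number of the r x r core, a proxy for the layer's robustness."""
        s = np.linalg.svd(S, compute_uv=False)
        return float(s[0] / s[-1])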