Tudisco, Francesco
Approximation properties of neural ODEs
De Marinis, Arturo, Murari, Davide, Celledoni, Elena, Guglielmi, Nicola, Owren, Brynjulf, Tudisco, Francesco
We study the approximation properties of shallow neural networks whose activation function is defined as the flow of a neural ordinary differential equation (neural ODE) at the final time of the integration interval. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters are required to satisfy some constraints. In particular, we constrain the Lipschitz constant of the flow of the neural ODE to increase the stability of the shallow neural network, and we restrict the norm of the weight matrices of the linear layers to one, so that the restricted expansivity of the flow is not compensated by increased expansivity of the linear layers. For this constrained setting, we prove approximation bounds that quantify the accuracy to which a continuous function can be approximated by such a shallow neural network. Finally, we prove that the UAP still holds if only one of the two constraints is imposed: either the bound on the Lipschitz constant of the flow or the unit-norm constraint on the weight matrices of the linear layers.
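To fix ideas, here is a minimal numerical sketch of the architecture the abstract describes: a shallow network whose activation is the final-time flow map of a neural ODE, with the linear layers normalized to unit spectral norm. The forward-Euler integrator, the tanh vector field, and all names are illustrative assumptions, not the paper's construction.

    # Illustrative sketch (assumptions, not the paper's code): a shallow network
    # whose activation is the time-T flow of a neural ODE, integrated by forward Euler.
    import numpy as np

    def neural_ode_flow(z, A, b, T=1.0, steps=10):
        """Approximate the flow z(T) of z' = tanh(A z + b) starting from z(0) = z."""
        h = T / steps
        for _ in range(steps):
            z = z + h * np.tanh(A @ z + b)
        return z

    def shallow_net(x, W1, W2, A, b):
        """Outer linear layers W1, W2 wrap the neural ODE flow used as activation."""
        return W2 @ neural_ode_flow(W1 @ x, A, b)

    # Example with unit-spectral-norm linear layers, as in the constrained setting.
    d = 4
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((d, d)); W1 /= np.linalg.norm(W1, 2)
    W2 = rng.standard_normal((1, d)); W2 /= np.linalg.norm(W2, 2)
    A = 0.1 * rng.standard_normal((d, d)); b = np.zeros(d)
    print(shallow_net(rng.standard_normal(d), W1, W2, A, b))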
Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective
Zhang, Kaicheng, Deidda, Piero, Higham, Desmond, Tudisco, Francesco
Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. While these metrics are related to oversmoothing, we argue they have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks and under somewhat strict conditions on the norm of network weights and feature representations. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide theoretical support for this approach, demonstrating that the numerical rank of feature representations converges to one for a broad family of nonlinear activation functions under the assumption of nonnegative trained weights. To the best of our knowledge, this is the first result that proves the occurrence of oversmoothing without assumptions on the boundedness of the weight matrices. Along with the theoretical findings, we provide extensive numerical evaluation across diverse graph architectures. Our results show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.
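The rank-based and energy-based quantities contrasted in the abstract can be monitored with a few lines of standard linear algebra. The formulas below (numerical rank, entropy-based effective rank, Dirichlet energy) are common definitions and are given only as a hedged illustration of what is being measured, not as the paper's code.

    # Illustrative metrics for oversmoothing of node features X in R^{n x d}.
    import numpy as np

    def numerical_rank(X, tol=None):
        """Numerical rank: number of singular values above a tolerance."""
        s = np.linalg.svd(X, compute_uv=False)
        tol = tol if tol is not None else max(X.shape) * np.finfo(X.dtype).eps * s[0]
        return int((s > tol).sum())

    def effective_rank(X, eps=1e-12):
        """Effective rank: exponential of the entropy of the normalized singular values."""
        s = np.linalg.svd(X, compute_uv=False)
        p = s / (s.sum() + eps)
        return float(np.exp(-(p * np.log(p + eps)).sum()))

    def dirichlet_energy(X, L):
        """Energy-based metric: trace(X^T L X) with graph Laplacian L."""
        return float(np.trace(X.T @ L @ X))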
Efficient Sparsification of Simplicial Complexes via Local Densities of States
Savostianov, Anton, Schaub, Michael T., Guglielmi, Nicola, Tudisco, Francesco
Simplicial complexes (SCs), a generalization of graph models for relational data that account for higher-order relations between data items, have become a popular abstraction for analyzing complex data using tools from topological data analysis or topological signal processing. However, the analysis of many real-world datasets leads to dense SCs with a large number of higher-order interactions. Unfortunately, analyzing such large SCs often has a prohibitive cost in terms of computation time and memory consumption. The sparsification of such complexes, i.e., the approximation of an original SC with a sparser simplicial complex with only a log-linear number of higher-order simplices while maintaining a spectrum close to the original SC, is of broad interest. In this work, we develop a novel method for a probabilistic sparsification of SCs. At its core lies the efficient computation of sparsifying sampling probabilities through local densities of states as functional descriptors of the spectral information. To avoid pathological structures in the spectrum of the corresponding Hodge Laplacian operators, we suggest a "kernel-ignoring" decomposition for approximating the sampling probabilities; additionally, we exploit error estimates to show the asymptotically prevailing algorithmic complexity of the developed method. The performance of the framework is demonstrated on the family of Vietoris--Rips filtered simplicial complexes.
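As a rough illustration of what a probabilistic sparsifier does once sampling probabilities are available, the sketch below performs generic importance sampling with reweighting. The paper's actual contribution, computing these probabilities via local densities of states and the kernel-ignoring decomposition, is not reproduced here; everything below is an assumed, simplified illustration.

    # Generic probabilistic sparsification step (sampling probabilities p assumed given).
    import numpy as np

    def sparsify(weights, p, n_samples, seed=0):
        """Sample simplices i.i.d. with probabilities p (summing to one) and reweight
        kept simplices by w[i] / (n_samples * p[i]) so the sparsifier is unbiased."""
        rng = np.random.default_rng(seed)
        m = len(weights)
        new_w = np.zeros(m)
        idx = rng.choice(m, size=n_samples, p=p)
        for i in idx:
            new_w[i] += weights[i] / (n_samples * p[i])
        return new_w  # nonzero entries define the sparsified complex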
Solaris: A Foundation Model of the Sun
Majid, Harris Abdul, Sittoni, Pietro, Tudisco, Francesco
Foundation models have demonstrated remarkable success across various scientific domains, motivating our exploration of their potential in solar physics. In this paper, we present Solaris, the first foundation model for forecasting the Sun's atmosphere. We leverage 13 years of full-disk, multi-wavelength solar imagery from the Solar Dynamics Observatory, spanning a complete solar cycle, to pre-train Solaris for 12-hour interval forecasting. Solaris is built on a large-scale 3D Swin Transformer architecture with 109 million parameters. We demonstrate Solaris' ability to generalize by fine-tuning in a low-data regime on a single wavelength (1700 Å) that was not included in pre-training, outperforming models trained from scratch on this specific wavelength. Our results indicate that Solaris can effectively capture the complex dynamics of the solar atmosphere and transform solar forecasting.
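As a hedged sketch of how 12-hour-interval forecasting can be framed as a supervised prediction task over multi-wavelength image stacks: the cadence, tensor layout, and helper name below are assumptions for illustration and do not describe the released pipeline.

    # Illustrative framing: pair each multi-wavelength snapshot with the one 12 hours later.
    def make_pairs(frames, horizon_hours=12, cadence_hours=1):
        """frames: time-ordered list of (channels, height, width) snapshots,
        one per cadence step; returns (input, 12-hour-ahead target) pairs."""
        step = horizon_hours // cadence_hours
        return [(frames[t], frames[t + step]) for t in range(len(frames) - step)]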
GeoLoRA: Geometric integration for parameter efficient fine-tuning
Schotthöfer, Steffen, Zangrando, Emanuele, Ceruti, Gianluca, Tudisco, Francesco, Kusch, Jonas
Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost as compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters compared to heuristic methods like AdaLoRA and LoRA, while maintaining critical convergence, descent, and error-bound theoretical guarantees. The resulting method is not only more efficient but also more robust to varying hyperparameter settings. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.
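For context, the sketch below shows the shared structure of LoRA-style adapters: a frozen pre-trained layer plus a trainable low-rank correction. GeoLoRA's geometric (dynamical low-rank) update and rank-adaptive budget allocation are not reproduced here, and all names are illustrative assumptions.

    # Minimal LoRA-style adapter sketch (illustrative; not GeoLoRA's update rule).
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base                      # frozen pre-trained layer
            for p in self.base.parameters():
                p.requires_grad_(False)
            d_out, d_in = base.weight.shape
            # Trainable low-rank correction W ~ W0 + U V^T, initialized to zero.
            self.U = nn.Parameter(torch.zeros(d_out, rank))
            self.V = nn.Parameter(torch.randn(d_in, rank) / d_in**0.5)

        def forward(self, x):
            return self.base(x) + x @ self.V @ self.U.T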
Low-Rank Adversarial PGD Attack
Savostianova, Dayana, Zangrando, Emanuele, Tudisco, Francesco
Adversarial attacks, characterized by subtle data perturbations that destabilize neural network predictions, have been a topic of significant interest for over a decade [48, 16, 32, 5]. These attacks have evolved into various forms, depending on the knowledge of the model's architecture (white-box, gray-box, black-box) [49], the type of data being targeted (graphs, images, text, etc.) [12, 47, 16, 57], and the specific adversarial objectives (targeted, untargeted, defense-oriented) [55, 29]. While numerous defense strategies aim to broadly stabilize models against adversarial attacks, independent of the specific attack mechanism [7, 14, 15, 41], the most effective and widely-used defenses focus on adversarial training, where the model is trained to withstand particular attacks [29, 50]. Adversarial training is known for producing robust models efficiently, but its effectiveness hinges on the availability of adversarial attacks that are both potent in degrading model accuracy and efficient in terms of computational resources. However, the most aggressive attacks often require significant computational resources, making them less practical for adversarial training. The projected gradient descent (PGD) attack [29] is popular in adversarial training due to its balance between aggressiveness and computational efficiency. In this work, we observe that in many cases the perturbations generated by PGD predominantly affect the lower part of the singular value spectrum of input images, indicating that these perturbations are approximately low-rank. Additionally, we find that the size of PGD-generated attacks differs significantly between standard and adversarially trained models when measured by their nuclear norm, which sums the singular values of the attack. This metric provides insight into the frequency profile of the attack when analyzed using the singular value decomposition (SVD) transform, aligning with known frequency profiles observed under discrete Fourier transform (DFT) and discrete cosine transform (DCT) analyses of PGD attacks [54, 31].
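The two diagnostics mentioned above, the nuclear norm of a perturbation and how concentrated its energy is in the leading singular values, can be computed directly from an SVD. The snippet below is an assumed illustration of those measurements, not the paper's code.

    # Illustrative diagnostics for how low-rank a PGD perturbation delta (H x W) is.
    import numpy as np

    def nuclear_norm(delta):
        """Sum of the singular values of the perturbation."""
        return float(np.linalg.svd(delta, compute_uv=False).sum())

    def spectral_energy_profile(delta, k):
        """Fraction of the perturbation's energy carried by its top-k singular values."""
        s = np.linalg.svd(delta, compute_uv=False)
        return float((s[:k] ** 2).sum() / (s ** 2).sum())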
Subhomogeneous Deep Equilibrium Models
Sittoni, Pietro, Tudisco, Francesco
Implicit-depth neural networks have emerged as powerful alternatives to traditional networks in various applications in recent years. However, these models often lack guarantees of existence and uniqueness of their fixed points, raising stability, performance, and reproducibility issues. In this paper, we present a new analysis of the existence and uniqueness of fixed points for implicit-depth neural networks based on the concept of subhomogeneous operators and the nonlinear Perron-Frobenius theory. Compared to previous similar analyses, our theory allows for weaker assumptions on the parameter matrices, thus yielding a more flexible framework for well-defined implicit networks. We illustrate the performance of the resulting subhomogeneous networks on feedforward, convolutional, and graph neural network examples.
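For readers unfamiliar with implicit-depth models, the sketch below shows the deep-equilibrium layer the analysis concerns: the layer output is a fixed point of a parameterized map, here found by plain fixed-point iteration. The solver choice and names are assumptions; the paper's contribution is the existence and uniqueness theory, not this loop.

    # Illustrative deep-equilibrium layer: output is a fixed point z* = f(z*, x).
    import numpy as np

    def deq_layer(f, x, z0, tol=1e-8, max_iter=500):
        z = z0
        for _ in range(max_iter):
            z_next = f(z, x)
            if np.linalg.norm(z_next - z) < tol * (1 + np.linalg.norm(z)):
                return z_next
            z = z_next
        return z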
Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias
Zangrando, Emanuele, Deidda, Piero, Brugiapaglia, Simone, Guglielmi, Nicola, Tudisco, Francesco
Recent work in deep learning has shown strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank, and removing relatively small singular values during training or from available trained models may significantly reduce model size while maintaining or even improving model performance. However, the majority of the theoretical investigations around low-rank bias in neural networks deal with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations trained with weight decay, and we show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with the networks' neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by a range of experimental evaluations illustrating the phenomenon.
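The two quantities the result relates, the rank of a layer and the within-class variability of the embeddings feeding into it, can be estimated as follows. The stable-rank proxy and the variability formula are standard choices used here only for illustration, not the paper's exact definitions.

    # Illustrative estimates of layer rank and within-class variability.
    import numpy as np

    def within_class_variability(H, labels):
        """Mean squared distance of embeddings H (n x d) from their class means;
        labels is an (n,) integer array."""
        total, n = 0.0, len(labels)
        for c in np.unique(labels):
            Hc = H[labels == c]
            total += ((Hc - Hc.mean(axis=0)) ** 2).sum()
        return total / n

    def stable_rank(W):
        """Frobenius-over-spectral 'stable rank', a soft proxy for numerical rank."""
        s = np.linalg.svd(W, compute_uv=False)
        return float((s ** 2).sum() / s[0] ** 2)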
Laplacian-based Semi-Supervised Learning in Multilayer Hypergraphs by Coordinate Descent
Venturini, Sara, Cristofari, Andrea, Rinaldi, Francesco, Tudisco, Francesco
Graph semi-supervised learning is an important data analysis tool: given a graph and a set of labeled nodes, the aim is to infer the labels of the remaining unlabeled nodes. In this paper, we start by considering an optimization-based formulation of the problem for an undirected graph, and then we extend this formulation to multilayer hypergraphs. We solve the problem using different coordinate descent approaches and compare the results with those obtained by the classic gradient descent method. Experiments on synthetic and real-world datasets show the potential of using coordinate descent methods with suitable selection rules.
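To make the optimization-based formulation concrete, the sketch below applies cyclic coordinate descent to a standard single-graph Laplacian-regularized objective, min_X sum over labeled i of ||X_i - Y_i||^2 + lam * trace(X^T L X). The paper's multilayer-hypergraph model and its coordinate selection rules are more general, so treat this only as an assumed, simplified illustration (it also assumes every node has positive degree).

    # Cyclic coordinate descent on a Laplacian-regularized label-propagation objective.
    import numpy as np

    def coordinate_descent(L, Y, labeled, lam=1.0, sweeps=50):
        """L: (n, n) graph Laplacian; Y: (n, k) one-hot labels (zero rows for
        unlabeled nodes); labeled: (n,) boolean mask."""
        X = Y.astype(float).copy()
        n = L.shape[0]
        for _ in range(sweeps):
            for i in range(n):                    # one coordinate block = one node
                off = L[i] @ X - L[i, i] * X[i]   # coupling with the other nodes
                denom = lam * L[i, i] + (1.0 if labeled[i] else 0.0)
                X[i] = ((Y[i] if labeled[i] else 0.0) - lam * off) / denom
        return X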
Robust low-rank training via approximate orthonormal constraints
Savostianova, Dayana, Zangrando, Emanuele, Ceruti, Gianluca, Tudisco, Francesco
With the growth of model and data sizes, a broad effort has been made to design pruning techniques that reduce the resource demand of deep learning pipelines, while retaining model performance. In order to reduce both inference and training costs, a prominent line of work uses low-rank matrix factorizations to represent the network weights. Although able to retain accuracy, we observe that low-rank methods tend to compromise model robustness against adversarial perturbations. By modeling robustness in terms of the condition number of the neural network, we argue that this loss of robustness is due to the exploding singular values of the low-rank weight matrices. Thus, we introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold while simultaneously enforcing approximate orthonormal constraints. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy. This is shown by extensive numerical evidence and by our main approximation theorem, which guarantees that the computed robust low-rank network closely approximates the ideal full model, provided a highly performing low-rank sub-network exists.
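A hedged sketch of the kind of factorized parameterization the abstract alludes to: a low-rank weight W = U S V^T whose conditioning is governed by the small core S once U and V are kept approximately orthonormal. The QR-based re-orthonormalization below is an assumption for illustration, not the paper's algorithm.

    # Keep W = U S V^T with orthonormal factors; conditioning then depends on S only.
    import numpy as np

    def reorthonormalize(U, S, V):
        """Push orthonormality back into the factors after a training step (QR-based):
        U S V^T = Qu (Ru S Rv^T) Qv^T, so the product is unchanged."""
        Qu, Ru = np.linalg.qr(U)
        Qv, Rv = np.linalg.qr(V)
        return Qu, Ru @ S @ Rv.T, Qv

    def condition_number(S):
        """Condition number of the r x r core, a proxy for the layer's robustness."""
        s = np.linalg.svd(S, compute_uv=False)
        return float(s[0] / s[-1])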