Collaborating Authors: Birdal, Tolga


HOG-Diff: Higher-Order Guided Diffusion for Graph Generation

arXiv.org Artificial Intelligence

Graph generation is a critical yet challenging task, as empirical analyses require a deep understanding of complex, non-Euclidean structures. Although diffusion models have recently achieved significant success in graph generation, they are typically adapted from frameworks designed for image generation, making them ill-suited for capturing the topological properties of graphs. In this work, we propose a novel Higher-order Guided Diffusion (HOG-Diff) model that follows a coarse-to-fine generation curriculum and is guided by higher-order information, enabling the progressive generation of plausible graphs with inherent topological structures. We further prove that our model exhibits a stronger theoretical guarantee than classical diffusion frameworks. Extensive experiments on both molecular and generic graph generation tasks demonstrate that our method consistently outperforms or remains competitive with state-of-the-art baselines. Our code is available at https://github.com/Yiminghh/HOG-Diff.
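For context on the image-style diffusion frameworks the abstract contrasts itself with, here is a minimal, hypothetical sketch of the kind of forward-noising step that continuous graph diffusion models apply to a dense adjacency matrix; the symmetrization is what keeps the perturbed graph undirected. This is generic background, not HOG-Diff's higher-order guided process, and all names below are ours.

```python
import numpy as np

def forward_noise_adjacency(A, t, beta=1.0, rng=None):
    """One forward-diffusion step on a dense adjacency matrix.

    A sketch in the style of variance-preserving continuous diffusion,
    NOT the higher-order guided process described in the paper.
    """
    rng = np.random.default_rng(rng)
    alpha = np.exp(-0.5 * beta * t)           # signal retained at time t
    sigma = np.sqrt(1.0 - alpha ** 2)         # matching noise scale
    noise = rng.normal(size=A.shape)
    noise = (noise + noise.T) / np.sqrt(2.0)  # symmetrize: keep the graph undirected
    return alpha * A + sigma * noise

# Toy usage: noise a 4-node path graph halfway through the process.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A_half = forward_noise_adjacency(A, t=0.5, rng=0)
```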


Grokking at the Edge of Numerical Stability

arXiv.org Machine Learning

Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon that has challenged our understanding of deep learning. While considerable progress has been made in understanding grokking, it is still not clear why generalization is delayed and why grokking often does not happen without regularization. In this work we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating-point errors in the Softmax that we refer to as Softmax Collapse (SC). We show that SC prevents grokking and that mitigating SC leads to grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not change the predictions of the model but decreases the loss by scaling the logits, usually through the scaling of the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, stopping learning altogether. To validate these hypotheses, we introduce two key contributions that mitigate the issues faced in grokking tasks: (i) StableMax, a new activation function that prevents SC and enables grokking without regularization, and (ii) ⊥Grad, a training algorithm that leads to quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, shedding light on its delayed generalization, its reliance on regularization, and the effectiveness of known grokking-inducing methods. Code for this paper can be found at: https://github.com/LucasPrietoAl/

Deep learning has been transformative for a variety of fields such as natural language processing (Devlin et al., 2019), computer vision (Krizhevsky et al., 2012), geometry processing (Qi et al., 2017), and 3D vision (Deng et al., 2018). This rapid proliferation has brought with it surprising phenomena that defy the predictions of classical statistical learning theory. In this paper we explore one such recently observed phenomenon known as grokking, first described by Power et al. (2022) as a sudden and unexpected generalization occurring after prolonged overfitting.
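To make the StableMax idea concrete, here is a minimal sketch. It assumes the piecewise map reported in the paper, s(x) = x + 1 for x ≥ 0 and s(x) = 1/(1 − x) for x < 0, substituted for exp inside the Softmax normalization; treat that exact form as an assumption and defer to the paper and the linked repository.

```python
import numpy as np

def stablemax(logits, axis=-1):
    """StableMax-style alternative to Softmax (sketch).

    Assumes the piecewise map s(x) = x + 1 for x >= 0 and
    s(x) = 1 / (1 - x) for x < 0 from the paper. s is positive and
    monotone but only linear in |x|, so large logits do not overflow
    the way exp(x) does in low-precision floating point.
    """
    x = np.asarray(logits, dtype=np.float64)
    # np.minimum keeps the discarded branch free of division-by-zero.
    s = np.where(x >= 0, x + 1.0, 1.0 / (1.0 - np.minimum(x, 0.0)))
    return s / s.sum(axis=axis, keepdims=True)

# Logits this large would saturate a float32 softmax; here they stay finite.
print(stablemax([100.0, 90.0, 0.0]))
```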


Convex Formulations for Training Two-Layer ReLU Neural Networks

arXiv.org Artificial Intelligence

Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response to this challenge, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks.
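For readers unfamiliar with the objects involved, the following generic template (not the paper's specific lifting of ReLU training) shows the shape of a completely positive program and the standard tractable relaxation that widens the completely positive cone to the doubly nonnegative cone:

```latex
% Generic completely positive program (not the paper's exact formulation):
\min_{X \in \mathcal{CP}_n} \; \langle C, X \rangle
\quad \text{s.t.} \quad \langle A_i, X \rangle = b_i, \; i = 1, \dots, m,
% where \mathcal{CP}_n = \{ BB^\top : B \in \mathbb{R}^{n \times k},\, B \ge 0 \}
% is the completely positive cone; optimizing over it is NP-hard despite
% the linear objective and constraints. The standard relaxation replaces
% \mathcal{CP}_n with the doubly nonnegative cone
\mathcal{DNN}_n = \{ X : X \succeq 0, \; X \ge 0 \ \text{entrywise} \} \supseteq \mathcal{CP}_n,
% yielding a semidefinite program solvable in polynomial time.
```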


Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms

arXiv.org Artificial Intelligence

We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that can be amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.
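As a toy illustration of the general recipe (not the paper's estimator): treat the optimizer iterates as a finite point cloud and summarize it with an α-weighted sum of zero-dimensional persistence lifetimes, which for a Vietoris-Rips filtration coincide with the edge lengths of a Euclidean minimum spanning tree. Function and parameter names below are ours.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def h0_persistence_sum(trajectory, alpha=1.0):
    """alpha-weighted sum of 0-dim persistence lifetimes of a point cloud.

    `trajectory` is an (n_iterates, n_params) array of flattened model
    weights. For a Vietoris-Rips filtration, the H0 lifetimes equal the
    Euclidean MST edge lengths, so no persistent-homology library is
    needed. A toy proxy, not the paper's actual complexity measure.
    """
    D = squareform(pdist(np.asarray(trajectory)))
    mst = minimum_spanning_tree(D)   # sparse matrix holding the MST edges
    lifetimes = mst.data             # the n-1 edge lengths
    return float(np.sum(lifetimes ** alpha))

# Toy usage: a random walk in weight space as a stand-in trajectory.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(50, 10)), axis=0)
print(h0_persistence_sum(traj, alpha=1.0))
```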


Attending to Topological Spaces: The Cellular Transformer

arXiv.org Machine Learning

Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, which can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes. First, we propose a new formulation of the usual self- and cross-attention mechanisms, tailored to leverage incidence relations in cell complexes, e.g., edge-face and node-edge relations. Additionally, we propose a set of topological positional encodings specifically designed for cell complexes. By transforming three graph datasets into cell complex datasets, our experiments reveal that CT not only achieves state-of-the-art performance, but it does so without the need for more complex enhancements such as virtual nodes, in-domain structural encodings, or graph rewiring.
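A hedged sketch of the core idea of incidence-constrained attention, simplified relative to the CT formulation in the paper: attention scores between two ranks of cells are computed as usual but masked by an incidence relation, so that, e.g., nodes attend only to the edges they bound. Names and shapes are ours.

```python
import numpy as np

def incidence_masked_attention(Q, K, V, incidence):
    """Cross-attention between two ranks of cells, masked by incidence.

    Q: (n_query_cells, d); K, V: (n_key_cells, d);
    incidence: boolean (n_query_cells, n_key_cells), True where the two
    cells are incident (e.g., node-edge relations). Assumes every query
    cell is incident to at least one key cell. A simplified sketch, not
    the exact Cellular Transformer layer.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(incidence, scores, -np.inf)  # attend only along incidences
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 3 nodes attending over the 3 edges of a triangle graph.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 8)) for _ in range(3))
# incidence[i, j] = node i is an endpoint of edge j (edges: 01, 12, 02).
inc = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=bool)
out = incidence_masked_attention(Q, K, V, inc)
```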


Position Paper: Challenges and Opportunities in Topological Deep Learning

arXiv.org Machine Learning

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities.

Traditional machine learning often assumes that the observed data of interest are supported on a linear vector space and can be described by a set of feature vectors. However, there is growing awareness that, in many cases, this viewpoint is insufficient to describe much of the data encountered in the real world. For example, molecules may be described more appropriately by graphs than feature vectors. Other examples include three-dimensional objects represented by meshes, as encountered in computer graphics and geometry processing, or data supported on top of a complex social network of interrelated actors. Hence, there has been an increased interest in importing concepts from geometry and topology into the usual machine learning pipelines to gain further insights.


TopoX: A Suite of Python Packages for Machine Learning on Topological Domains

arXiv.org Artificial Intelligence

We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at https://github.com/pyt-team.
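A brief usage sketch of the kind of workflow the suite targets. The class and method names below follow the project's documentation but should be treated as assumptions; https://github.com/pyt-team is the authoritative reference.

```python
# Hypothetical TopoNetX usage sketch; class/method names are assumptions
# based on the project documentation, not a verified API reference.
import toponetx as tnx

# Build a simplicial complex from its maximal simplices.
sc = tnx.SimplicialComplex([[0, 1, 2], [1, 2, 3]])

# Incidence matrix between 0-cells (nodes) and 1-cells (edges): the
# basic algebraic object behind higher-order message passing.
B1 = sc.incidence_matrix(rank=1)
print(B1.todense())
```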


ICML 2023 Topological Deep Learning Challenge: Design and Results

arXiv.org Artificial Intelligence

This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the Python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.


Fun with Flags: Robust Principal Directions via Flag Manifolds

arXiv.org Machine Learning

Principal component analysis (PCA), along with its extensions to manifolds and outlier-contaminated data, has been indispensable in computer vision and machine learning. In this work, we present a unifying formalism for PCA and its variants, and introduce a framework based on the flags of linear subspaces, i.e., a hierarchy of nested linear subspaces of increasing dimension, which not only allows for a common implementation but also yields novel variants not explored previously. We begin by generalizing traditional PCA methods that either maximize variance or minimize reconstruction error. We expand these interpretations to develop a wide array of new dimensionality reduction algorithms by accounting for outliers and the data manifold. To devise a common computational approach, we recast robust and dual forms of PCA as optimization problems on flag manifolds. We then integrate tangent space approximations of principal geodesic analysis (tangent-PCA) into this flag-based framework, creating novel robust and dual geodesic PCA variations. The remarkable flexibility offered by the 'flagification' introduced here enables even more algorithmic variants identified by specific flag types. Last but not least, we propose an effective convergent solver for these flag-formulations employing the Stiefel manifold. Our empirical results on both real-world and synthetic scenarios demonstrate the superiority of our novel algorithms, especially in terms of robustness to outliers on manifolds.
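As background for the two interpretations the paper generalizes, here is a minimal numpy check (classical PCA only, not the flag-manifold algorithms themselves) that the variance-maximizing directions and the reconstruction-error-minimizing subspace coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)                # center the data

# Variance maximization: top-k eigenvectors of the covariance matrix.
C = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(C)
U_var = eigvecs[:, ::-1][:, :2]    # top-2 principal directions

# Reconstruction-error minimization: rank-2 truncated SVD of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_rec = Vt[:2].T

# The two spans coincide (up to sign within the subspace).
print(np.allclose(np.abs(U_var.T @ U_rec), np.eye(2), atol=1e-6))
```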


Combinatorial Complexes: Bridging the Gap Between Cell Complexes and Hypergraphs

arXiv.org Machine Learning

Graph-based signal processing techniques have become essential for handling data in non-Euclidean spaces. However, there is a growing awareness that these graph models might need to be expanded into `higher-order' domains to effectively represent the complex relations found in high-dimensional data. Such higher-order domains are typically modeled either as hypergraphs or as simplicial, cubical or other cell complexes. In this context, cell complexes are often seen as a subclass of hypergraphs with additional algebraic structure that can be exploited, e.g., to develop a spectral theory. In this article, we promote an alternative perspective. We argue that hypergraphs and cell complexes emphasize different types of relations, which may have different utility depending on the application context. Whereas hypergraphs are effective in modeling set-type, multi-body relations between entities, cell complexes provide an effective means to model hierarchical, interior-to-boundary type relations. We discuss the relative advantages of these two choices and elaborate on the previously introduced concept of a combinatorial complex that enables co-existing set-type and hierarchical relations. Finally, we provide a brief numerical experiment to demonstrate that this modelling flexibility can be advantageous in learning tasks.
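To make the distinction concrete, a small hypothetical sketch (simplified relative to the formal definitions in the article): a hypergraph is just a family of subsets, expressing set-type relations, while a combinatorial complex additionally assigns each cell a rank, with incidence given by containment between cells of strictly increasing rank, expressing hierarchical relations.

```python
# Hypothetical sketch contrasting the two structures; simplified
# relative to the formal definitions in the article.

# Hypergraph: an unordered family of subsets of the vertex set.
hyperedges = [{0, 1, 2}, {1, 3}, {0, 2, 3}]

# Combinatorial complex: cells carry a rank, and a lower-rank cell is
# incident to a higher-rank cell when it is contained in it.
cells = {
    frozenset({0}): 0, frozenset({1}): 0, frozenset({2}): 0,
    frozenset({0, 1}): 1, frozenset({1, 2}): 1,
    frozenset({0, 1, 2}): 2,   # a face bounded by the lower-rank cells
}

def incident(a, b):
    """True if cell `a` bounds cell `b` (containment + strictly lower rank)."""
    return a < b and cells[a] < cells[b]

print(incident(frozenset({0, 1}), frozenset({0, 1, 2})))  # True
print(incident(frozenset({0}), frozenset({1, 2})))        # False
```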