AITopics

2406.04229

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

arXiv.org Artificial Intelligence, Jun-6-2024

Transformers need glasses! Information over-squashing in language tasks

Barbero, Federico, Banino, Andrea, Kapturowski, Steven, Kumaran, Dharshan, Araújo, João G. M., Vitvitskyi, Alex, Pascanu, Razvan, Veličković, Petar

We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals a representational collapse phenomenon: we prove that certain distinct sequences of inputs to the Transformer can yield arbitrarily close representations in the final token. This effect is exacerbated by the low-precision floating-point formats frequently used in modern LLMs. As a result, the model is provably unable to respond to these sequences in different ways -- leading to errors in, e.g., tasks involving counting or copying. Further, we show that decoder-only Transformer language models can lose sensitivity to specific tokens in the input, which relates to the well-known phenomenon of over-squashing in graph neural networks. We provide empirical evidence supporting our claims on contemporary LLMs. Our theory also points to simple solutions towards ameliorating these issues.
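
The collapse argument can be illustrated in a toy setting, under deliberately simplified assumptions. We take the final token's representation to be a uniform average of scalar token values; real models use learned, non-uniform attention over vector embeddings. IEEE half precision, produced with Python's `struct` `'e'` format, stands in for the low-precision formats the abstract mentions. All helper names are hypothetical. Appending one more repeated token changes the average by only 1/((n+1)(n+2)). Half precision cannot resolve that difference for long sequences, so a readout of this value could not tell the two counts apart.

```python
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision (binary16) and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def last_token_readout(tokens: list[float]) -> float:
    """Hypothetical stand-in for the final-token representation:
    uniform attention, i.e. the mean of scalar token values."""
    return sum(tokens) / len(tokens)


def collapse_gap(n: int) -> tuple[float, float]:
    """Compare n ones + a final zero against n + 1 ones + a final zero.

    Returns (gap at float64 precision, gap after rounding to half precision).
    """
    a = [1.0] * n + [0.0]
    b = [1.0] * (n + 1) + [0.0]
    ra, rb = last_token_readout(a), last_token_readout(b)
    return abs(ra - rb), abs(to_half(ra) - to_half(rb))


for n in (10, 100, 1000):
    full, half = collapse_gap(n)
    print(f"n={n}: float64 gap={full:.2e}, half-precision gap={half:.2e}")
```

Under these assumptions, the half-precision gap is already exactly zero at n = 1000. At the same length, the float64 gap is still about 1e-6.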

large language model, machine learning, natural language

2406.04267

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Jun-5-2024

Temporal Graph Rewiring with Expander Graphs

Petrović, Katarina, Huang, Shenyang, Poursafaei, Farimah, Veličković, Petar

Evolving relations in real-world networks are often modelled by temporal graphs. Graph rewiring techniques have been utilised on Graph Neural Networks (GNNs) to improve expressiveness and increase model performance. In this work, we propose Temporal Graph Rewiring (TGR), the first approach for graph rewiring on temporal graphs. TGR enables communication between temporally distant nodes in a continuous-time dynamic graph by utilising expander graph propagation to construct a message-passing highway between distant nodes. Expander graphs are suitable candidates for rewiring as they help overcome the oversquashing problem often observed in GNNs. On the public tgbl-wiki benchmark, we show that TGR improves the performance of a widely used TGN model by a significant margin. Our code repository is accessible at https://github.com/kpetrovicc/TGR.git .
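
The sketch below shows how such a fixed expander template might be built. It assumes the Cayley-graph family of SL(2, Z_n) used in expander graph propagation; the TGR paper's exact construction may differ. Nodes are 2x2 integer matrices mod n with determinant 1. Each node is joined to its products with two elementary generators and their inverses. The result is 4-regular, sparse and connected. It is also built independently of the input graph, which is what lets it serve as a shared "highway". The function names are hypothetical.

```python
from collections import deque
from itertools import product

Mat = tuple[int, int, int, int]  # (a, b, c, d) encodes [[a, b], [c, d]]


def mat_mul(x: Mat, y: Mat, n: int) -> Mat:
    """Multiply two 2x2 matrices with entries taken mod n."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % n, (a * f + b * h) % n,
            (c * e + d * g) % n, (c * f + d * h) % n)


def sl2_cayley_graph(n: int) -> dict[Mat, set[Mat]]:
    """Cayley graph of SL(2, Z_n) w.r.t. two elementary generators and their inverses (n >= 3)."""
    nodes = [m for m in product(range(n), repeat=4)
             if (m[0] * m[3] - m[1] * m[2]) % n == 1]
    # Generator set is closed under inverses, so the graph is undirected.
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]
    return {v: {mat_mul(v, g, n) for g in gens} for v in nodes}


def diameter(adj: dict[Mat, set[Mat]]) -> int:
    """Largest BFS distance over all sources; raises if the graph is disconnected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        best = max(best, max(dist.values()))
    return best


g = sl2_cayley_graph(5)
print(len(g), "nodes, diameter", diameter(g))
```

Two further steps are design choices not shown here. One is mapping the temporal graph's nodes onto the expander's nodes. The other is interleaving expander message passing with the base model (for example TGN).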

artificial intelligence, graph, machine learning

2406.02362

Country:

Europe > United Kingdom (0.28)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine Learning, Feb-23-2024

Categorical Deep Learning: An Algebraic Theory of Architectures

Gavranović, Bruno, Lessard, Paul, Dudzik, Andrew, von Glehn, Tamara, Araújo, João G. M., Veličković, Petar

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
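
The abstract says the theory recovers implementations of architectures such as RNNs. A much simpler and familiar instance of that idea follows; it is our illustration, not the paper's categorical construction. An RNN unrolled over an input list is a left fold of a parametrised cell, i.e. structural recursion over the list. The scalar tanh cell and the names `make_cell` and `rnn` are hypothetical.

```python
import math
from functools import reduce
from typing import Callable

Cell = Callable[[float, float], float]


def make_cell(w_h: float, w_x: float, b: float) -> Cell:
    """Scalar Elman-style cell h' = tanh(w_h * h + w_x * x + b).

    The weights play the role of the parameters of a parametric map.
    """
    def cell(h: float, x: float) -> float:
        return math.tanh(w_h * h + w_x * x + b)
    return cell


def rnn(cell: Cell, h0: float, xs: list[float]) -> float:
    """Unroll the cell over the input list: a left fold starting from h0."""
    return reduce(cell, xs, h0)


cell = make_cell(0.5, 1.0, 0.0)
print(rnn(cell, 0.0, [0.1, -0.2, 0.3]))
```

The same fold works for any cell with signature (state, input) -> state. This separation of the recursion scheme from the parameters is the flavour of the paper's argument.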

artificial intelligence, categorical deep learning, machine learning

2402.15332

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

arXiv.org Machine Learning, Feb-13-2024

Position Paper: Challenges and Opportunities in Topological Deep Learning

Papamarkou, Theodore, Birdal, Tolga, Bronstein, Michael, Carlsson, Gunnar, Curry, Justin, Gao, Yue, Hajij, Mustafa, Kwitt, Roland, Liò, Pietro, Di Lorenzo, Paolo, Maroulas, Vasileios, Miolane, Nina, Nasrin, Farzana, Ramamurthy, Karthikeyan Natesan, Rieck, Bastian, Scardapane, Simone, Schaub, Michael T., Veličković, Petar, Wang, Bei, Wang, Yusu, Wei, Guo-Wei, Zamzmi, Ghada

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities.

Traditional machine learning often assumes that the observed data of interest are supported on a linear vector space and can be described by a set of feature vectors. However, there is growing awareness that, in many cases, this viewpoint is insufficient to describe several data within the real world. For example, molecules may be described more appropriately by graphs than feature vectors. Other examples include three-dimensional objects represented by meshes, as encountered in computer graphics and geometry processing, or data supported on top of a complex social network of interrelated actors. Hence, there has been an increased interest in importing concepts from geometry and topology into the usual machine learning pipelines to gain further …

artificial intelligence, machine learning, survey article

2402.08871

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Michigan > Ingham County (0.14)

Genre: Research Report > Promising Solution (0.87)

Industry:

Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jan-3-2024

Parallel Algorithms Align with Neural Execution

Engelmayer, Valerie, Georgiev, Dobrik, Veličković, Petar

Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms, however, may exploit their full computational power and therefore require fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve (often strongly) superior predictive performance.
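
Sorting makes the gap between sequential steps and parallel rounds concrete. As an illustrative choice, not necessarily the parallel algorithms used in the paper, compare insertion sort with odd-even transposition sort. Insertion sort performs its compare-exchanges one at a time. In odd-even transposition sort, every compare-exchange within a round touches a disjoint pair, so a parallel processor could run them simultaneously. Assume, loosely, that one step of execution maps to one layer of a neural executor. Then the parallel version needs n rounds, where the sequential one can need up to n(n-1)/2 steps.

```python
def odd_even_transposition_sort(xs: list[int]) -> tuple[list[int], int]:
    """Sort in n rounds; within a round all compare-exchanges use disjoint pairs."""
    a = list(xs)
    n = len(a)
    rounds = 0
    for r in range(n):
        # Even rounds compare (0,1), (2,3), ...; odd rounds compare (1,2), (3,4), ...
        for i in range(r % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        rounds += 1  # one parallel round, however many pairs it touched
    return a, rounds


def insertion_sort_steps(xs: list[int]) -> tuple[list[int], int]:
    """Sort sequentially, counting one step per comparison."""
    a = list(xs)
    steps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            steps += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a, steps


data = [8, 7, 6, 5, 4, 3, 2, 1]
print(odd_even_transposition_sort(data))
print(insertion_sort_steps(data))
```

On this reversed input of length 8, the parallel version uses 8 rounds, while insertion sort performs 28 comparisons.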

algorithm, artificial intelligence, machine learning

2307.04049

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Architecture > Distributed Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Nov-20-2023

Recursive Algorithmic Reasoning

Jürß, Jonas, Jayalath, Dulhan, Veličković, Petar

Learning models that execute algorithms can enable us to address a key problem in deep learning: generalizing to out-of-distribution data. However, neural networks are currently unable to execute recursive algorithms because they do not have arbitrarily large memory to store and recall state. To address this, we (1) propose a way to augment graph neural networks (GNNs) with a stack, and (2) develop an approach for capturing intermediate algorithm trajectories that improves algorithmic alignment with recursive algorithms over previous methods. The stack allows the network to learn to store and recall a portion of the state of the network at a particular time, analogous to the action of a call stack in a recursive algorithm. This augmentation permits the network to reason recursively. We empirically demonstrate that our proposals significantly improve generalization to larger input graphs over prior work on depth-first search (DFS).
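
An iterative depth-first search shows the call-stack analogy concretely. This is our illustration only: the paper's stack is a learned component attached to a GNN, not this data structure. Each stack entry saves exactly the state recursion would keep: the current node and which neighbour to try next. Pushing a frame acts as a "call", and popping a finished frame acts as a "return". The function name is hypothetical.

```python
def dfs_order(adj: dict[int, list[int]], root: int) -> list[int]:
    """Visit order of recursive DFS, reproduced with an explicit call stack."""
    visited = {root}
    order = [root]
    # Each frame is (node, index of the next neighbour to try): the state a
    # recursive call would keep on the call stack.
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        if i < len(adj[node]):
            stack.append((node, i + 1))  # save where to resume this frame
            nxt = adj[node][i]
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                stack.append((nxt, 0))  # "call": push a fresh frame
        # Otherwise the frame is finished and stays popped: a "return".
    return order


print(dfs_order({0: [1, 2], 1: [3], 2: [], 3: []}, 0))
```

The printed order is [0, 1, 3, 2], the same order as the recursive version.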

algorithm, artificial intelligence, machine learning

2307.00337

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial Intelligence, Nov-6-2023

Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

Mehrabian, Abbas, Anand, Ankit, Kim, Hyunjik, Sonnerat, Nicolas, Balog, Matej, Comanici, Gheorghe, Berariu, Tudor, Lee, Andrew, Ruoss, Anian, Bulanova, Anna, Toyama, Daniel, Blackwell, Sam, Paredes, Bernardino Romera, Veličković, Petar, Orseau, Laurent, Lee, Joonkyung, Naredla, Anurag Murty, Precup, Doina, Wagner, Adam Zsolt

This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method. Using either method, by introducing a curriculum -- jump-starting the search for larger graphs using good graphs found at smaller sizes -- we improve the state-of-the-art lower bounds for several sizes. We also propose a flexible graph-generation environment and a permutation-invariant network architecture for learning to search in the space of graphs.
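
The constraint and the objective are simple to state in code. A graph has no 3- or 4-cycle exactly when two conditions hold: no adjacent pair of nodes shares a neighbour, and no pair of nodes shares two neighbours. The sketch checks that condition and extends a graph greedily in lexicographic edge order, optionally starting from a smaller graph in the spirit of the curriculum described above. The greedy rule is a hypothetical baseline, not AlphaZero or tabu search; those methods search over edge additions to maximise the edge count.

```python
from itertools import combinations

Edge = tuple[int, int]


def is_c3_c4_free(n: int, edges: list[Edge]) -> bool:
    """True iff the simple graph on nodes 0..n-1 has no 3-cycle and no 4-cycle."""
    nbrs: list[set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    for u, v in combinations(range(n), 2):
        common = len(nbrs[u] & nbrs[v])
        if common >= 2:
            return False  # u-a-v-b-u is a 4-cycle
        if common == 1 and v in nbrs[u]:
            return False  # u-v plus a shared neighbour is a triangle
    return True


def greedy_extend(n: int, start: list[Edge] = ()) -> list[Edge]:
    """Add every edge (in lexicographic order) that keeps the graph C3/C4-free.

    `start` lets a good graph on fewer nodes seed the search on n nodes.
    """
    edges = [tuple(sorted(e)) for e in start]
    present = set(edges)
    for e in combinations(range(n), 2):
        if e not in present and is_c3_c4_free(n, edges + [e]):
            edges.append(e)
            present.add(e)
    return edges


# Petersen graph: 10 nodes, 15 edges, no cycles shorter than 5.
petersen = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4),
            (0, 5), (1, 6), (2, 7), (3, 8), (4, 9),
            (5, 7), (7, 9), (6, 9), (6, 8), (5, 8)]
print(is_c3_c4_free(10, petersen), len(petersen))
print(len(greedy_extend(11, petersen)))
```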

artificial intelligence, machine learning, reinforcement learning

2311.03583

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Oct-17-2023

TacticAI: an AI assistant for football tactics

Wang, Zhe, Veličković, Petar, Hennes, Daniel, Tomašev, Nenad, Prince, Laurel, Kaisers, Michael, Bachrach, Yoram, Elie, Romuald, Wenliang, Li Kevin, Piccinini, Federico, Spearman, William, Graham, Ian, Connor, Jerome, Yang, Yi, Recasens, Adrià, Khan, Mina, Beauguerlange, Nathalie, Sprechmann, Pablo, Moreno, Pol, Heess, Nicolas, Bowling, Michael, Hassabis, Demis, Tuyls, Karl

Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.
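
The abstract credits geometric deep learning for TacticAI's data efficiency but gives no detail. One standard device from that field is to make a predictor invariant to a symmetry group by averaging its output over the group. The sketch assumes, for illustration only, that the relevant group is the four reflections of a pitch-centred coordinate frame, applied to a scalar prediction such as whether a shot occurs. `model`, `reflect` and `symmetrise` are hypothetical names, and this is not presented as TacticAI's architecture.

```python
from typing import Callable

Point = tuple[float, float]
Model = Callable[[list[Point]], float]

# Assumed pitch-centred coordinates (origin at the centre spot), so each
# reflection is a sign flip of x, y, or both: a group of four elements.
REFLECTIONS = [(1, 1), (-1, 1), (1, -1), (-1, -1)]


def reflect(players: list[Point], sx: int, sy: int) -> list[Point]:
    """Apply one reflection to every player's (x, y) position."""
    return [(sx * x, sy * y) for x, y in players]


def symmetrise(model: Model) -> Model:
    """Average a scalar model over all four reflections.

    Because the reflections form a group, reflecting the input only permutes
    the four terms being averaged, so the result is reflection-invariant.
    """
    def invariant_model(players: list[Point]) -> float:
        outs = [model(reflect(players, sx, sy)) for sx, sy in REFLECTIONS]
        return sum(outs) / len(outs)
    return invariant_model


def toy(players: list[Point]) -> float:
    """A deliberately non-invariant toy scorer."""
    return sum(x * x + 3 * x + y for x, y in players)


inv = symmetrise(toy)
setup = [(10.0, 5.0), (-3.0, 20.0)]
print(toy(setup), toy(reflect(setup, -1, 1)), inv(setup))
```

In this toy example, the terms that are odd in x or y cancel in the average, so `inv` keeps only the reflection-invariant part.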

data mining, machine learning, natural language

2310.10553

Country:

Europe (0.46)
North America > Canada > Alberta (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, Aug-17-2023

Half-Hop: A graph upsampling approach for slowing down message passing

Azabou, Mehdi, Ganesh, Venkataramana, Thakoor, Shantanu, Lin, Chi-Heng, Sathidevi, Lakshmi, Liu, Ran, Valko, Michal, Veličković, Petar, Dyer, Eva L.

Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
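
A minimal sketch of the edge-upsampling step, under several assumptions. It takes a directed edge list and replaces each selected edge u -> v with u -> s -> v through a new slow node s. The slow node's features are a linear interpolation of its endpoints, controlled by a hypothetical weight `alpha`. A hypothetical probability `p` decides which edges are upsampled: p = 1 upsamples every edge, and p < 1 gives the random augmentations mentioned above. The paper's exact wiring and feature initialisation may differ, and the function name is hypothetical.

```python
import random

Edge = tuple[int, int]


def half_hop(num_nodes: int, edges: list[Edge], feats: list[list[float]],
             alpha: float = 0.5, p: float = 1.0,
             seed: int = 0) -> tuple[int, list[Edge], list[list[float]]]:
    """Insert a slow node into each selected edge; the input graph is not mutated.

    Returns (new node count, new edge list, new feature rows).
    """
    rng = random.Random(seed)
    new_edges: list[Edge] = []
    new_feats = [list(f) for f in feats]
    next_id = num_nodes
    for u, v in edges:
        if rng.random() < p:
            s = next_id
            next_id += 1
            # Assumed initialisation: interpolate between source and target.
            new_feats.append([alpha * fu + (1 - alpha) * fv
                              for fu, fv in zip(feats[u], feats[v])])
            new_edges += [(u, s), (s, v)]  # messages now take two hops
        else:
            new_edges.append((u, v))       # edge left untouched
    return next_id, new_edges, new_feats


print(half_hop(3, [(0, 1), (1, 2)], [[0.0], [2.0], [4.0]]))
```

Because only the input graph changes, the output can be passed to an unmodified message-passing model, matching the plug-and-play claim.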

artificial intelligence, half-hop, machine learning

2308.09198

Country:

North America > Canada > Quebec (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Architecture > Distributed Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)