AITopics | Cazzaniga, Alberto

Collaborating Authors

Cazzaniga, Alberto

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Geometry of Tokens in Internal Representations of Large Language Models

Viswanathan, Karthik, Gardinazzi, Yuri, Panerai, Giada, Cazzaniga, Alberto, Biagetti, Matteo

arXiv.org Artificial IntelligenceJan-17-2025

We investigate the relationship between the geometry of token embeddings and their role in the next token prediction within transformer models. An important aspect of this connection uses the notion of empirical measure, which encodes the distribution of token point clouds across transformer layers and drives the evolution of token representations in the mean-field interacting picture. We use metrics such as intrinsic dimension, neighborhood overlap, and cosine similarity to observationally probe these empirical measures across layers. To validate our approach, we compare these metrics to a dataset where the tokens are shuffled, which disrupts the syntactic and semantic structure. Our findings reveal a correlation between the geometric properties of token embeddings and the cross-entropy loss of next token predictions, implying that prompts with higher loss values have tokens represented in higher-dimensional spaces.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.10573

Country:

Asia > China (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Narrow Gate: Localized Image-Text Communication in Vision-Language Models

Serra, Alessandro, Ortu, Francesco, Panizon, Emanuele, Valeriani, Lucrezia, Basile, Lorenzo, Ansuini, Alessio, Doimo, Diego, Cazzaniga, Alberto

arXiv.org Artificial IntelligenceDec-9-2024

Recent advances in multimodal training have significantly improved the integration of image understanding and generation within a unified model. This study investigates how vision-language models (VLMs) handle image-understanding tasks, specifically focusing on how visual information is processed and transferred to the textual domain. We compare VLMs that generate both images and text with those that output only text, highlighting key differences in information flow. We find that in models with multimodal outputs, image and text embeddings are more separated within the residual stream. Additionally, models vary in how information is exchanged from visual to textual tokens. VLMs that only output text exhibit a distributed communication pattern, where information is exchanged through multiple image tokens. In contrast, models trained for image and text generation rely on a single token that acts as a narrow gate for the visual information. We demonstrate that ablating this single token significantly deteriorates performance on image understanding tasks. Furthermore, modifying this token enables effective steering of the image semantics, showing that targeted, local interventions can reliably control the model's global behavior.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.06646

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Persistent Topological Features in Large Language Models

Gardinazzi, Yuri, Panerai, Giada, Viswanathan, Karthik, Ansuini, Alessio, Cazzaniga, Alberto, Biagetti, Matteo

arXiv.org Artificial IntelligenceOct-14-2024

Understanding the decision-making processes of large language models (LLMs) is critical given their widespread applications. Towards this goal, describing the topological and geometrical properties of internal representations has recently provided valuable insights. For a more comprehensive characterization of these inherently complex spaces, we present a novel framework based on zigzag persistence, a method in topological data analysis (TDA) well-suited for describing data undergoing dynamic transformations across layers. Within this framework, we introduce persistence similarity, a new metric that quantifies the persistence and transformation of topological features such as $p$-cycles throughout the model layers. Unlike traditional similarity measures, our approach captures the entire evolutionary trajectory of these features, providing deeper insights into the internal workings of LLMs. As a practical application, we leverage persistence similarity to identify and prune redundant layers, demonstrating comparable performance to state-of-the-art methods across several benchmark datasets. Additionally, our analysis reveals consistent topological behaviors across various models and hyperparameter settings, suggesting a universal structure in LLM internal representations.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.11042

Country: Europe > Italy (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

Ortu, Francesco, Jin, Zhijing, Doimo, Diego, Sachan, Mrinmaya, Cazzaniga, Alberto, Schölkopf, Bernhard

arXiv.org Artificial IntelligenceJun-6-2024

Interpretability research aims to bridge the gap between empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research focuses on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose a formulation of competition of mechanisms, which focuses on the interplay of multiple mechanisms instead of individual mechanisms and traces how one of them becomes dominant in the final prediction. We uncover how and where mechanisms compete within LLMs using two interpretability methods: logit inspection and attention modification. Our findings show traces of the mechanisms and their competition across various model components and reveal attention positions that effectively control the strength of certain mechanisms. Code: https://github.com/francescortu/comp-mech. Data: https://huggingface.co/datasets/francescortu/comp-mech.

large language model, machine learning, mechanism, (20 more...)

arXiv.org Artificial Intelligence

2402.11655

Country:

Asia (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

The geometry of hidden representations of large transformer models

Valeriani, Lucrezia, Doimo, Diego, Cuturello, Francesca, Laio, Alessandro, Ansuini, Alessio, Cazzaniga, Alberto

arXiv.org Machine LearningOct-30-2023

Large transformers are powerful architectures used for self-supervised data analysis across various data types, including protein sequences, images, and text. In these models, the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next. We characterize the geometric and statistical properties of these representations and how they change as we move through the layers. By analyzing the intrinsic dimension (ID) and neighbor composition, we find that the representations evolve similarly in transformers trained on protein language tasks and image reconstruction tasks. In the first layers, the data manifold expands, becoming high-dimensional, and then contracts significantly in the intermediate layers. In the last part of the model, the ID remains approximately constant or forms a second shallow peak. We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets. Based on our findings, we point out an explicit strategy to identify, without supervision, the layers that maximize semantic content: representations at intermediate layers corresponding to a relative minimum of the ID profile are more suitable for downstream learning tasks.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2302.00294

Country:

Europe > Italy (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Emergent representations in networks trained with the Forward-Forward algorithm

Tosato, Niccolò, Basile, Lorenzo, Ballarin, Emanuele, de Alteriis, Giuseppe, Cazzaniga, Alberto, Ansuini, Alessio

arXiv.org Artificial IntelligenceOct-26-2023

The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity - i.e. composed of an extremely low number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm. These results suggest that the learning procedure proposed by Forward-Forward may be superior to Backpropagation in modelling learning in the cortex, even when a backward pass is used.

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.18353

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)

Add feedback