AITopics | Treviso, Marcos

Collaborating Authors

Treviso, Marcos

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models

Pitorro, Hugo, Treviso, Marcos

arXiv.org Artificial IntelligenceFeb-24-2025

State space models (SSMs), such as Mamba, have emerged as an efficient alternative to transformers for long-context sequence modeling. However, despite their growing adoption, SSMs lack the interpretability tools that have been crucial for understanding and improving attention-based architectures. While recent efforts provide insights into Mamba's internal mechanisms, they do not explicitly decompose token-wise contributions, leaving gaps in understanding how Mamba selectively processes sequences across layers. In this work, we introduce LaTIM, a novel token-level decomposition method for both Mamba-1 and Mamba-2 that enables fine-grained interpretability. We extensively evaluate our method across diverse tasks, including machine translation, copying, and retrieval-based generation, demonstrating its effectiveness in revealing Mamba's token-to-token interaction patterns.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.15612

Country:

Asia (1.00)
Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

AdaSplash: Adaptive Sparse Flash Attention

Gonçalves, Nuno, Treviso, Marcos, Martins, André F. T.

arXiv.org Artificial IntelligenceFeb-17-2025

The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $\alpha$-entmax attention is an example, offers a flexible data-dependent alternative, but existing implementations are inefficient and do not leverage the sparsity to obtain runtime and memory gains. In this work, we propose AdaSplash, which combines the efficiency of GPU-optimized algorithms with the sparsity benefits of $\alpha$-entmax. We first introduce a hybrid Halley-bisection algorithm, resulting in a 7-fold reduction in the number of iterations needed to compute the $\alpha$-entmax transformation. Then, we implement custom Triton kernels to efficiently handle adaptive sparsity. Experiments with RoBERTa and ModernBERT for text classification and single-vector retrieval, along with GPT-2 for language modeling, show that our method achieves substantial improvements in runtime and memory efficiency compared to existing $\alpha$-entmax implementations. It approaches -- and in some cases surpasses -- the efficiency of highly optimized softmax implementations like FlashAttention-2, enabling long-context training while maintaining strong task performance.

large language model, machine learning, plash, (21 more...)

arXiv.org Artificial Intelligence

2502.12082

Country:

Europe (1.00)
North America > United States > New York (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

How Effective are State Space Models for Machine Translation?

Pitorro, Hugo, Vasylenko, Pavlo, Treviso, Marcos, Martins, André F. T.

arXiv.org Artificial IntelligenceJul-7-2024

Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers -- this is the case for state space models, which enjoy efficient training and inference. However, it remains unclear whether these models are competitive with transformers in machine translation (MT). In this paper, we provide a rigorous and comprehensive experimental comparison between transformers and linear recurrent models for MT. Concretely, we experiment with RetNet, Mamba, and hybrid versions of Mamba which incorporate attention mechanisms. Our findings demonstrate that Mamba is highly competitive with transformers on sentence and paragraph-level datasets, where in the latter both models benefit from shifting the training distribution towards longer sequences. Further analysis show that integrating attention into Mamba improves translation quality, robustness to sequence length extrapolation, and the ability to recall named entities.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.05489

Country:

Europe > Germany (0.28)
Asia > Middle East > UAE (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

xTower: A Multilingual LLM for Explaining and Correcting Translation Errors

Treviso, Marcos, Guerreiro, Nuno M., Agrawal, Sweta, Rei, Ricardo, Pombal, José, Vaz, Tania, Wu, Helena, Silva, Beatriz, van Stigt, Daan, Martins, André F. T.

arXiv.org Artificial IntelligenceJun-27-2024

While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies. Understanding these errors can potentially help improve the translation quality and user experience. This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for translation errors in order to guide the generation of a corrected translation. The quality of the generated explanations by xTower are assessed via both intrinsic and extrinsic evaluation. We ask expert translators to evaluate the quality of the explanations across two dimensions: relatedness towards the error span being explained and helpfulness in error understanding and improving translation quality. Extrinsically, we test xTower across various experimental setups in generating translation corrections, demonstrating significant improvements in translation quality. Our findings highlight xTower's potential towards not only producing plausible and helpful explanations of automatic translations, but also leveraging them to suggest corrected translations.

large language model, machine learning, translation, (21 more...)

arXiv.org Artificial Intelligence

2406.19482

Country:

North America > United States (0.67)
Asia > Middle East > UAE (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

Rei, Ricardo, Guerreiro, Nuno M., Pombal, José, van Stigt, Daan, Treviso, Marcos, Coheur, Luisa, de Souza, José G. C., Martins, André F. T.

arXiv.org Artificial IntelligenceSep-21-2023

We present the joint contribution of Unbabel and Instituto Superior T\'ecnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches are ranked first for all tasks, reaching state-of-the-art performance for quality estimation at word-, span- and sentence-level granularity. Compared to the previous state-of-the-art COMETKIWI-22, we show large improvements in correlation with human judgements (up to 10 Spearman points). Moreover, we surpass the second-best multilingual submission to the shared-task with up to 3.8 absolute points.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.11925

Country: Europe > Portugal (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

CREST: A Joint Framework for Rationalization and Counterfactual Text Generation

Treviso, Marcos, Ross, Alexis, Guerreiro, Nuno M., Martins, André F. T.

arXiv.org Artificial IntelligenceMay-26-2023

Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models. However, prior work has not explored how these methods can be integrated to combine their complementary advantages. We overcome this limitation by introducing CREST (ContRastive Edits with Sparse raTionalization), a joint framework for selective rationalization and counterfactual text generation, and show that this framework leads to improvements in counterfactual quality, model robustness, and interpretability. First, CREST generates valid counterfactuals that are more natural than those produced by previous methods, and subsequently can be used for data augmentation at scale, reducing the need for human-generated examples. Second, we introduce a new loss function that leverages CREST counterfactuals to regularize selective rationales and show that this regularization improves both model robustness and rationale quality, compared to methods that do not leverage CREST counterfactuals. Our results demonstrate that CREST successfully bridges the gap between selective rationales and counterfactual examples, addressing the limitations of existing methods and providing a more comprehensive view of a model's predictions.

artificial intelligence, counterfactual, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.17075

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics

Rei, Ricardo, Guerreiro, Nuno M., Treviso, Marcos, Coheur, Luisa, Lavie, Alon, Martins, André F. T.

arXiv.org Artificial IntelligenceMay-19-2023

Neural metrics for machine translation evaluation, such as COMET, exhibit significant improvements in their correlation with human judgments, as compared to traditional metrics based on lexical overlap, such as BLEU. Yet, neural metrics are, to a great extent, "black boxes" returning a single sentence-level score without transparency about the decision-making process. In this work, we develop and compare several neural explainability methods and demonstrate their effectiveness for interpreting state-of-the-art fine-tuned neural metrics. Our study reveals that these metrics leverage token-level information that can be directly attributed to translation errors, as assessed through comparison of token-level neural saliency maps with Multidimensional Quality Metrics (MQM) annotations and with synthetically-generated critical translation errors. To ease future research, we release our code at: https://github.com/Unbabel/COMET/tree/explainable-metrics.

artificial intelligence, explanation, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.11806

Country:

Europe (1.00)
Asia (0.69)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Efficient Methods for Natural Language Processing: A Survey

Treviso, Marcos, Lee, Ji-Ung, Ji, Tianchu, van Aken, Betty, Cao, Qingqing, Ciosici, Manuel R., Hassid, Michael, Heafield, Kenneth, Hooker, Sara, Raffel, Colin, Martins, Pedro H., Martins, André F. T., Forde, Jessica Zosa, Milder, Peter, Simpson, Edwin, Slonim, Noam, Dodge, Jesse, Strubell, Emma, Balasubramanian, Niranjan, Derczynski, Leon, Gurevych, Iryna, Schwartz, Roy

arXiv.org Artificial IntelligenceMar-24-2023

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.00099

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)
Asia > Middle East > Israel (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Energy (1.00)
Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Learning to Scaffold: Optimizing Model Explanations for Teaching

Fernandes, Patrick, Treviso, Marcos, Pruthi, Danish, Martins, André F. T., Neubig, Graham

arXiv.org Artificial IntelligenceNov-29-2022

Modern machine learning models are opaque, and as a result there is a burgeoning academic subfield on methods that explain these models' behavior. However, what is the precise goal of providing such explanations, and how can we demonstrate that explanations achieve this goal? Some research argues that explanations should help teach a student (either human or machine) to simulate the model being explained, and that the quality of explanations can be measured by the simulation accuracy of students on unexplained examples. In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model. We train models on three natural language processing and computer vision tasks, and find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods. Through human annotations and a user study, we further find that these learned explanations more closely align with how humans would explain the required decisions in these tasks. Our code is available at https://github.com/coderpat/learning-scaffold

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2204.1081

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Sparse Continuous Distributions and Fenchel-Young Losses

Martins, André F. T., Treviso, Marcos, Farinhas, António, Aguiar, Pedro M. Q., Figueiredo, Mário A. T., Blondel, Mathieu, Niculae, Vlad

arXiv.org Artificial IntelligenceAug-4-2021

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent works on sparse alternatives to softmax (e.g. sparsemax, $\alpha$-entmax, and fusedmax) and corresponding losses, which have varying support. This paper expands that line of work in several directions: first, it extends $\Omega$-regularized prediction maps and Fenchel-Young losses to arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families," which include $\alpha$-entmax and sparsemax ($\alpha$ = 2) as particular cases. For quadratic energy functions in continuous domains, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using them, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.

bayesian inference, neural network, sparse continuous distribution, (18 more...)

arXiv.org Artificial Intelligence

2108.01988

Country:

Europe > Portugal (0.14)
Asia > Middle East (0.14)
Europe > Netherlands (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback