Campos, Jon Ander
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
Rakotonirina, Nathanaël Carraz, Hamdy, Mohammed, Campos, Jon Ander, Weber, Lucas, Testoni, Alberto, Fadaee, Marzieh, Pezzelle, Sandro, Del Tredici, Marco
Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
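To make the setup concrete, here is a minimal sketch of what a MemoryCode-style evaluation instance could look like; the session schema, the instruction checks, and the evaluate helper are illustrative assumptions, not the dataset's actual format.

```python
import re

# Hypothetical illustration of a MemoryCode-style instance (the dataset's real
# schema may differ). Each session mixes a coding instruction with unrelated
# filler, and the final task requires applying every instruction given so far.
sessions = [
    {"mentor": "Remember: always start function names with 'x_'.",
     "filler": "We also chatted about the new office coffee machine."},
    {"mentor": "From now on, every function needs a one-line docstring.",
     "filler": "The quarterly planning meeting moved to Thursday."},
]
final_task = "Write a function that returns the square of a number."

# Simple regex checks standing in for the instruction-following evaluation.
checks = {
    "prefix":    lambda code: re.search(r"def\s+x_\w+", code) is not None,
    "docstring": lambda code: re.search(r'def .*:\s*\n\s+"""', code) is not None,
}

def evaluate(model_output: str) -> dict:
    """Return which accumulated instructions the generated code respects."""
    return {name: check(model_output) for name, check in checks.items()}

# Example: a response that follows both instructions across sessions.
candidate = 'def x_square(n):\n    """Return the square of n."""\n    return n * n\n'
print(evaluate(candidate))  # {'prefix': True, 'docstring': True}
```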
LLMs can implicitly learn from mistakes in-context
Alazraki, Lisa, Mozes, Maximilian, Campos, Jon Ander, Tan, Yi Chern, Rei, Marek, Bartolo, Max
Learning from mistakes is a fundamental feature of human intelligence. Previous work has shown that Large Language Models (LLMs) can also learn from incorrect answers when provided with a comprehensive rationale detailing why an answer is wrong or how to correct it. In this work, we examine whether LLMs can learn from mistakes in mathematical reasoning tasks when these explanations are not provided. We investigate if LLMs are able to implicitly infer such rationales simply from observing both incorrect and correct answers. Surprisingly, we find that LLMs perform better, on average, when rationales are eliminated from the context and incorrect answers are simply shown alongside correct ones. This approach also substantially outperforms chain-of-thought prompting in our evaluations. We show that these results are consistent across LLMs of different sizes and varying reasoning abilities. Further, we carry out an in-depth analysis and show that prompting with both wrong and correct answers leads to greater performance and better generalisation than introducing additional, more diverse question-answer pairs into the context. Finally, we show that new rationales generated by models that have only observed incorrect and correct answers are rated by humans as highly as those produced with the aid of exemplar rationales. Our results demonstrate that LLMs are indeed capable of in-context implicit learning.
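As an illustration of the prompting setup described above, the sketch below builds an in-context prompt that pairs incorrect and correct answers without any rationale; the exemplars and the build_prompt helper are hypothetical, not the paper's exact template.

```python
# A minimal sketch (not the paper's prompt format) of the in-context setup:
# each exemplar shows an incorrect answer next to the correct one, with no
# explanation of why the first answer is wrong.
exemplars = [
    {"question": "What is 17 * 24?", "incorrect": "398", "correct": "408"},
    {"question": "What is 15% of 60?", "incorrect": "12", "correct": "9"},
]

def build_prompt(exemplars, target_question: str) -> str:
    """Concatenate (question, incorrect answer, correct answer) exemplars,
    then append the new question to be answered."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Incorrect answer: {ex['incorrect']}\n"
            f"Correct answer: {ex['correct']}\n"
        )
    parts.append(f"Question: {target_question}\nCorrect answer:")
    return "\n".join(parts)

print(build_prompt(exemplars, "What is 23 * 19?"))
```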
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Dang, John, Singh, Shivalika, D'souza, Daniel, Ahmadian, Arash, Salamanca, Alejandro, Smith, Madeline, Peppin, Aidan, Hong, Sungjin, Govindassamy, Manoj, Zhao, Terrence, Kublik, Sandra, Amer, Meor, Aryabumi, Viraat, Campos, Jon Ander, Tan, Yi-Chern, Kocmi, Tom, Strub, Florian, Grinsztajn, Nathan, Flet-Berliac, Yannis, Locatelli, Acyr, Lin, Hangyu, Talupuru, Dwarak, Venkitesh, Bharat, Cairuz, David, Yang, Bowen, Chung, Tim, Ko, Wei-Yin, Shi, Sylvie Shang, Shukayev, Amir, Bae, Sammie, Piktus, Aleksandra, Castagné, Roman, Cruz-Salinas, Felipe, Kim, Eddie, Crawhall-Stein, Lucas, Morisot, Adrien, Roy, Sudip, Blunsom, Phil, Zhang, Ivan, Gomez, Aidan, Frosst, Nick, Fadaee, Marzieh, Ermis, Beyza, Üstün, Ahmet, Hooker, Sara
We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state-of-the-art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving up to a 76.6% win-rate. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with twice as many parameters, achieving a 54.0% win-rate. In this short technical report, we present extended evaluation results for the Aya Expanse model family and release their open-weights, together with a new multilingual evaluation dataset m-ArenaHard.
Improving Reward Models with Synthetic Critiques
Ye, Zihuiwen, Greenlee-Scott, Fraser, Bartolo, Max, Blunsom, Phil, Campos, Jon Ander, Gallé, Matthias
Reward models (RMs) play a critical role in aligning language models through reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unseen distributions. We propose a novel approach using synthetic natural language critiques generated by large language models to provide additional feedback, evaluating aspects such as instruction following, correctness, and style. This provides richer signals and more robust features for RMs to use when assessing and scoring responses. We demonstrate that high-quality critiques improve the performance and data efficiency of RMs initialized from different pretrained models. Conversely, we also show that low-quality critiques negatively impact performance. Furthermore, incorporating critiques enhances the interpretability and robustness of RM training.
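The sketch below illustrates the general idea of conditioning a reward model on a synthetic critique; generate_critique and build_rm_input are hypothetical stand-ins for the LLM call and the RM input format actually used in the paper.

```python
# A hedged sketch of augmenting a reward model's input with a synthetic
# critique. `generate_critique` stands in for a large language model call;
# the concrete prompt and RM architecture in the paper may differ.
def generate_critique(prompt: str, response: str) -> str:
    """Placeholder for an LLM call that critiques instruction following,
    correctness, and style of `response` given `prompt`."""
    return ("The response addresses the instruction but does not verify "
            "its factual claims and is overly verbose.")

def build_rm_input(prompt: str, response: str, critique: str) -> str:
    """Format (prompt, response, critique) into a single sequence that the
    reward model scores, so the critique acts as an additional feature."""
    return (f"Prompt: {prompt}\n"
            f"Response: {response}\n"
            f"Critique: {critique}\n"
            f"Score:")

prompt = "Summarise the main findings of the attached report in two sentences."
response = "The report shows revenue grew 12% and costs fell slightly."
critique = generate_critique(prompt, response)
print(build_rm_input(prompt, response, critique))
```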
Aya 23: Open Weight Releases to Further Multilingual Progress
Aryabumi, Viraat, Dang, John, Talupuru, Dwarak, Dash, Saurabh, Cairuz, David, Lin, Hangyu, Venkitesh, Bharat, Smith, Madeline, Campos, Jon Ander, Tan, Yi Chern, Marchisio, Kelly, Bartolo, Max, Ruder, Sebastian, Locatelli, Acyr, Kreutzer, Julia, Frosst, Nick, Gomez, Aidan, Blunsom, Phil, Fadaee, Marzieh, Üstün, Ahmet, Hooker, Sara
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth vs. breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers, as well as widely used models like Gemma, Mistral and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Labruna, Tiziano, Campos, Jon Ander, Azkune, Gorka
In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token that signals when the IR system should be queried, i.e. when the question cannot be answered from parametric memory alone.
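A minimal sketch of this adaptive-retrieval behaviour is shown below; the token name and the generate and retrieve functions are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of answering with an optional retrieval step. The special
# token name and both helper functions are placeholders.
RETRIEVAL_TOKEN = "<RET>"  # hypothetical marker emitted when context is needed

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a fine-tuned model either answers directly
    or emits RETRIEVAL_TOKEN when it needs external context."""
    if prompt.startswith("Context:"):
        return "Bernardo Atxaga"
    return RETRIEVAL_TOKEN

def retrieve(question: str) -> str:
    """Placeholder for an off-the-shelf IR system (e.g. a Wikipedia retriever)."""
    return "Passage about Basque literature ..."

def answer(question: str) -> str:
    first_pass = generate(f"Question: {question}\nAnswer:")
    if first_pass.strip() != RETRIEVAL_TOKEN:
        # The model is confident it can answer from parametric memory.
        return first_pass
    # Otherwise, call the retriever and answer again with the extra context.
    context = retrieve(question)
    return generate(f"Context: {context}\nQuestion: {question}\nAnswer:")

print(answer("Who wrote the novel 'Obabakoak'?"))
```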
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Sainz, Oscar, Campos, Jon Ander, García-Ferrero, Iker, Etxaniz, Julen, de Lacalle, Oier Lopez, Agirre, Eneko
In this position paper, we argue that the classical evaluation of Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark and then evaluated on the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of a contaminated model on a target benchmark and associated task with respect to its non-contaminated counterpart. The consequences can be very harmful, with wrong scientific conclusions being published while other correct ones are discarded. This position paper defines different levels of data contamination and argues for a community effort, including the development of automatic and semi-automatic measures to detect when data from a benchmark was exposed to a model, and suggestions for flagging papers with conclusions that are compromised by data contamination.
Unsupervised Domain Adaption for Neural Information Retrieval
Dominguez, Carlos, Campos, Jon Ander, Agirre, Eneko, Azkune, Gorka
Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using Large Language Models or rule-based string manipulation has been proposed as an alternative, but their relative merits have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained on a large out-of-domain dataset (MS-MARCO); and unsupervised domain adaptation, where, in addition to MS-MARCO, the system is fine-tuned on synthetic data from the target domain. Our results indicate that Large Language Models outperform rule-based methods in all scenarios by a large margin and, more importantly, that unsupervised domain adaptation is effective compared to applying a supervised IR system in a zero-shot fashion. In addition, we explore several sizes of open Large Language Models for generating the synthetic queries.
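The sketch below contrasts the two synthetic-annotation strategies compared in the paper; rule_based_query, llm_query, and build_training_pairs are simplified, hypothetical stand-ins for the actual generators and fine-tuning pipeline.

```python
import random

# Hedged illustration of the two synthetic-annotation strategies: rule-based
# string manipulation vs. LLM query generation, both producing (query, document)
# pairs for fine-tuning a retriever on the target domain.
def rule_based_query(document: str) -> str:
    """Rule-based string manipulation: sample a short span of the document
    and treat it as a pseudo-query."""
    words = document.split()
    start = random.randrange(max(1, len(words) - 6))
    return " ".join(words[start:start + 6])

def llm_query(document: str) -> str:
    """Placeholder for prompting an open LLM to write a question that the
    document answers."""
    return "What do neural retrievers do with queries and documents?"

def build_training_pairs(target_domain_docs):
    """Produce (query, positive document) pairs used to fine-tune the
    retriever on the target domain, on top of MS-MARCO supervision."""
    return [(llm_query(doc), doc) for doc in target_domain_docs]

docs = ["Neural retrievers map queries and documents into a shared vector space."]
print(rule_based_query(docs[0]))
print(build_training_pairs(docs))
```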
IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases
García-Ferrero, Iker, Campos, Jon Ander, Sainz, Oscar, Salaberria, Ander, Roth, Dan
Named Entity Recognition (NER) is a core natural language processing task in which pretrained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 (Tjong Kim Sang and De Meulder, 2003) do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base; third, predicting the fine-grained category for each entity candidate. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 (Fetahu et al., 2023b) shared task, even in the low-resource language setting where we leverage knowledge bases of high-resource languages.
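The following sketch illustrates the three-step cascade at a toy scale; the candidate detector, the in-memory knowledge base, and the classify heuristic are placeholders for the pretrained models and large multilingual knowledge bases used in the actual system.

```python
import re

# Toy stand-in for an external knowledge base keyed by entity mention.
KNOWLEDGE_BASE = {
    "Jon Ander Campos": "Researcher working on natural language processing.",
    "SemEval": "A series of international NLP shared-task evaluations.",
}

def identify_candidates(sentence: str):
    """Step 1: propose candidate entity spans (naively, runs of capitalised words)."""
    return re.findall(r"(?:[A-Z]\w*(?:\s+[A-Z]\w*)*)", sentence)

def link_to_kb(candidate: str):
    """Step 2: link the candidate to a knowledge-base entry, if one exists."""
    return KNOWLEDGE_BASE.get(candidate)

def classify(candidate: str, kb_description):
    """Step 3: predict a fine-grained category, using the KB description as
    extra context (a real system feeds this to a fine-grained classifier)."""
    if kb_description is None:
        return "UNKNOWN"
    return "PERSON" if "Researcher" in kb_description else "EVENT"

sentence = "Jon Ander Campos presented the system at SemEval this year."
for cand in identify_candidates(sentence):
    print(cand, "->", classify(cand, link_to_kb(cand)))
```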
Training Language Models with Language Feedback at Scale
Scheurer, Jérémy, Campos, Jon Ander, Korbak, Tomasz, Chan, Jun Shern, Chen, Angelica, Cho, Kyunghyun, Perez, Ethan
Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements. Second, selecting the refinement incorporating the most feedback. Third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback. We evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
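Below is a hedged sketch of a single ILF iteration; refine, select_best, and ilf_step are illustrative placeholders for the LM sampling, refinement selection, and fine-tuning steps, not the paper's code.

```python
# Hedged sketch of one ILF iteration: (1) sample refinements conditioned on the
# input, the initial output, and language feedback; (2) select the refinement
# that best incorporates the feedback; (3) fine-tune on the chosen refinements.
def refine(task_input: str, initial_output: str, feedback: str, n: int = 3):
    """Step 1: placeholder for conditioning the LM on input, its initial output,
    and the language feedback to sample several candidate refinements."""
    return [f"{initial_output} (refined, attempt {i})" for i in range(n)]

def select_best(refinements, feedback: str) -> str:
    """Step 2: pick the refinement that incorporates the feedback most fully
    (in practice, scored by a model; here, a trivial stand-in)."""
    return max(refinements, key=len)

def ilf_step(dataset):
    """Step 3: collect (input, chosen refinement) pairs to be used for
    fine-tuning the LM via maximum likelihood."""
    finetune_pairs = []
    for task_input, initial_output, feedback in dataset:
        candidates = refine(task_input, initial_output, feedback)
        finetune_pairs.append((task_input, select_best(candidates, feedback)))
    return finetune_pairs  # passed to a standard supervised fine-tuning loop

example = [("Summarise the article.", "A short but vague summary.",
            "The summary omits the main result; mention it explicitly.")]
print(ilf_step(example))
```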