AITopics | Collins, Michael

Collaborating Authors

Collins, Michael

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InfAlign: Inference-aware language model alignment

Balashankar, Ananth, Sun, Ziteng, Berant, Jonathan, Eisenstein, Jacob, Collins, Michael, Hutter, Adrian, Lee, Jong, Nagpal, Chirag, Prost, Flavien, Sinha, Aradhana, Suresh, Ananda Theertha, Beirami, Ahmad

arXiv.org Artificial IntelligenceDec-30-2024

Language model alignment has become a critical step in training modern generative language models. The goal of alignment is to finetune a reference model such that the win rate of a sample from the aligned model over a sample from the reference model is high, subject to a KL divergence constraint. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. However, the alignment objective does not capture such inference-time decoding procedures. We show that the existing alignment framework is sub-optimal in view of such inference-time methods. We then modify the alignment objective and propose a framework for inference-aware alignment (IAPO). We prove that for any inference-time decoding algorithm, the optimal solution that optimizes the inference-time win rate of the aligned policy against the reference policy is the solution to the typical RLHF problem with a transformation of the reward. This motivates us to provide the KL-regularized calibrate-and-transform RL (CTRL) algorithm to solve this problem, which involves a reward calibration step and a KL-regularized reward maximization step with a transformation of the calibrated reward. We particularize our study to two important inference-time strategies: best-of-N sampling and best-of-N jailbreaking, where N responses are sampled from the model and the one with the highest or lowest reward is selected. We propose specific transformations for these strategies and demonstrate that our framework offers significant improvements over existing state-of-the-art methods for language model alignment. Empirically, we outperform baselines that are designed without taking inference-time decoding into consideration by 8-12% and 4-9% on inference-time win rates over the Anthropic helpfulness and harmlessness dialog benchmark datasets.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.19792

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

Bohnet, Bernd, Swersky, Kevin, Liu, Rosanne, Awasthi, Pranjal, Nova, Azade, Snaider, Javier, Sedghi, Hanie, Parisi, Aaron T, Collins, Michael, Lazaridou, Angeliki, Firat, Orhan, Fiedel, Noah

arXiv.org Artificial IntelligenceMay-31-2024

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing [1], but the emergence of transformers with a context size of 1 million or more tokens [2] now enables entirely automatic approaches. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text, such as questions involving character arcs, broader themes, or the consequences of early actions later in the story. We propose a holistic pipeline for automatic data generation including question generation, answering, and model scoring using an "Evaluator". We find that a relative approach, comparing answers between models in a pairwise fashion and ranking with a Bradley-Terry model, provides a more consistent and differentiating scoring mechanism than an absolute scorer that rates answers individually. We also show that LLMs from different model families produce moderate agreement in their ratings. We ground our approach using the manually curated NarrativeQA dataset, where our evaluator shows excellent agreement with human judgement and even finds errors in the dataset. Using our automatic evaluation approach, we show that using an entire book as context produces superior reading comprehension performance compared to baseline no-context (parametric knowledge only) and retrieval-based approaches.

large language model, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2406.00179

Country:

Europe (0.67)
Asia > Japan > Honshū (0.14)
North America > United States > Ohio (0.14)
North America > United States > Louisiana (0.14)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.92)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Jacovi, Alon, Bitton, Yonatan, Bohnet, Bernd, Herzig, Jonathan, Honovich, Or, Tseng, Michael, Collins, Michael, Aharoni, Roee, Geva, Mor

arXiv.org Artificial IntelligenceFeb-2-2024

Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning steps to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enable thorough evaluation of such verification methods, hindering progress in this direction. We introduce Reveal: Reasoning Verification Evaluation, a new dataset to benchmark automatic verifiers of complex Chain-of-Thought reasoning in open-domain question answering settings. Reveal includes comprehensive labels for the relevance, attribution to evidence passages, and logical correctness of each reasoning step in a language model's answer, across a wide variety of datasets and state-of-the-art language models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.00559

Country:

Europe (0.67)
North America > United States > Georgia > Dougherty County > Albany (0.14)

Genre: Workflow (1.00)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Sports > Soccer (0.67)
Leisure & Entertainment > Sports > Basketball (0.67)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Controlled Decoding from Language Models

Mudgal, Sidharth, Lee, Jong, Ganapathy, Harish, Li, YaGuang, Wang, Tao, Huang, Yanping, Chen, Zhifeng, Cheng, Heng-Tze, Collins, Michael, Strohman, Trevor, Chen, Jilin, Beutel, Alex, Beirami, Ahmad

arXiv.org Artificial IntelligenceOct-25-2023

We propose controlled decoding (CD), a novel off-policy reinforcement learning method to control the autoregressive generation from language models towards high reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, which we call a prefix scorer. The prefix scorer is used at inference time to steer the generation towards higher reward outcomes. We show that the prefix scorer may be trained on (possibly) off-policy data to predict the expected reward when decoding is continued from a partially decoded response. We empirically demonstrate that CD is effective as a control mechanism on Reddit conversations corpus. We also show that the modularity of the design of CD makes it possible to control for multiple rewards, effectively solving a multi-objective reinforcement learning problem with no additional complexity. Finally, we show that CD can be applied in a novel blockwise fashion at inference-time, again without the need for any training-time changes, essentially bridging the gap between the popular best-of-$K$ strategy and token-level reinforcement learning. This makes CD a promising approach for alignment of language models.

controlled decoding, machine learning, reinforcement learning, (2 more...)

arXiv.org Artificial Intelligence

2310.17022

Genre: Research Report (0.69)

Industry: Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Bohnet, Bernd, Tran, Vinh Q., Verga, Pat, Aharoni, Roee, Andor, Daniel, Soares, Livio Baldini, Ciaramita, Massimiliano, Eisenstein, Jacob, Ganchev, Kuzman, Herzig, Jonathan, Hui, Kai, Kwiatkowski, Tom, Ma, Ji, Ni, Jianmo, Saralegui, Lierni Sestorain, Schuster, Tal, Cohen, William W., Collins, Michael, Das, Dipanjan, Metzler, Donald, Petrov, Slav, Webster, Kellie

arXiv.org Artificial IntelligenceFeb-10-2023

Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of attributed LLMs. We propose a reproducible evaluation framework for the task and benchmark a broad set of architectures. We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development. Our experimental work gives concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third (How to build LLMs with attribution?).

attribution, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2212.08037

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Sports (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Learning to Reject with a Fixed Predictor: Application to Decontextualization

Mohri, Christopher, Andor, Daniel, Choi, Eunsol, Collins, Michael

arXiv.org Artificial IntelligenceJan-31-2023

Large language models, often trained with billions of parameters, have achieved impressive performance in recent years (Raffel et al., 2019) and are used in a wide variety of natural language generation tasks. However, their output is sometimes undesirable, with hallucinated content (Maynez et al., 2020; Filippova, 2020), and much work remains to fully understand their properties. In many applications, such as healthcare, question-answering systems, or customer service, incorrect predictions are particularly costly and must be avoided. This motivates the design of algorithms for large language models and other NLP tasks that achieve high precision on a large fraction of the input set, while abstaining on the rest. How can we devise such accurate models that allow a reject option?

machine learning, question answering, rejection loss, (17 more...)

arXiv.org Artificial Intelligence

2301.09044

Country:

North America > United States (0.46)
Europe > Spain (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
(2 more...)

Add feedback

Coreference Resolution through a seq2seq Transition-Based System

Bohnet, Bernd, Alberti, Chris, Collins, Michael

arXiv.org Artificial IntelligenceNov-22-2022

Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system as a transition system and use multilingual T5 as an underlying language model. We obtain state-of-the-art accuracy on the CoNLL-2012 datasets with 83.3 F1-score for English (a 2.3 higher F1-score than previous work (Dobrovolskii, 2021)) using only CoNLL data for training, 68.5 F1-score for Arabic (+4.1 higher than previous work) and 74.3 F1-score for Chinese (+5.3). In addition we use the SemEval-2010 data sets for experiments in the zero-shot setting, a few-shot setting, and supervised setting using all available training data. We get substantially higher zero-shot F1-scores for 3 out of 4 languages than previous approaches and significantly exceed previous supervised state-of-the-art results for all five tested languages.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2211.12142

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Add feedback

Towards Computationally Verifiable Semantic Grounding for Language Models

Alberti, Chris, Ganchev, Kuzman, Collins, Michael, Gehrmann, Sebastian, Chelba, Ciprian

arXiv.org Artificial IntelligenceNov-16-2022

The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples. It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: The first technique samples multiple candidate text sequences from which the semantic parser chooses. The second trains the language model while keeping the semantic parser frozen to improve the semantic accuracy of the auto-encoder. We carry out experiments on the English WebNLG 3.0 data set, using BLEU to measure the fluency of generated text and standard parsing metrics to measure semantic accuracy. We show that our proposed approaches significantly improve on the greedy search baseline. Human evaluation corroborates the results of the automatic evaluation experiments.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2211.0907

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Decontextualization: Making Sentences Stand-Alone

Choi, Eunsol, Palomaki, Jennimaria, Lamm, Matthew, Kwiatkowski, Tom, Das, Dipanjan, Collins, Michael

arXiv.org Artificial IntelligenceFeb-9-2021

Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.

artificial intelligence, decontextualization, social media, (18 more...)

arXiv.org Artificial Intelligence

2102.05169

Country:

Europe (0.69)
North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Industry:

Media > Film (0.68)
Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)

Add feedback

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Min, Sewon, Boyd-Graber, Jordan, Alberti, Chris, Chen, Danqi, Choi, Eunsol, Collins, Michael, Guu, Kelvin, Hajishirzi, Hannaneh, Lee, Kenton, Palomaki, Jennimaria, Raffel, Colin, Roberts, Adam, Kwiatkowski, Tom, Lewis, Patrick, Wu, Yuxiang, Küttler, Heinrich, Liu, Linqing, Minervini, Pasquale, Stenetorp, Pontus, Riedel, Sebastian, Yang, Sohee, Seo, Minjoon, Izacard, Gautier, Petroni, Fabio, Hosseini, Lucas, De Cao, Nicola, Grave, Edouard, Yamada, Ikuya, Shimaoka, Sonse, Suzuki, Masatoshi, Miyawaki, Shumpei, Sato, Shun, Takahashi, Ryo, Suzuki, Jun, Fajcik, Martin, Docekal, Martin, Ondrej, Karel, Smrz, Pavel, Cheng, Hao, Shen, Yelong, Liu, Xiaodong, He, Pengcheng, Chen, Weizhu, Gao, Jianfeng, Oguz, Barlas, Chen, Xilun, Karpukhin, Vladimir, Peshterliev, Stan, Okhonko, Dmytro, Schlichtkrull, Michael, Gupta, Sonal, Mehdad, Yashar, Yih, Wen-tau

arXiv.org Artificial IntelligenceDec-31-2020

We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing large, redundant, retrieval corpora or the parameters of large learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.

prediction, upstream oil & gas, us government, (23 more...)

arXiv.org Artificial Intelligence

2101.00133

Country:

Asia (1.00)
Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Media (0.68)
Leisure & Entertainment > Games (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback