AITopics | Dwivedi-Yu, Jane

Collaborating Authors

Dwivedi-Yu, Jane

Active Retrieval Augmented Generation

Jiang, Zhengbao, Xu, Frank F., Gao, Luyu, Sun, Zhiqing, Liu, Qian, Dwivedi-Yu, Jane, Yang, Yiming, Callan, Jamie, Neubig, Graham

arXiv.org Artificial Intelligence, Oct-21-2023

Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout generation is essential. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
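The loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation (the repository linked above is the authoritative source). The callables `generate_sentence` and `retrieve`, the `threshold` and `max_sentences` parameters, and the toy stand-ins are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Tuple

# A generated sentence plus one confidence value (probability) per token.
Sentence = Tuple[str, List[float]]


def flare_generate(
    question: str,
    generate_sentence: Callable[[str, List[str]], Sentence],  # hypothetical LM call
    retrieve: Callable[[str], List[str]],  # hypothetical retriever
    threshold: float = 0.5,  # assumed confidence cut-off
    max_sentences: int = 8,  # assumed cap on answer length
) -> str:
    """Generate an answer sentence by sentence, retrieving only when needed."""
    answer: List[str] = []
    for _ in range(max_sentences):
        context = " ".join([question] + answer)
        # Step 1: tentatively predict the upcoming sentence without retrieval.
        sentence, confidences = generate_sentence(context, [])
        if not sentence:
            break  # an empty sentence signals the end of the answer
        # Step 2: if any token is low-confidence, use the tentative sentence
        # as the retrieval query.
        if confidences and min(confidences) < threshold:
            docs = retrieve(sentence)
            # Step 3: regenerate the sentence conditioned on the documents.
            sentence, _ = generate_sentence(context, docs)
            if not sentence:
                break
        answer.append(sentence)
    return " ".join(answer)


# Toy stand-ins so the sketch runs without a real model or index.
def toy_lm(context: str, docs: List[str]) -> Sentence:
    if "Paris" in context:
        return ("", [])  # nothing more to say
    if docs:
        return ("The capital of France is Paris.", [0.9] * 6)
    # Without documents the toy model guesses, with one unsure token.
    return ("The capital of France is Lyon.", [0.9, 0.9, 0.9, 0.9, 0.9, 0.2])


def toy_retriever(query: str) -> List[str]:
    return ["Paris is the capital of France."]


result = flare_generate("What is the capital of France?", toy_lm, toy_retriever)
```

In the toy run, the first tentative sentence contains a token below the threshold, so it is used as a query and the sentence is regenerated with the retrieved document. A confident sentence would be accepted as-is, which is what distinguishes this loop from retrieving once up front.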

large language model, machine learning, natural language, (21 more...)

arXiv:2305.06983

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > California (0.14)
North America > United States > Alaska (0.14)
(3 more...)

Genre: Research Report (0.70)

Industry:

Government (1.00)
Media > Film (0.71)
Leisure & Entertainment > Games > Computer Games (0.70)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluation of Faithfulness Using the Longest Supported Subsequence

Mittal, Anirudh, Schick, Timo, Artetxe, Mikel, Dwivedi-Yu, Jane

arXiv.org Artificial Intelligence, Aug-23-2023

As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is challenging due to the linguistic diversity and the myriad of possible answers. In this paper, we introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontinuous substring of the claim that is supported by the context, which we refer to as the Longest Supported Subsequence (LSS). Using a new human-annotated dataset, we finetune a model to generate LSS. We introduce a new method of evaluation and demonstrate that these metrics correlate better with human ratings when LSS is employed, as opposed to when it is not. Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset. Our metric consistently outperforms other metrics on a summarization dataset across six different models. Finally, we compare several popular Large Language Models (LLMs) for faithfulness using this metric. We release the human-annotated dataset built for predicting LSS and our fine-tuned model for evaluating faithfulness.

computational linguistic, machine learning, natural language, (17 more...)

arXiv:2308.12157

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Shepherd: A Critic for Language Model Generation

Wang, Tianlu, Yu, Ping, Tan, Xiaoqing Ellen, O'Brien, Sean, Pasunuru, Ramakanth, Dwivedi-Yu, Jane, Golovneva, Olga, Zettlemoyer, Luke, Fazel-Zarandi, Maryam, Celikyilmaz, Asli

arXiv.org Artificial Intelligence, Aug-8-2023

As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv:2308.04592

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance > Trading (0.47)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

NormBank: A Knowledge Bank of Situational Social Norms

Ziems, Caleb, Dwivedi-Yu, Jane, Wang, Yi-Chia, Halevy, Alon, Yang, Diyi

arXiv.org Artificial Intelligence, Jul-24-2023

We present NormBank, a knowledge bank of 155k situational norms. This resource is designed to ground flexible normative reasoning for interactive, assistive, and collaborative AI systems. Unlike prior commonsense resources, NormBank grounds each inference within a multivalent sociocultural frame, which includes the setting (e.g., restaurant), the agents' contingent roles (waiter, customer), their attributes (age, gender), and other physical, social, and cultural constraints (e.g., the temperature or the country of operation). In total, NormBank contains 63k unique constraints from a taxonomy that we introduce and iteratively refine here. Constraints then apply in different combinations to frame social norms. Under these manipulations, norms are non-monotonic - one can cancel an inference by updating its frame even slightly. Still, we find evidence that neural models can help reliably extend the scope and coverage of NormBank. We further demonstrate the utility of this resource with a series of transfer experiments.

constraint, machine learning, natural language, (18 more...)

arXiv:2305.17008

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California (0.46)
North America > United States > Minnesota (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine (0.93)
Consumer Products & Services > Restaurants (0.48)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

TimelineQA: A Benchmark for Question Answering over Timelines

Tan, Wang-Chiew, Dwivedi-Yu, Jane, Li, Yuliang, Mathias, Lambert, Saeidi, Marzieh, Yan, Jing Nathan, Halevy, Alon Y.

arXiv.org Artificial Intelligence, Jun-1-2023

Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from the multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art of question answering techniques for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly out-performs a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available.

lifelog, natural language, question answering, (18 more...)

arXiv:2306.01069

Country:

Asia > Japan > Honshū (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (1.00)
Consumer Products & Services (1.00)
Media > Television (0.66)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Learnings from Data Integration for Augmented Language Models

Halevy, Alon, Dwivedi-Yu, Jane

arXiv.org Artificial Intelligence, Apr-10-2023

One of the limitations of large language models is that they do not have access to up-to-date, proprietary or personal data. As a result, there are multiple efforts to extend language models with techniques for accessing external data. In that sense, LLMs share the vision of data integration systems whose goal is to provide seamless access to a large collection of heterogeneous data sources. While the details and the techniques of LLMs differ greatly from those of data integration, this paper shows that some of the lessons learned from research on data integration can elucidate the research path we are conducting today on language models.

artificial intelligence, information fusion, natural language, (12 more...)

arXiv:2304.04576

Country:

Europe (0.29)
North America > United States (0.28)

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)

Augmented Language Models: a Survey

Mialon, Grégoire, Dessì, Roberto, Lomeli, Maria, Nalmpantis, Christoforos, Pasunuru, Ram, Raileanu, Roberta, Rozière, Baptiste, Schick, Timo, Dwivedi-Yu, Jane, Celikyilmaz, Asli, Grave, Edouard, LeCun, Yann, Scialom, Thomas

arXiv.org Artificial Intelligence, Feb-15-2023

This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.

arxiv preprint arxiv, machine learning, reinforcement learning, (17 more...)

arXiv:2302.07842

Country: North America > United States (1.00)

Genre: Overview (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.67)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(4 more...)

Toolformer: Language Models Can Teach Themselves to Use Tools

Schick, Timo, Dwivedi-Yu, Jane, Dessì, Roberto, Raileanu, Roberta, Lomeli, Maria, Zettlemoyer, Luke, Cancedda, Nicola, Scialom, Thomas

arXiv.org Artificial Intelligence, Feb-9-2023

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv:2302.04761

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Atlas: Few-shot Learning with Retrieval Augmented Language Models

Izacard, Gautier, Lewis, Patrick, Lomeli, Maria, Hosseini, Lucas, Petroni, Fabio, Schick, Timo, Dwivedi-Yu, Jane, Joulin, Armand, Riedel, Sebastian, Grave, Edouard

arXiv.org Artificial Intelligence, Nov-16-2022

Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and NaturalQuestions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters.

information retrieval, machine learning, natural language, (19 more...)

arXiv:2208.03299

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.83)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
