
 Van Durme, Benjamin


BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing

arXiv.org Artificial Intelligence

Recent work has shown that generation from a prompted or fine-tuned language model can perform well at semantic parsing when the output is constrained to be a valid semantic representation. We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing, that includes context-free grammars for seven semantic parsing datasets and two syntactic parsing datasets with varied output representations, as well as a constrained decoding interface to generate only valid outputs covered by these grammars. We provide low-, medium-, and high-resource splits for each dataset, allowing accurate comparison of various language models under different data regimes. Our benchmark supports evaluation of language models using prompt-based learning as well as fine-tuning. We benchmark eight language models, including two GPT-3 variants available only through an API. Our experiments show that encoder-decoder pretrained language models can match or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
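
To make the constrained-decoding idea concrete, here is a minimal sketch of the general technique, not BenchCLAMP's actual interface: at each step, candidate next tokens are filtered through a validity check so the model can only emit strings the grammar can still complete. The `grammar_allows` oracle below is a hypothetical stand-in (a toy balanced-parenthesis rule) for a real incremental CFG parser.

```python
# Minimal sketch of grammar-constrained decoding (not BenchCLAMP's API):
# greedily pick the highest-scoring token whose extension stays valid.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def grammar_allows(prefix: str) -> bool:
    """Hypothetical oracle: True if some valid parse can extend `prefix`."""
    return prefix.count(")") <= prefix.count("(")  # toy balanced-paren rule

def constrained_generate(prompt: str, max_new_tokens: int = 32) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        chosen = None
        # Only inspect the top candidates for speed; a real system would
        # precompute the set of grammar-valid continuations instead.
        for tok in torch.argsort(logits, descending=True)[:100]:
            candidate = tokenizer.decode(torch.cat([ids[0], tok.view(1)]))
            if grammar_allows(candidate):
                chosen = tok
                break
        if chosen is None or chosen.item() == tokenizer.eos_token_id:
            break
        ids = torch.cat([ids, chosen.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0])

print(constrained_generate("A list: ("))
```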


Do Androids Know They're Only Dreaming of Electric Sheep?

arXiv.org Artificial Intelligence

We design probes trained on the internal representations of a transformer language model that are predictive of its hallucinatory behavior on in-context generation tasks. To facilitate this detection, we create a span-annotated dataset of organic and synthetic hallucinations over several tasks. We find that probes trained on the force-decoded states of synthetic hallucinations are generally ecologically invalid for organic hallucination detection. Furthermore, hidden-state information about hallucination appears to be task- and distribution-dependent. Intrinsic and extrinsic hallucination saliency varies across layers, hidden state types, and tasks; notably, extrinsic hallucinations tend to be more salient in a transformer's internal representations. Outperforming multiple contemporary baselines, we show that probing is a feasible and efficient alternative to language model hallucination evaluation when model states are available.
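
As an illustration of the general probing setup (not the paper's dataset or probe architecture), the sketch below extracts one layer's hidden states from a small transformer and fits a logistic-regression probe against per-token hallucination labels; the example texts and labels are toy stand-ins.

```python
# Toy probing sketch: fit a linear probe on one layer's hidden states.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

examples = [  # (text, per-token 0/1 hallucination labels) -- invented data
    ("The report was published in 1997.", [0, 0, 0, 0, 0, 1, 1]),
    ("The report was published recently.", [0, 0, 0, 0, 0, 0]),
]

LAYER = 6  # which layer to probe; the paper finds saliency varies by layer
X, y = [], []
for text, labels in examples:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        states = model(ids).hidden_states[LAYER][0]  # (seq_len, hidden)
    for vec, label in zip(states, labels):
        X.append(vec.numpy())
        y.append(label)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```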


NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning

arXiv.org Artificial Intelligence

Our goal is a modern approach to answering questions via systematic reasoning, where answers are supported by human-interpretable proof trees grounded in an NL corpus of authoritative facts. Such a system would help alleviate the challenges of interpretability and hallucination in modern LMs, and the lack of grounding in current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work that explains known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner [Tafjord et al., 2022] while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these results suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.
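
The proof-search loop can be sketched schematically. In the toy below, `retrieve` stands in for the semiparametric dense retrieval over a fact corpus and `decompose` for the guided neural generation of premise pairs; both are hypothetical simplifications of NELLIE's components.

```python
# Schematic backward-chaining entailment-tree search (toy stand-ins).
from dataclasses import dataclass, field

FACTS = {"metal conducts electricity", "a copper wire is made of metal"}

def retrieve(hypothesis: str) -> bool:
    return hypothesis in FACTS  # stand-in for dense retrieval over a corpus

def decompose(hypothesis: str) -> list[tuple[str, str]]:
    # Stand-in for guided generation of premise pairs that jointly entail
    # the hypothesis; a real system would propose these with an LM.
    if hypothesis == "a copper wire conducts electricity":
        return [("a copper wire is made of metal", "metal conducts electricity")]
    return []

@dataclass
class Proof:
    claim: str
    children: list["Proof"] = field(default_factory=list)

def prove(hypothesis: str, depth: int = 3) -> Proof | None:
    if retrieve(hypothesis):                   # grounded leaf of the tree
        return Proof(hypothesis)
    if depth == 0:
        return None
    for left, right in decompose(hypothesis):  # backward chaining
        lp, rp = prove(left, depth - 1), prove(right, depth - 1)
        if lp and rp:
            return Proof(hypothesis, [lp, rp])
    return None

print(prove("a copper wire conducts electricity"))
```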


Data Portraits: Recording Foundation Model Training Data

arXiv.org Artificial Intelligence

Foundation models are trained on increasingly immense and opaque datasets. Even as these models become key components of AI systems, it can be difficult to answer a straightforward question: has the model already encountered a given example during training? We therefore propose widespread adoption of Data Portraits: artifacts that record training data and allow for downstream inspection. First we outline the properties of such an artifact and discuss how existing solutions can be used to increase transparency. We then propose and implement a solution based on data sketching, stressing fast and space-efficient querying. Using our tools, we document a popular language modeling corpus (The Pile) and a recently released code modeling dataset (The Stack). We show that our solution enables answering questions about test set leakage and model plagiarism. Our tool is lightweight and fast, costing only 3% of the dataset size in overhead. We release a live interface of our tools at https://dataportraits.org/ and call on dataset and model creators to release Data Portraits as a complement to current documentation practices.
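
A minimal sketch of the data-sketching idea (not the released artifact format, which the paper and https://dataportraits.org/ document): hash character n-grams from the training corpus into a Bloom filter, then report how many of a query's n-grams were seen, as a signal of possible test-set leakage.

```python
# Toy Bloom-filter membership sketch over character n-grams.
import hashlib

class BloomFilter:
    def __init__(self, n_bits: int = 1 << 20, n_hashes: int = 3):
        self.bits = bytearray(n_bits // 8)
        self.n_bits, self.n_hashes = n_bits, n_hashes

    def _positions(self, item: str):
        for seed in range(self.n_hashes):
            h = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

WIDTH = 8  # n-gram width in characters; a real portrait uses longer grams

def ngrams(text: str):
    return (text[i:i + WIDTH] for i in range(len(text) - WIDTH + 1))

corpus = "the quick brown fox jumps over the lazy dog"  # toy training data
portrait = BloomFilter()
for gram in ngrams(corpus):
    portrait.add(gram)

query = "a lazy dog sat"
overlap = [g for g in ngrams(query) if g in portrait]
print(f"{len(overlap)} of the query's {WIDTH}-grams appear in the corpus")
```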


InteractiveIE: Towards Assessing the Strength of Human-AI Collaboration in Improving the Performance of Information Extraction

arXiv.org Artificial Intelligence

Learning template-based information extraction from documents is a crucial yet difficult task. Prior template-based IE approaches assume foreknowledge of the domain's templates; however, real-world IE tasks rarely come with pre-defined schemas, and the schema itself must be figured out as you go. To quickly bootstrap templates in a real-world setting, we need to induce template slots from documents with zero or minimal supervision. Since the purpose of question answering intersects with the goal of information extraction, we use automatic question generation to induce template slots from documents, and we investigate how a small amount of on-the-fly proxy human supervision (termed InteractiveIE) can further boost performance. Extensive experiments on biomedical and legal documents, where obtaining training data is expensive, reveal encouraging trends of performance improvement using InteractiveIE over an AI-only baseline.
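
One way to picture the pipeline, with hypothetical stand-ins throughout: `generate_questions` plays the automatic question-generation model, and clustering question embeddings yields candidate template slots that a human could then refine on the fly.

```python
# Sketch of QG-based slot induction: cluster generated questions so each
# cluster can be proposed to the user as a candidate template slot.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def generate_questions(document: str) -> list[str]:
    # Stand-in: a real system would run a QG model over the document.
    return [
        "Who conducted the study?",
        "What drug was tested?",
        "Who led the trial?",
        "What compound was evaluated?",
    ]

questions = generate_questions("...")  # document text elided
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(questions)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
for slot in range(2):
    members = [q for q, l in zip(questions, labels) if l == slot]
    print(f"candidate slot {slot}: {members}")  # shown to the user to refine
```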


Interpreting User Requests in the Context of Natural Language Standing Instructions

arXiv.org Artificial Intelligence

Users of natural language interfaces, generally powered by Large Language Models (LLMs), often must repeat their preferences each time they make a similar request. To alleviate this, we propose including some of a user's preferences and instructions in natural language -- collectively termed standing instructions -- as additional context for such interfaces. For example, when a user states "I'm hungry", their previously expressed preference for Persian food will be automatically added to the LLM prompt, so as to influence the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of user-specific standing instructions) and corresponding structured representations (API calls). A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue. NLSI contains diverse phenomena, from simple preferences to interdependent instructions such as triggering a hotel search whenever the user is booking tickets to an event. We conduct experiments on NLSI using prompting with large language models and various retrieval approaches, achieving a maximum of 44.7% exact match on API prediction. Our results demonstrate the challenges in identifying the relevant standing instructions and in interpreting them into API calls.
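
A hedged sketch of one plausible baseline pipeline (the instruction texts, API format, and LLM call are illustrative, not the dataset's actual schema): retrieve the standing instructions most similar to the request, then prompt an LLM to emit the API call.

```python
# Retrieve-then-prompt baseline sketch for standing instructions.
from sentence_transformers import SentenceTransformer, util

profile = [  # invented user profile, not NLSI data
    "If I ask for a restaurant, prefer Persian food.",
    "If I book event tickets, also search for a hotel nearby.",
    "Always book for two people unless I say otherwise.",
]
request = "I'm hungry"

embedder = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(embedder.encode(request, convert_to_tensor=True),
                      embedder.encode(profile, convert_to_tensor=True))[0]
selected = [profile[i] for i in scores.topk(2).indices.tolist()]

prompt = (
    "Standing instructions:\n"
    + "\n".join(f"- {s}" for s in selected)
    + f"\nUser: {request}\nAPI call:"
)
# The prompt would go to any instruction-following LLM; a correct
# completion might look like: FindRestaurants(cuisine="Persian")
print(prompt)
```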


BLT: Can Large Language Models Handle Basic Legal Text?

arXiv.org Artificial Intelligence

We find that the best publicly available LLMs like GPT-4 and PaLM 2 currently perform poorly at basic text handling required of lawyers or paralegals, such as looking up the text at a given line of a witness deposition or at a given subsection of a contract. We introduce a benchmark to quantify this poor performance, which casts doubt on LLMs' current reliability as-is for legal practice. Finetuning for these tasks brings an older LLM to near-perfect performance on our test set and also raises performance on a related legal task. This stark result highlights the need for more domain expertise in LLM training.
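
The flavor of task being probed can be reproduced in a few lines; this synthetic version is illustrative, not the benchmark's actual prompts or transcripts.

```python
# Synthetic line-lookup probe in the spirit of BLT (illustrative only):
# number a fake transcript, ask for one line, score by exact match.
import random

lines = [f"{i}: Witness statement sentence number {i}." for i in range(1, 201)]
target = random.randint(1, 200)

prompt = (
    "Below is a numbered deposition transcript.\n"
    + "\n".join(lines)
    + f"\n\nQuote exactly the text of line {target}, without its number."
)
expected = f"Witness statement sentence number {target}."

def score(response: str) -> bool:
    """Exact-match scoring of a model reply (the model call is omitted)."""
    return response.strip() == expected

print(score(f"Witness statement sentence number {target}."))  # True
```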


Toucan: Token-Aware Character Level Language Modeling

arXiv.org Artificial Intelligence

Character-level language models obviate the need for separately trained tokenizers, but their efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more efficient, but they still require decoding characters one at a time. We propose Toucan, an augmentation to character-level models that makes them "token-aware". Comparing our method to prior work, we demonstrate significant speed-ups in character generation without a loss in language modeling performance. We then compare our learned dynamic tokenization of character sequences with popular fixed-vocabulary solutions such as Byte-Pair Encoding and WordPiece, finding that our approach tokenizes a greater number of longer character sequences as single items. Our project and code are available at https://nlp.jhu.edu/nuggets/.
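
A toy sketch of the token-aware idea, not Toucan's architecture: a learned boundary head (faked here with a rule that fires on spaces) lets a character stream be grouped into dynamic tokens, which can then be compared against fixed vocabularies such as BPE.

```python
# Toy dynamic tokenization via a boundary-prediction head (stand-in).
def boundary_scores(text: str) -> list[float]:
    # Stand-in for a learned boundary head: high score at spaces. A real
    # model would predict these jointly with next-character probabilities.
    return [1.0 if ch == " " else 0.0 for ch in text]

def dynamic_tokens(text: str, threshold: float = 0.5) -> list[str]:
    tokens, start = [], 0
    for i, score in enumerate(boundary_scores(text)):
        if score > threshold:
            tokens.append(text[start:i])
            start = i + 1
    tokens.append(text[start:])
    return tokens

print(dynamic_tokens("character level models decode one symbol at a time"))
```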


FAMuS: Frames Across Multiple Sources

arXiv.org Artificial Intelligence

Understanding event descriptions is a central aspect of language processing, but current approaches focus overwhelmingly on single sentences or documents. Aggregating information about an event across documents can offer a much richer understanding. To this end, we present FAMuS, a new corpus of Wikipedia passages that report on some event, paired with underlying, genre-diverse (non-Wikipedia) source articles for the same event. Events and (cross-sentence) arguments in both report and source are annotated against FrameNet, providing broad coverage of different event types. We present results on two key event understanding tasks enabled by FAMuS: source validation -- determining whether a document is a valid source for a target report event -- and cross-document argument extraction -- full-document argument extraction for a target event from both its report and the correct source article. We release both FAMuS and our models to support further research.
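
As a baseline sketch for the source-validation task only (not the paper's model), one could score each candidate document against the report with an off-the-shelf cross-encoder and threshold the relevance score:

```python
# Cross-encoder relevance scoring as a source-validation baseline.
from sentence_transformers import CrossEncoder

report = "Wikipedia passage reporting that a flood struck the region in May."
candidates = [  # invented candidate source documents
    "Local news article describing the May flood and its aftermath.",
    "A recipe blog post about seasonal vegetables.",
]

scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = scorer.predict([(report, doc) for doc in candidates])
for doc, s in zip(candidates, scores):
    print(f"{s:7.3f}  valid source? {s > 0}  {doc[:50]}")
```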


Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles

arXiv.org Artificial Intelligence

Large language models trained primarily in a monolingual setting have demonstrated an ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap between their performance and that of the few-shot setting. In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus. Additionally, we explore potential approaches to enhance zero-shot baselines without the need for parallel demonstration examples, providing valuable insights into how these methods contribute to improving translation metrics.
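
Measuring the gap itself is straightforward; the sketch below scores illustrative zero- and few-shot outputs with sacrebleu (the strings are invented, not the paper's data or numbers):

```python
# Illustrative gap measurement: compare BLEU of zero- vs. few-shot
# translations against the same references.
import sacrebleu

references = ["The weather is nice today.", "She bought two tickets."]
zero_shot = ["Today the weather is nice.", "She purchased two tickets."]
few_shot = ["The weather is nice today.", "She bought two tickets."]

for name, hyps in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    bleu = sacrebleu.corpus_bleu(hyps, [references])
    print(f"{name}: BLEU = {bleu.score:.1f}")
```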