AITopics | Schuster, Tal

Collaborating Authors

Schuster, Tal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

Milbauer, Jeremiah, Louis, Annie, Hosseini, Mohammad Javad, Fabrikant, Alex, Metzler, Donald, Schuster, Tal

arXiv.org Artificial IntelligenceMay-31-2023

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages. To this end, we introduce Layer-Adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently, and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. The LAIT framework effectively leverages existing pretrained Transformers and converts them into the hybrid of the two aforementioned architectures, allowing for easy and intuitive control over the performance-efficiency tradeoff. Experimenting on a wide range of NLP tasks, we find LAIT able to reduce 30-50% of the attention FLOPs on many tasks, while preserving high accuracy; in some practical settings, LAIT could reduce actual latency by orders of magnitude.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.19585

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Chen, Sihao, Buthpitiya, Senaka, Fabrikant, Alex, Roth, Dan, Schuster, Tal

arXiv.org Artificial IntelligenceMay-24-2023

The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined on the sentence- or paragraph-level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.1075

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Conformal Risk Control

Angelopoulos, Anastasios N., Bates, Stephen, Fisch, Adam, Lei, Lihua, Schuster, Tal

arXiv.org Artificial IntelligenceApr-29-2023

We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.

machine learning, natural language, prediction, (15 more...)

arXiv.org Artificial Intelligence

2208.02814

Country:

North America > United States > Hawaii (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

UL2: Unifying Language Learning Paradigms

Tay, Yi, Dehghani, Mostafa, Tran, Vinh Q., Garcia, Xavier, Wei, Jason, Wang, Xuezhi, Chung, Hyung Won, Shakeri, Siamak, Bahri, Dara, Schuster, Tal, Zheng, Huaixiu Steven, Zhou, Denny, Houlsby, Neil, Metzler, Donald

arXiv.org Artificial IntelligenceFeb-28-2023

Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes with pre-training objectives -- two concepts that are commonly conflated. Next, we present a generalized & unified perspective for self-supervision in NLP and show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective. We then propose Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. We furthermore introduce a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes. We conduct extensive ablative experiments to compare multiple pre-training objectives and find that our method pushes the Pareto-frontier by outperforming T5 & GPT-like models across multiple diverse setups. By scaling our model up to 20B parameters, we achieve SOTA performance on 50 well-established supervised finetuning based NLP tasks. Our model also achieve strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization. On 0-shot MMLU, UL2 20B outperforms T0 and T5 models. UL2 20B also works well with chain-of-thought prompting and reasoning, making it an appealing choice for research into reasoning at a small to medium scale of 20B parameters. Finally, we apply FLAN instruction tuning to the UL2 20B model, achieving MMLU and Big-Bench scores competitive to FLAN-PaLM 62B. We release Flax-based T5X checkpoints for the UL2 20B & Flan-UL2 20B.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2205.05131

Country:

Europe (0.45)
Asia (0.45)
North America > United States > Oregon (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Bohnet, Bernd, Tran, Vinh Q., Verga, Pat, Aharoni, Roee, Andor, Daniel, Soares, Livio Baldini, Ciaramita, Massimiliano, Eisenstein, Jacob, Ganchev, Kuzman, Herzig, Jonathan, Hui, Kai, Kwiatkowski, Tom, Ma, Ji, Ni, Jianmo, Saralegui, Lierni Sestorain, Schuster, Tal, Cohen, William W., Collins, Michael, Das, Dipanjan, Metzler, Donald, Petrov, Slav, Webster, Kellie

arXiv.org Artificial IntelligenceFeb-10-2023

Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of attributed LLMs. We propose a reproducible evaluation framework for the task and benchmark a broad set of architectures. We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development. Our experimental work gives concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third (How to build LLMs with attribution?).

attribution, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2212.08037

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Sports (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters

Schuster, Tal, Chen, Sihao, Buthpitiya, Senaka, Fabrikant, Alex, Metzler, Donald

arXiv.org Artificial IntelligenceNov-1-2022

Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2204.07447

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

Add feedback

Transformer Memory as a Differentiable Search Index

Tay, Yi, Tran, Vinh Q., Dehghani, Mostafa, Ni, Jianmo, Bahri, Dara, Mehta, Harsh, Qin, Zhen, Hui, Kai, Zhao, Zhe, Gupta, Jai, Schuster, Tal, Cohen, William W., Metzler, Donald

arXiv.org Artificial IntelligenceFeb-16-2022

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup.

artificial intelligence, differentiable search index, transformer memory

arXiv.org Artificial Intelligence

2202.06991

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management > Search (0.60)
Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

Programming Puzzles

Schuster, Tal, Kalyan, Ashwin, Polozov, Oleksandr, Kalai, Adam Tauman

arXiv.org Artificial IntelligenceJun-10-2021

We introduce a new type of programming challenge called programming puzzles, as an objective and comprehensive evaluation of program synthesis, and release an open-source dataset of Python Programming Puzzles (P3). Each puzzle is defined by a short Python program $f$, and the goal is to find an input $x$ which makes $f$ output "True". The puzzles are objective in that each one is specified entirely by the source code of its verifier $f$, so evaluating $f(x)$ is all that is needed to test a candidate solution $x$. They do not require an answer key or input/output examples, nor do they depend on natural language understanding. The dataset is comprehensive in that it spans problems of a range of difficulties and domains, ranging from trivial string manipulation problems that are immediately obvious to human programmers (but not necessarily to AI), to classic programming puzzles (e.g., Towers of Hanoi), to interview/competitive-programming problems (e.g., dynamic programming), to longstanding open problems in algorithms and mathematics (e.g., factoring). The objective nature of P3 readily supports self-supervised bootstrapping. We develop baseline enumerative program synthesis and GPT-3 solvers that are capable of solving easy puzzles -- even without access to any reference solutions -- by learning from their own past solutions. Based on a small user study, we find puzzle difficulty to correlate between human programmers and the baseline AI solvers.

logic programming, puzzle, survey article, (21 more...)

arXiv.org Artificial Intelligence

2106.05784

Country:

Asia > Vietnam > Hanoi > Hanoi (0.24)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

Consistent Accelerated Inference via Confident Adaptive Transformers

Schuster, Tal, Fisch, Adam, Jaakkola, Tommi, Barzilay, Regina

arXiv.org Artificial IntelligenceApr-18-2021

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase efficiency, but can come with unpredictable performance costs. In this work, we present CATs -- Confident Adaptive Transformers -- in which we simultaneously increase computational efficiency, while guaranteeing a specifiable degree of consistency with the original model with high confidence. Our method trains additional prediction heads on top of intermediate layers, and dynamically decides when to stop allocating computational effort to each input using a meta consistency classifier. To calibrate our early prediction stopping rule, we formulate a unique extension of conformal prediction. We demonstrate the effectiveness of this approach on four classification and regression tasks.

health & medicine, neural network, prediction, (19 more...)

arXiv.org Artificial Intelligence

2104.08803

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.54)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Few-shot Conformal Prediction with Auxiliary Tasks

Fisch, Adam, Schuster, Tal, Jaakkola, Tommi, Barzilay, Regina

arXiv.org Artificial IntelligenceFeb-17-2021

We develop a novel approach to conformal prediction when the target task has limited data available for training. Conformal prediction identifies a small set of promising output candidates in place of a single prediction, with guarantees that the set contains the correct answer with high probability. When training data is limited, however, the predicted set can easily become unusably large. In this work, we obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm over exchangeable collections of auxiliary tasks. Our conformalization algorithm is simple, fast, and agnostic to the choice of underlying model, learning algorithm, or dataset. We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.

health & medicine, neural network, prediction, (16 more...)

arXiv.org Artificial Intelligence

2102.08898

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback