AITopics | Ratner, Alexander

Ratner, Alexander

Language Model Preference Evaluation with Multiple Weak Evaluators

Hu, Zhengyu, Zhang, Jieyu, Xiong, Zhihan, Ratner, Alexander, Xiong, Hui, Krishna, Ranjay

arXiv.org Artificial Intelligence, Dec-29-2024

Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding *preference* remains a critical challenge. Existing works usually leverage a powerful LLM (e.g., GPT4) as the judge for pairwise comparison of LLMs' outputs, yet such a model-based evaluator is vulnerable to *conflicting preference*, i.e., output A is judged better than B, B better than C, but C better than A, causing contradictory evaluation results. To improve model-based preference evaluation, we introduce GED (Preference Graph Ensemble and Denoise), a novel approach that leverages multiple model-based evaluators to construct preference graphs, and then ensembles and denoises these graphs for better, non-contradictory evaluation results. In particular, our method consists of two primary stages: aggregating evaluations into a unified graph and applying a denoising process to eliminate cyclic inconsistencies, ensuring a directed acyclic graph (DAG) structure. We provide theoretical guarantees for our framework, demonstrating its efficacy in recovering the ground truth preference structure. Extensive experiments across ten benchmark datasets show that GED outperforms baseline methods in model ranking, response selection, and model alignment tasks. Notably, GED combines weaker evaluators like Llama3-8B, Mistral-7B, and Qwen2-7B to surpass the performance of stronger evaluators like Qwen2-72B, highlighting its ability to enhance evaluation reliability and improve model performance.
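The two stages described in this abstract (aggregate, then remove cycles) can be illustrated with a minimal sketch. This is not the GED algorithm from the paper: the function names, the vote-counting aggregation, and the score-based greedy denoising heuristic are all assumptions chosen for brevity.

```python
from collections import defaultdict


def aggregate_preferences(evaluator_votes):
    """Sum pairwise votes from several evaluators into net edge weights.

    evaluator_votes: one list of (winner, loser) pairs per evaluator.
    Returns {(a, b): weight}, keeping only edges with positive net weight.
    (Hypothetical helper, not an API from the paper.)
    """
    net = defaultdict(int)
    for votes in evaluator_votes:
        for winner, loser in votes:
            net[(winner, loser)] += 1
            net[(loser, winner)] -= 1
    return {edge: w for edge, w in net.items() if w > 0}


def denoise_to_dag(nodes, edges):
    """Greedy heuristic (an assumption): rank nodes by weighted out-degree
    minus in-degree, then keep only edges that point forward in that ranking.
    Forward-only edges can never form a cycle, so the result is a DAG."""
    score = {n: 0 for n in nodes}
    for (a, b), w in edges.items():
        score[a] += w
        score[b] -= w
    ranking = sorted(nodes, key=lambda n: (-score[n], n))
    position = {n: i for i, n in enumerate(ranking)}
    dag = {(a, b): w for (a, b), w in edges.items() if position[a] < position[b]}
    return ranking, dag


# Two evaluators produce the cycle A > B > C > A; a third prefers A over C.
votes = [
    [("A", "B"), ("B", "C"), ("C", "A")],
    [("A", "B"), ("B", "C"), ("C", "A")],
    [("A", "B"), ("B", "C"), ("A", "C")],
]
edges = aggregate_preferences(votes)
ranking, dag = denoise_to_dag(["A", "B", "C"], edges)
```

In this toy input the aggregated graph still contains the cycle A, B, C, A; the heuristic drops the weakest back edge (C over A) and returns a consistent ranking.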

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.12869

Country: Asia (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Hsieh, Cheng-Yu, Chuang, Yung-Sung, Li, Chun-Liang, Wang, Zifeng, Le, Long T., Kumar, Abhishek, Glass, James, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, Pfister, Tomas

arXiv.org Artificial Intelligence, Jul-3-2024

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of their input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming …

Figure 1: (a) Lost-in-the-middle refers to models' U-shape RAG performance as the position of the relevant context (e.g., a gold document containing the answer to a query) varies within the input; (b) We observe models exhibit U-shape attention weights favoring leading and ending contexts, regardless of their actual contents; (c) Models do attend to relevant contexts even when placed in the middle, but are eventually distracted by leading/ending contexts; (d) We propose a calibration mechanism, found-in-the-middle, that disentangles the effect of U-shape attention bias and allows models to attend to relevant context regardless of their positions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.16008

Country:

Asia (0.28)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MaskSearch: Querying Image Masks at Scale

He, Dong, Zhang, Jieyu, Daum, Maureen, Ratner, Alexander, Balazinska, Magdalena

arXiv.org Artificial Intelligence, Jan-8-2024

Machine learning tasks over image databases often generate masks that annotate image content (e.g., saliency maps, segmentation maps, depth maps) and enable a variety of applications (e.g., determining whether a model is learning spurious correlations or whether an image was maliciously modified to mislead a model). While queries that retrieve examples based on mask properties are valuable to practitioners, existing systems do not support them efficiently. In this paper, we formalize the problem and propose MaskSearch, a system that focuses on accelerating queries over databases of image masks while guaranteeing the correctness of query results. MaskSearch leverages a novel indexing technique and an efficient filter-verification query execution framework. Experiments with our prototype show that MaskSearch, using indexes approximately 5% of the compressed data size, accelerates individual queries by up to two orders of magnitude and consistently outperforms existing methods on various multi-query workloads that simulate dataset exploration and analysis processes.
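The filter-verification idea can be sketched in a few lines. The abstract does not describe MaskSearch's actual index, so the index below (a single whole-mask pixel count per mask) and all function names are assumptions made for illustration. The whole-mask count is an upper bound on the count inside any region of interest, so masks that fail the bound can be skipped without loading their pixels, and the remaining candidates are verified exactly, which keeps results correct.

```python
def build_index(masks, threshold):
    """Assumed index: for each mask, the number of pixels above `threshold`
    over the whole mask. Must be built with the same threshold used in queries."""
    return {
        mask_id: sum(v > threshold for row in mask for v in row)
        for mask_id, mask in masks.items()
    }


def count_in_roi(mask, roi, threshold):
    """Exact count of pixels above `threshold` inside a half-open ROI
    given as (row_start, row_end, col_start, col_end)."""
    r0, r1, c0, c1 = roi
    return sum(v > threshold for row in mask[r0:r1] for v in row[c0:c1])


def query(masks, index, roi, threshold, min_count):
    """Return ids of masks with at least `min_count` pixels above `threshold`
    in `roi`, plus how many masks needed exact verification."""
    results, verified = [], 0
    for mask_id, mask in masks.items():
        # Filter step: the whole-mask count bounds the ROI count from above.
        if index[mask_id] < min_count:
            continue
        # Verification step: compute the exact answer for surviving candidates.
        verified += 1
        if count_in_roi(mask, roi, threshold) >= min_count:
            results.append(mask_id)
    return results, verified


masks = {
    "m1": [[0.9, 0.9], [0.1, 0.1]],
    "m2": [[0.1, 0.1], [0.1, 0.1]],
    "m3": [[0.1, 0.1], [0.9, 0.9]],
}
index = build_index(masks, threshold=0.5)
results, verified = query(masks, index, roi=(0, 1, 0, 2), threshold=0.5, min_count=2)
```

Here `m2` is pruned by the index alone, while `m3` passes the filter but is rejected during verification, so only `m1` is returned.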

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.02375

Country:

North America > United States > Indiana (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Diagnostic Medicine (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training

Zhang, Jieyu, Wang, Bohan, Hu, Zhengyu, Koh, Pang Wei, Ratner, Alexander

arXiv.org Machine Learning, Dec-1-2023

Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study of their impact on downstream tasks. In this work, we study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset. Empirically, we found that with the size of the pre-training dataset fixed, the best downstream performance comes with a balance between intra- and inter-class diversity. To understand the underlying mechanism, we show theoretically that the downstream performance depends monotonically on both types of diversity. Notably, our theory reveals that the optimal class-to-sample ratio (#classes / #samples per class) is invariant to the size of the pre-training dataset, which motivates an application of predicting the optimal number of pre-training classes. We demonstrate the effectiveness of this application by an improvement of around 2 points on the downstream tasks when using ImageNet as the pre-training dataset.
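If the optimal class-to-sample ratio r = K / n is invariant to dataset size, then a ratio estimated at one size determines the split at any other size N = K · n, since K = sqrt(r · N) and n = sqrt(N / r). The sketch below works through that arithmetic; the function names and the pilot numbers are hypothetical and are not taken from the paper.

```python
import math


def estimate_ratio(num_classes, samples_per_class):
    """Class-to-sample ratio r = #classes / #samples per class."""
    return num_classes / samples_per_class


def optimal_split(total_samples, class_to_sample_ratio):
    """Solve N = K * n and r = K / n for K (classes) and n (samples per class)."""
    num_classes = math.sqrt(class_to_sample_ratio * total_samples)
    samples_per_class = math.sqrt(total_samples / class_to_sample_ratio)
    return num_classes, samples_per_class


# Hypothetical pilot: the best small dataset had 100 classes x 400 samples each.
ratio = estimate_ratio(100, 400)
# Predict the split for a dataset of 1,000,000 samples with the same ratio.
num_classes, samples_per_class = optimal_split(1_000_000, ratio)
```

With these illustrative numbers the ratio is 0.25, which gives 500 classes of 2,000 samples each at the larger size.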

artificial intelligence, machine learning, pre-training dataset, (17 more...)

arXiv.org Machine Learning

2305.12224

Country:

North America (0.14)
Europe > Germany (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

DataComp: In search of the next generation of multimodal datasets

Gadre, Samir Yitzhak, Ilharco, Gabriel, Fang, Alex, Hayase, Jonathan, Smyrnis, Georgios, Nguyen, Thao, Marten, Ryan, Wortsman, Mitchell, Ghosh, Dhruba, Zhang, Jieyu, Orgad, Eyal, Entezari, Rahim, Daras, Giannis, Pratt, Sarah, Ramanujan, Vivek, Bitton, Yonatan, Marathe, Kalyani, Mussmann, Stephen, Vencu, Richard, Cherti, Mehdi, Krishna, Ranjay, Koh, Pang Wei, Saukh, Olga, Ratner, Alexander, Song, Shuran, Hajishirzi, Hannaneh, Farhadi, Ali, Beaumont, Romain, Oh, Sewoong, Dimakis, Alex, Jitsev, Jenia, Carmon, Yair, Shankar, Vaishaal, Schmidt, Ludwig

arXiv.org Artificial Intelligence, Oct-20-2023

Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.14108

Country:

Asia (0.67)
Europe (0.67)
North America > United States > Illinois (0.14)
North America > United States > Texas (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.92)
Health & Medicine (0.92)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

Yu, Yue, Zhuang, Yuchen, Zhang, Jieyu, Meng, Yu, Ratner, Alexander, Krishna, Ranjay, Shen, Jiaming, Zhang, Chao

arXiv.org Artificial Intelligence, Oct-17-2023

Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit systematic biases of the LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, wherein we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5% of the querying cost of ChatGPT associated with the latter. The data and code are available at https://github.com/yueyu1030/AttrPrompt.

artificial intelligence, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2306.15895

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Hsieh, Cheng-Yu, Chen, Si-An, Li, Chun-Liang, Fujii, Yasuhisa, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, Pfister, Tomas

arXiv.org Artificial Intelligence, Aug-1-2023

Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions of individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentation by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.

arxiv preprint arxiv, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2308.00675

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Hsieh, Cheng-Yu, Li, Chun-Liang, Yeh, Chih-Kuan, Nakhost, Hootan, Fujii, Yasuhisa, Ratner, Alexander, Krishna, Ranjay, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial Intelligence, Jul-5-2023

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so while using less training data than finetuning or distillation requires. Our method extracts LLM rationales as additional supervision for training small models within a multi-task framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to few-shot prompted LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of available data on a benchmark, whereas standard finetuning of the same T5 model struggles to match it even when using 100% of the dataset. We release the code at: https://github.com/google-research/distilling-step-by-step.
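Using rationales as additional supervision in a multi-task setup amounts to training one model on two targets at once: the task label and the LLM-written rationale. A minimal sketch of such a combined objective follows. The abstract does not give the exact loss, so the additive form, the `rationale_weight` parameter, and the function names are assumptions.

```python
import math


def cross_entropy(predicted_probs, target_index):
    """Negative log-likelihood of the target under a predicted distribution."""
    return -math.log(predicted_probs[target_index])


def sequence_loss(step_probs, target_tokens):
    """Mean per-token cross-entropy over a generated rationale."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_tokens)]
    return sum(losses) / len(losses)


def multitask_loss(label_probs, label_target,
                   rationale_probs, rationale_targets,
                   rationale_weight=1.0):
    """Assumed objective: label loss plus a weighted rationale loss."""
    label_loss = cross_entropy(label_probs, label_target)
    rationale_loss = sequence_loss(rationale_probs, rationale_targets)
    return label_loss + rationale_weight * rationale_loss


# Toy distributions over a two-symbol vocabulary.
total = multitask_loss(
    label_probs=[0.5, 0.5], label_target=0,
    rationale_probs=[[0.5, 0.5], [0.25, 0.75]], rationale_targets=[1, 0],
)
```

Setting `rationale_weight` to zero recovers plain label-only training, which makes the rationale term easy to ablate.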

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.02301

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Survey on Programmatic Weak Supervision

Zhang, Jieyu, Hsieh, Cheng-Yu, Yu, Yue, Zhang, Chao, Ratner, Alexander

arXiv.org Artificial Intelligence, Feb-14-2022

Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck by programmatically synthesizing training labels from multiple potentially noisy supervision sources. This paper presents a comprehensive survey of recent advances in PWS. In particular, we give a brief introduction of the PWS learning paradigm, and review representative approaches for each component within PWS's learning workflow. In addition, we discuss complementary learning paradigms for tackling limited labeled data scenarios and how these related approaches can be used in conjunction with PWS. Finally, we identify several critical challenges that remain under-explored in the area to hopefully inspire future research directions in the field.
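Programmatically synthesizing labels from noisy sources can be illustrated with labeling functions that either vote for a class or abstain, combined by a simple aggregator. The sketch below uses unweighted majority vote, which is only the simplest possible aggregator and is swapped in here for illustration; the labeling functions, the spam task, and all names are hypothetical.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion
HAM, SPAM = 0, 1


def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN


def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 2 else ABSTAIN


def majority_vote(votes):
    """Return the most common non-abstain vote, or ABSTAIN on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def synthesize_labels(texts, labeling_functions):
    """Apply every labeling function to every text and aggregate the votes."""
    return [majority_vote([lf(t) for lf in labeling_functions]) for t in texts]


texts = [
    "Free prize!! Click now",
    "Hello, are we still meeting at noon?",
    "Hi, get your free quote",
    "See you tomorrow",
]
labels = synthesize_labels(
    texts, [lf_contains_free, lf_has_greeting, lf_many_exclamations]
)
```

The third text receives conflicting votes and the fourth receives none, so both are left unlabeled rather than guessed.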

artificial intelligence, machine learning, programmatic weak supervision, (1 more...)

arXiv.org Artificial Intelligence

2202.05433

Genre: Overview (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Creating Training Sets via Weak Indirect Supervision

Zhang, Jieyu, Wang, Bohan, Song, Xiangchen, Wang, Yujing, Yang, Yaming, Bai, Jing, Ratner, Alexander

arXiv.org Machine Learning, Oct-7-2021

Creating labeled training sets has become one of the major roadblocks in machine learning. To address this, recent Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources. However, existing frameworks are restricted to supervision sources that share the same output space as the target task. To extend the scope of usable sources, we formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels based on indirect supervision sources that have different output label spaces. To overcome the challenge of mismatched output spaces, we develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources. Moreover, we provide a theoretically principled test of the distinguishability of PLRM for unseen labels, along with a generalization bound. On both image and text classification tasks as well as an industrial advertising application, we demonstrate the advantages of PLRM by outperforming baselines by a margin of 2%-9%.

artificial intelligence, health & medicine, machine learning, (16 more...)

arXiv.org Machine Learning

2110.03484

Country:

South America > Brazil > Rio de Janeiro (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
