Rozonoyer, Benjamin
Quasi-random Multi-Sample Inference for Large Language Models
Parashar, Aditya, Singh, Aditya Vikram, Amballa, Avinash, Lai, Jinlin, Rozonoyer, Benjamin
Large language models (LLMs) are often equipped with multi-sample decoding strategies. An LLM implicitly defines an arithmetic code book, facilitating efficient and embarrassingly parallelizable \textbf{arithmetic sampling}, which produces multiple samples from quasi-random codes. Traditional text generation methods, such as beam search and sampling-based techniques, have notable limitations: they either lack parallelizability or yield sample sets with limited diversity. This study explores the potential of arithmetic sampling, contrasting it with ancestral sampling across two decoding tasks that employ multi-sample inference: chain-of-thought reasoning with self-consistency and machine translation with minimum Bayes risk decoding. Our results demonstrate that arithmetic sampling produces more diverse samples, significantly improving reasoning and translation performance as the sample size increases. We observe a $\mathbf{3\text{-}5}$ percentage-point increase in accuracy on the GSM8K dataset and a $\mathbf{0.45\text{-}0.89}$ percentage-point increase in COMET score on WMT19 tasks with arithmetic sampling, without any significant computational overhead.
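To make the decoding idea concrete, here is a minimal sketch of arithmetic sampling with quasi-random codes, not the paper's implementation: it assumes a hypothetical `next_token_probs(prefix)` callable standing in for the LLM's next-token distribution, and decodes the sequence whose arithmetic-code interval contains a given code point. The code points form an evenly spaced lattice with one shared random offset.

```python
import random

def arithmetic_sample(next_token_probs, codepoint, eos_id, max_len=64):
    """Decode the sequence whose arithmetic-code interval contains `codepoint`.

    `next_token_probs(prefix)` is a placeholder for the LLM's next-token
    distribution: given a list of token ids, it returns a probability
    vector over the vocabulary.
    """
    prefix, lo, hi = [], 0.0, 1.0
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        width, cum = hi - lo, lo
        for tok, p in enumerate(probs):
            nxt = cum + p * width
            # Pick the token whose subinterval contains the code point
            # (fall back to the last token to absorb rounding error).
            if codepoint < nxt or tok == len(probs) - 1:
                prefix.append(tok)
                lo, hi = cum, nxt
                break
            cum = nxt
        if prefix[-1] == eos_id:
            break
    return prefix

def quasi_random_codepoints(n, rng=random):
    """Evenly spaced lattice of n codes, shifted by one shared random offset."""
    offset = rng.random()
    return [(offset + i / n) % 1.0 for i in range(n)]

# Toy usage: a uniform 4-token "model" where token 3 acts as EOS.
toy_model = lambda prefix: [0.25, 0.25, 0.25, 0.25]
samples = [arithmetic_sample(toy_model, u, eos_id=3)
           for u in quasi_random_codepoints(8)]
```

Because each code point is decoded independently, the samples can be produced in parallel across workers while each one remains marginally distributed like an ancestral sample; spreading the code points over the unit interval is what encourages the diversity the abstract reports.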
Multistage Collaborative Knowledge Distillation from Large Language Models for Semi-Supervised Sequence Generation
Zhao, Jiachen, Zhao, Wenlong, Drozdov, Andrew, Rozonoyer, Benjamin, Sultan, Md Arafat, Lee, Jay-Yoon, Iyyer, Mohit, McCallum, Andrew
We study semi-supervised sequence generation tasks where labeled data are too scarce to effectively finetune a model and, at the same time, few-shot prompting of a large language model (LLM) yields suboptimal performance. This happens when a task, such as parsing, is expensive to annotate and also unfamiliar to a pretrained LLM. In this paper, we present the finding that student models distilled from an in-context-learned LLM can often generalize better than their teacher on such tasks. Leveraging this finding, we present a new method -- multistage collaborative knowledge distillation from an LLM (MCKD) -- for such tasks. MCKD first few-shot prompts an LLM to produce pseudolabels for unlabeled data. At each intermediate knowledge distillation (KD) stage, a new pair of students is trained on disjoint partitions of the pseudolabeled data. Each student then produces new and improved pseudolabels for its unseen partition, which are used in the next stage of distillation. We demonstrate the advantage of multistage cross-partition labeling on several syntactic and semantic parsing tasks. On CRAFT biomedical parsing, for example, 3-stage MCKD with 50 labeled examples outperforms the prompted LLM and vanilla KD by 7.5% and 3.7% parsing F1, respectively, and matches the performance of supervised finetuning with 500 examples.
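The multistage procedure can be summarized in a short sketch, offered only as an illustration of the cross-partition labeling loop rather than the authors' implementation; `llm_pseudolabel`, `train_student`, and the students' `predict` method are hypothetical placeholders for few-shot prompting the LLM and finetuning or running a student model.

```python
import random

def mckd(unlabeled_inputs, llm_pseudolabel, train_student, num_stages=3, rng=random):
    """Sketch of multistage cross-partition knowledge distillation (MCKD).

    Placeholders (not real APIs):
      llm_pseudolabel(x) -> y_hat    # few-shot prompt an LLM for a pseudolabel
      train_student(pairs) -> model  # finetune a student; model.predict(x) -> y_hat
    """
    # Stage 0: few-shot prompt the LLM to pseudolabel the unlabeled pool.
    data = [(x, llm_pseudolabel(x)) for x in unlabeled_inputs]

    for _ in range(num_stages - 1):
        # Split the current pseudolabeled data into two disjoint partitions.
        rng.shuffle(data)
        half = len(data) // 2
        part_a, part_b = data[:half], data[half:]

        # Train one fresh student per partition.
        student_a = train_student(part_a)
        student_b = train_student(part_b)

        # Cross-partition relabeling: each student pseudolabels the partition
        # it did NOT train on; these improved labels feed the next stage.
        data = ([(x, student_b.predict(x)) for x, _ in part_a]
                + [(x, student_a.predict(x)) for x, _ in part_b])

    # Final stage: distill a single student from the latest pseudolabels.
    return train_student(data)
```

Training each student on one partition and having it relabel the partition it never saw is the key design choice: a student cannot simply memorize the labels it is asked to regenerate, which is why pseudolabel quality can improve from stage to stage.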