AITopics | Bogin, Ben

Collaborating Authors

Bogin, Ben

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Nakamura, Taishi, Mishra, Mayank, Tedeschi, Simone, Chai, Yekun, Stillerman, Jason T, Friedrich, Felix, Yadav, Prateek, Laud, Tanmay, Chien, Vu Minh, Zhuo, Terry Yue, Misra, Diganta, Bogin, Ben, Vu, Xuan-Son, Karpinska, Marzena, Dantuluri, Arnav Varma, Kusa, Wojciech, Furlanello, Tommaso, Yokota, Rio, Muennighoff, Niklas, Pai, Suhas, Adewumi, Tosin, Laippala, Veronika, Yao, Xiaozhe, Junior, Adalberto, Ariyak, Alpay, Drozd, Aleksandr, Clive, Jordan, Gupta, Kshitij, Chen, Liangyu, Sun, Qi, Tsui, Ken, Persaud, Noah, Fahmy, Nour, Chen, Tianlong, Bansal, Mohit, Monti, Nicolo, Dang, Tai, Luo, Ziyang, Bui, Tien-Tung, Navigli, Roberto, Mehta, Virendra, Blumberg, Matthew, May, Victor, Nguyen, Huu, Pyysalo, Sampo

arXiv.org Artificial IntelligenceApr-23-2024

Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407 .

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.00399

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Soldaini, Luca, Kinney, Rodney, Bhagia, Akshita, Schwenk, Dustin, Atkinson, David, Authur, Russell, Bogin, Ben, Chandu, Khyathi, Dumas, Jennifer, Elazar, Yanai, Hofmann, Valentin, Jha, Ananya Harsh, Kumar, Sachin, Lucy, Li, Lyu, Xinxi, Lambert, Nathan, Magnusson, Ian, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Ravichander, Abhilasha, Richardson, Kyle, Shen, Zejiang, Strubell, Emma, Subramani, Nishant, Tafjord, Oyvind, Walsh, Pete, Zettlemoyer, Luke, Smith, Noah A., Hajishirzi, Hannaneh, Beltagy, Iz, Groeneveld, Dirk, Dodge, Jesse, Lo, Kyle

arXiv.org Artificial IntelligenceJan-31-2024

Language models have become a critical technology to tackling a wide range of natural language processing tasks, yet many details about how the best-performing language models were developed are not reported. In particular, information about their pretraining corpora is seldom discussed: commercial language models rarely provide any information about their data; even open models rarely release datasets they are trained on, or an exact recipe to reproduce them. As a result, it is challenging to conduct certain threads of language modeling research, such as understanding how training data impacts model capabilities and shapes their limitations. To facilitate open research on language model pretraining, we release Dolma, a three trillion tokens English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. In addition, we open source our data curation toolkit to enable further experimentation and reproduction of our work. In this report, we document Dolma, including its design principles, details about its construction, and a summary of its contents. We interleave this report with analyses and experimental results from training language models on intermediate states of Dolma to share what we have learned about important data curation practices, including the role of content or quality filters, deduplication, and multi-source mixing. Dolma has been used to train OLMo, a state-of-the-art, open language model and framework designed to build and study the science of language modeling.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2402.00159

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Texas (0.14)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Leveraging Code to Improve In-context Learning for Semantic Parsing

Bogin, Ben, Gupta, Shivanshu, Clark, Peter, Sabharwal, Ashish

arXiv.org Artificial IntelligenceNov-15-2023

In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the performance of even the most capable LLMs. In this work, we improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description that includes, e.g., the available classes and functions. We show that both these changes significantly improve accuracy across three popular datasets. Combined, they lead to dramatic improvements (e.g. 7.9% to 66.5% on SMCalFlow compositional split), nearly closing the performance gap between easier i.i.d.\ and harder compositional splits when used with a strong model, and reducing the need for a large number of demonstrations. We find that the resemblance of the target parse language to general-purpose code is a more important factor than the language's popularity in pre-training corpora. Our findings provide an improved methodology for building semantic parsers in the modern context of ICL with LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2311.09519

Country:

Europe (0.67)
Asia > Middle East > UAE (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Yoran, Ori, Wolfson, Tomer, Bogin, Ben, Katz, Uri, Deutch, Daniel, Berant, Jonathan

arXiv.org Artificial IntelligenceOct-17-2023

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.

meta-reasoning, multiple chain, natural language, (3 more...)

arXiv.org Artificial Intelligence

2304.13007

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.87)

Add feedback

Diverse Demonstrations Improve In-context Compositional Generalization

Levy, Itay, Bogin, Ben, Berant, Jonathan

arXiv.org Artificial IntelligenceJun-24-2023

In-context learning has shown great success in i.i.d semantic parsing splits, where the training and test sets are drawn from the same distribution. In this setup, models are typically prompted with demonstrations that are similar to the input utterance. However, in the setup of compositional generalization, where models are tested on outputs with structures that are absent from the training set, selecting similar demonstrations is insufficient, as often no example will be similar enough to the input. In this work, we propose a method to select diverse demonstrations that aims to collectively cover all of the structures required in the output program, in order to encourage the model to generalize to new structures from these demonstrations. We empirically show that combining diverse demonstrations with in-context learning substantially improves performance across three compositional generalization semantic parsing datasets in the pure in-context learning setup and when combined with finetuning.

demonstration, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.068

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Obtaining Faithful Interpretations from Compositional Neural Networks

Subramanian, Sanjay, Bogin, Ben, Gupta, Nitish, Wolfson, Tomer, Singh, Sameer, Berant, Jonathan, Gardner, Matt

arXiv.org Artificial IntelligenceSep-8-2020

Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture. However, prior work implicitly assumed that the structure of the network modules, describing the abstract reasoning process, provides a faithful explanation of the model's reasoning; that is, that all modules perform their intended behaviour. In this work, we propose and conduct a systematic evaluation of the intermediate outputs of NMNs on NLVR2 and DROP, two datasets which require composing multiple reasoning steps. We find that the intermediate outputs differ from the expected output, illustrating that the network structure does not provide a faithful explanation of model behaviour. To remedy that, we train the model with auxiliary supervision and propose particular choices for module architecture that yield much better faithfulness, at a minimal cost to accuracy.

artificial intelligence, module, neural network, (19 more...)

arXiv.org Artificial Intelligence

2005.00724

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Add feedback

Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

Bogin, Ben, Subramanian, Sanjay, Gardner, Matt, Berant, Jonathan

arXiv.org Artificial IntelligenceJul-1-2020

Answering questions that involve multi-step reasoning requires decomposing them and using the answers of intermediate steps to reach the final answer. However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples. In this work, we propose a model that computes a representation and denotation for all question spans in a bottom-up, compositional manner using a CKY-style parser. Our model effectively induces latent trees, driven by end-to-end (the answer) supervision only. We show that this inductive bias towards tree structures dramatically improves systematic generalization to out-of-distribution examples compared to strong baselines on an arithmetic expressions benchmark as well as on CLOSURE, a dataset that focuses on systematic generalization of models for grounded question answering. On this challenging dataset, our model reaches an accuracy of 92.8%, significantly higher than prior models that almost perfectly solve the task on a random, in-distribution split.

artificial intelligence, neural network, representation, (21 more...)

arXiv.org Artificial Intelligence

2007.00266

Country:

Asia (0.93)
Europe (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.81)

Add feedback

Emergence of Communication in an Interactive World with Consistent Speakers

Bogin, Ben, Geva, Mor, Berant, Jonathan

arXiv.org Artificial IntelligenceSep-3-2018

Training agents to communicate with one another given task-based supervision only has attracted considerable attention recently, due to the growing interest in developing models for human-agent interaction. Prior work on the topic focused on simple environments, where training using policy gradient was feasible despite the non-stationarity of the agents during training. In this paper, we present a more challenging environment for testing the emergence of communication from raw pixels, where training using policy gradient fails. We propose a new model and training algorithm, that utilizes the structure of a learned representation space to produce more consistent speakers at the initial phases of training, which stabilizes learning. We empirically show that our algorithm substantially improves performance compared to policy gradient. We also propose a new alignment-based metric for measuring context-independence in emerged communication and find our method increases context-independence compared to policy gradient and other competitive baselines.

artificial intelligence, listener, natural language, (18 more...)

arXiv.org Artificial Intelligence

1809.00549

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)

Add feedback