AITopics | Wang, Zhiruo

Collaborating Authors

Wang, Zhiruo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Are Tools Anyway? A Survey from the Language Model Perspective

Wang, Zhiruo, Cheng, Zhoujun, Zhu, Hao, Fried, Daniel, Neubig, Graham

arXiv.org Artificial IntelligenceMar-18-2024

Language models (LMs) are powerful yet mostly for text generation tasks. Tools have substantially enhanced their performance for tasks that require complex skills. However, many works adopt the term "tool" in different ways, raising the question: What is a tool anyway? Subsequently, where and how do tools help LMs? In this survey, we provide a unified definition of tools as external programs used by LMs, and perform a systematic review of LM tooling scenarios and approaches. Grounded on this review, we empirically study the efficiency of various tooling methods by measuring their required compute and performance gains on various benchmarks, and highlight some challenges and potential future research in the field.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2403.15452

Country: North America > United States (0.28)

Genre: Overview (1.00)

Industry:

Education (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Hsia, Jennifer, Shaikh, Afreen, Wang, Zhiruo, Neubig, Graham

arXiv.org Artificial IntelligenceMar-13-2024

Retrieval-augmented generation (RAG) greatly benefits language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). Despite its potential, the power of RAG is highly dependent on its configuration, raising the question: What is the optimal RAG configuration? To answer this, we introduce the RAGGED framework to analyze and optimize RAG systems. On a set of representative DBQA tasks, we study two classic sparse and dense retrievers, and four top-performing LMs in encoder-decoder and decoder-only architectures. Through RAGGED, we uncover that different models suit substantially varied RAG setups. While encoder-decoder models monotonically improve with more documents, we find decoder-only models can only effectively use < 5 documents, despite often having a longer context window. RAGGED offers further insights into LMs' context utilization habits, where we find that encoder-decoder models rely more on contexts and are thus more sensitive to retrieval quality, while decoder-only models tend to rely on knowledge memorized during training.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2403.0904

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Add feedback

TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks

Wang, Zhiruo, Fried, Daniel, Neubig, Graham

arXiv.org Artificial IntelligenceJan-23-2024

Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs. However, using primitive functions often leads to verbose and error-prone programs, and higher-level functions require expert design. To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions, and use them to write solutions. We present TROVE, a training-free method of inducing a verifiable and efficient toolbox of functions, by generating via using, growing, and periodically trimming the toolbox. On 11 datasets from math, table question answering, and image reasoning tasks, TROVE consistently yields simpler solutions with higher accuracy than baselines using CODELLAMA and previous methods using GPT, while using 79-98% smaller toolboxes. TROVE further enables 31% faster and 13% more accurate human verification than baselines. With the same pipeline, it creates diverse functions for varied tasks and datasets, providing insights into their individual characteristics.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2401.12869

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Add feedback

StarCoder: may the source be with you!

Li, Raymond, Allal, Loubna Ben, Zi, Yangtian, Muennighoff, Niklas, Kocetkov, Denis, Mou, Chenghao, Marone, Marc, Akiki, Christopher, Li, Jia, Chim, Jenny, Liu, Qian, Zheltonozhskii, Evgenii, Zhuo, Terry Yue, Wang, Thomas, Dehaene, Olivier, Davaadorj, Mishig, Lamy-Poirier, Joel, Monteiro, João, Shliazhko, Oleh, Gontier, Nicolas, Meade, Nicholas, Zebaze, Armel, Yee, Ming-Ho, Umapathi, Logesh Kumar, Zhu, Jian, Lipkin, Benjamin, Oblokulov, Muhtasham, Wang, Zhiruo, Murthy, Rudra, Stillerman, Jason, Patel, Siva Sankalp, Abulkhanov, Dmitry, Zocca, Marco, Dey, Manan, Zhang, Zhihan, Fahmy, Nour, Bhattacharyya, Urvashi, Yu, Wenhao, Singh, Swayam, Luccioni, Sasha, Villegas, Paulo, Kunakov, Maxim, Zhdanov, Fedor, Romero, Manuel, Lee, Tony, Timor, Nadav, Ding, Jennifer, Schlesinger, Claire, Schoelkopf, Hailey, Ebert, Jan, Dao, Tri, Mishra, Mayank, Gu, Alex, Robinson, Jennifer, Anderson, Carolyn Jane, Dolan-Gavitt, Brendan, Contractor, Danish, Reddy, Siva, Fried, Daniel, Bahdanau, Dzmitry, Jernite, Yacine, Ferrandis, Carlos Muñoz, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, de Vries, Harm

arXiv.org Artificial IntelligenceDec-13-2023

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40\% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

large language model, machine learning research, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.06161

Country:

Europe (1.00)
Asia (1.00)
Africa (1.00)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance > Economy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Yu, Lijun, Cheng, Yong, Wang, Zhiruo, Kumar, Vivek, Macherey, Wolfgang, Huang, Yanping, Ross, David A., Essa, Irfan, Bisk, Yonatan, Yang, Ming-Hsuan, Murphy, Kevin, Hauptmann, Alexander G., Jiang, Lu

arXiv.org Artificial IntelligenceOct-28-2023

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.

large language model, machine learning, spae, (17 more...)

arXiv.org Artificial Intelligence

2306.17842

Genre: Research Report > New Finding (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

API-Assisted Code Generation for Question Answering on Varied Table Structures

Cao, Yihan, Chen, Shuyi, Liu, Ryan, Wang, Zhiruo, Fried, Daniel

arXiv.org Artificial IntelligenceOct-23-2023

A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are executable on Pandas data frames. Furthermore, to answer complex relational questions with extended program functionality and external knowledge, our framework allows customized APIs that Python programs can call. We experiment with four TableQA datasets that involve tables of different structures -- relational, multi-table, and hierarchical matrix shapes -- and achieve prominent improvements over past state-of-the-art systems. In ablation studies, we (1) show benefits from our multi-index representation and APIs over baselines that use only an LLM, and (2) demonstrate that our approach is modular and can incorporate additional APIs.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2310.14687

Country:

Europe (1.00)
North America > Canada (0.94)

Genre: Research Report (0.82)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning

Chern, I-Chun, Wang, Zhiruo, Das, Sanjan, Sharma, Bhavuk, Liu, Pengfei, Neubig, Graham

arXiv.org Artificial IntelligenceJul-10-2023

Modern abstractive summarization models often generate summaries that contain hallucinated or contradictory information. In this paper, we propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics. Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback of factuality metrics using contrastive reward learning, leading to more factual summaries by human evaluations. This suggests that further advances in learning and evaluation algorithms can feed directly into providing more factual summaries.

computational linguistic, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2307.04507

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Execution-Based Evaluation for Open-Domain Code Generation

Wang, Zhiruo, Zhou, Shuyan, Fried, Daniel, Neubig, Graham

arXiv.org Artificial IntelligenceMay-19-2023

To extend the scope of coding queries to more realistic settings, we propose ODEX, the first Open-Domain EXecution-based natural language (NL) to Python code generation dataset. ODEX has 945 NL-Code pairs spanning 79 diverse libraries, along with 1,707 human-written test cases for execution. Our NL-Code pairs are harvested from StackOverflow forums to encourage natural and practical coding queries. Moreover, ODEX supports four natural languages as intents, in English, Spanish, Japanese, and Russian. ODEX unveils intriguing behavioral differences among top-performing code language models (LM). While CODEX achieves better overall results, CODEGEN improves effectively via scaling -- CODEGEN 6.1B performs comparably with CODEX 12B. Both models show substantial gaps between open and closed domains, but CODEGEN gaps tend to decrease with model size while CODEX gaps increase. We release ODEX to facilitate research into open-domain problems for the code generation community.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.10481

Country: Europe (0.46)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.82)

Add feedback

DocPrompting: Generating Code by Retrieving the Docs

Zhou, Shuyan, Alon, Uri, Xu, Frank F., Wang, Zhiruo, Jiang, Zhengbao, Neubig, Graham

arXiv.org Artificial IntelligenceFeb-18-2023

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus, existing models inherently cannot generalize to using unseen functions and libraries, because these would never appear in the training data. In contrast, when human programmers use functions and libraries for the first time, they frequently refer to textual resources such as code manuals and documentation, to explore and understand the available functionality. Inspired by this observation, we introduce DocPrompting: a natural-language-to-code generation approach that explicitly leverages documentation by (1) retrieving the relevant documentation pieces given an NL intent, and (2) generating code based on the NL intent and the retrieved documentation. DocPrompting is general: it can be applied to any programming language and is agnostic to the underlying neural model. We demonstrate that DocPrompting consistently improves NL-to-code models: DocPrompting improves strong base models such as CodeT5 by 2.85% in pass@1 (52% relative gain) and 4.39% in pass@10 (30% relative gain) in execution-based evaluation on the popular Python CoNaLa benchmark; on a new Bash dataset tldr, DocPrompting improves CodeT5 and GPT-Neo1.3B by up to absolute 6.9% exact match.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2207.05987

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)

Add feedback

MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

Wang, Zhiruo, Cuenca, Grace, Zhou, Shuyan, Xu, Frank F., Neubig, Graham

arXiv.org Artificial IntelligenceFeb-6-2023

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric. This creates a barrier for program developers who are not proficient in English. To mitigate this gap in technology development across languages, we propose a multilingual dataset, MCoNaLa, to benchmark code generation from natural language commands extending beyond English. Modeled off of the methodology from the English Code/Natural Language Challenge (CoNaLa) dataset, we annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian. We present a quantitative evaluation of performance on the MCoNaLa dataset by testing with state-of-the-art code generation systems. While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts, revealing the challenges in adapting code generation to new languages.

artificial intelligence, computational linguistic, natural language, (15 more...)

arXiv.org Artificial Intelligence

2203.08388

Country:

Europe (1.00)
North America > United States (0.68)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback