AITopics | Yue, Xiang

Collaborating Authors

Yue, Xiang


Automatic Evaluation of Attribution by Large Language Models

Yue, Xiang, Wang, Boshi, Chen, Ziru, Zhang, Kai, Su, Yu, Sun, Huan

arXiv.org Artificial Intelligence, Oct-7-2023

A recent focus of large language model (LLM) development, as exemplified by generative search engines, is to incorporate external references to generate and support its claims. However, evaluating the attribution, i.e., verifying whether the generated statement is fully supported by the cited reference, remains an open problem. Although human evaluation is common practice, it is costly and time-consuming. In this paper, we investigate the automatic evaluation of attribution given by LLMs. We begin by defining different types of attribution errors, and then explore two approaches for automatic evaluation: prompting LLMs and fine-tuning smaller LMs. The fine-tuning data is repurposed from related tasks such as question answering, fact-checking, natural language inference, and summarization. We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing. Our results on this curated test set and simulated examples from existing benchmarks highlight both promising signals and challenges. We hope our problem formulation, testbeds, and findings will help lay the foundation for future studies on this important problem.
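
The abstract says the fine-tuning data is repurposed from related tasks such as natural language inference. One plausible way to do that is sketched below in Python. It assumes three attribution labels (attributable, extrapolatory, contradictory) and a one-to-one mapping from the standard NLI labels; the label names, the mapping, the prompt template, and the helper names are all assumptions for illustration, not the paper's released code.

```python
from dataclasses import dataclass

# Assumed mapping from NLI labels to attribution labels (illustrative only).
NLI_TO_ATTRIBUTION = {
    "entailment": "attributable",      # reference fully supports the claim
    "neutral": "extrapolatory",        # reference lacks enough information
    "contradiction": "contradictory",  # reference conflicts with the claim
}


@dataclass
class AttributionExample:
    claim: str      # the generated statement to verify
    reference: str  # the cited evidence
    label: str      # one of the assumed attribution labels above


def repurpose_nli(premise: str, hypothesis: str, nli_label: str) -> AttributionExample:
    """Turn one NLI pair into an attribution example: premise -> reference, hypothesis -> claim."""
    if nli_label not in NLI_TO_ATTRIBUTION:
        raise ValueError(f"unknown NLI label: {nli_label!r}")
    return AttributionExample(
        claim=hypothesis,
        reference=premise,
        label=NLI_TO_ATTRIBUTION[nli_label],
    )


def to_prompt(example: AttributionExample) -> str:
    """Format an example as a plain-text verification prompt (hypothetical template)."""
    return (
        "Verify whether the claim is supported by the reference.\n"
        f"Claim: {example.claim}\n"
        f"Reference: {example.reference}\n"
        "Answer (attributable / extrapolatory / contradictory):"
    )


example = repurpose_nli(
    premise="TacoBot placed third among ten teams in the Alexa Prize TaskBot Challenge.",
    hypothesis="TacoBot placed third in the TaskBot Challenge.",
    nli_label="entailment",
)
print(example.label)
print(to_prompt(example))
```

The same `AttributionExample` records could serve both approaches the abstract mentions: as fine-tuning targets for a smaller LM, or as prompts for an LLM evaluator.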

large language model, machine learning, natural language

arXiv: 2305.06311

Country:

Europe (1.00)
Asia (0.93)
South America > Argentina > Pampas (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Banking & Finance > Economy (0.47)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Yue, Xiang, Qu, Xingwei, Zhang, Ge, Fu, Yao, Huang, Wenhao, Sun, Huan, Su, Yu, Chen, Wenhu

arXiv.org Artificial Intelligence, Oct-2-2023

We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperforms existing open-source models on nine mathematical reasoning datasets across all scales, with an average accuracy gain between 16% and 32%. Remarkably, our MAmmoTH-7B model reaches 33% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 23%, and the MAmmoTH-34B model achieves 44% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.

[Figure 1: The superior performance of MAmmoTH, a series of models instruction-tuned to solve a diverse set of mathematical problems using hybrid CoT and PoT rationales. MAmmoTH significantly outperforms base and SoTA models on both in-domain and out-of-domain test sets, across all scales. The figure also shows an example problem (Weng earns $12 an hour for babysitting. How much did she earn for 50 minutes?) with the rationale 12/60 = 0.2 per minute and 0.2 x 50 = 10.]
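
The hybrid of CoT and PoT rationales can be illustrated on the babysitting example ($12 per hour, 50 minutes of work). The Python sketch below assumes a decoding policy in which the program-of-thought rationale is executed first and, if it fails to run or does not produce a number, the answer is read from the chain-of-thought rationale instead. The function names and the convention that a PoT program stores its result in a variable called `answer` are assumptions for this illustration, not MAmmoTH's actual code.

```python
import re
from typing import Optional


def run_pot(program: str) -> Optional[float]:
    """Execute a program-of-thought rationale that assigns its result to `answer`.

    Returns None if the program raises or does not produce a number.
    Builtins are disabled as a minimal precaution; this is NOT a real sandbox.
    """
    namespace: dict = {}
    try:
        exec(program, {"__builtins__": {}}, namespace)
    except Exception:
        return None
    result = namespace.get("answer")
    if isinstance(result, bool) or not isinstance(result, (int, float)):
        return None
    return float(result)


def extract_cot_answer(rationale: str) -> Optional[float]:
    """Take the last number in a chain-of-thought rationale as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", rationale)
    return float(numbers[-1]) if numbers else None


def hybrid_answer(pot_program: str, cot_rationale: str) -> Optional[float]:
    """Prefer the executed PoT result; fall back to the CoT answer if execution fails."""
    pot_result = run_pot(pot_program)
    return pot_result if pot_result is not None else extract_cot_answer(cot_rationale)


# Two rationales for the babysitting example.
POT = """
hourly_rate = 12
minutes = 50
answer = hourly_rate * minutes / 60
"""
COT = "Weng earns 12/60 = 0.2 per minute. Doing 50 mins, she earned 0.2 x 50 = 10"

print(hybrid_answer(POT, COT))
```

Executing the program delegates the arithmetic to the interpreter (the "tool use" the abstract mentions), while the CoT path still yields an answer for problems that are awkward to express as code.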

large language model, machine learning, natural language

arXiv: 2309.05653

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

Mo, Lingbo, Chen, Shijie, Chen, Ziru, Deng, Xiang, Lewis, Ashley, Singh, Sunit, Stevens, Samuel, Tai, Chang-You, Wang, Zhen, Yue, Xiang, Zhang, Tianshu, Su, Yu, Sun, Huan

arXiv.org Artificial Intelligence, Jul-29-2023

We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems.
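
To make the component split concrete, here is a toy Python sketch of the dialogue-management piece for a multi-step task: it tracks the current step and maps a few user intents to state transitions. The intent names, the keyword-based `understand` function, and the `TaskSession` class are hypothetical; TacoBot's actual components use neural models and a search engine that this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import List


def understand(utterance: str) -> str:
    """Toy keyword-based intent classifier (stand-in for a neural NLU model)."""
    text = utterance.lower()
    if "previous" in text or "go back" in text:
        return "previous_step"
    if "repeat" in text or "again" in text:
        return "repeat_step"
    if "next" in text or "done" in text:
        return "next_step"
    return "unknown"


@dataclass
class TaskSession:
    steps: List[str]                                   # the task's instructions, in order
    current: int = 0                                   # index of the step being shown
    history: List[str] = field(default_factory=list)   # intents seen so far

    def respond(self, utterance: str) -> str:
        """Update the dialogue state from the user's intent and generate a reply."""
        intent = understand(utterance)
        self.history.append(intent)
        if intent == "next_step":
            if self.current + 1 >= len(self.steps):
                return "That was the last step. Enjoy!"
            self.current += 1
        elif intent == "previous_step":
            self.current = max(0, self.current - 1)
        elif intent == "unknown":
            return "Sorry, you can say next, previous, or repeat."
        # repeat_step and the transitions above all read out the current step.
        return f"Step {self.current + 1} of {len(self.steps)}: {self.steps[self.current]}"


session = TaskSession(["Boil water.", "Add pasta.", "Drain and serve."])
print(session.respond("next"))
print(session.respond("repeat that"))
print(session.respond("go back"))
```

Keeping understanding, state tracking, and reply generation in separate functions mirrors the modular design the abstract describes, so each piece could be swapped for a stronger model independently.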

artificial intelligence, information retrieval, natural language

arXiv: 2307.16081

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.47)

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Yue, Xiang, Inan, Huseyin A., Li, Xuechen, Kumar, Girish, McAnallen, Julia, Shajari, Hoda, Sun, Huan, Levitan, David, Sim, Robert

arXiv.org Artificial Intelligence, Jul-18-2023

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee, such as differential privacy (DP), provides a promising path to mitigating these privacy concerns, but previous approaches in this direction have typically failed to produce synthetic data of high quality. In this work, we show that a simple and practical recipe in the text domain is effective: simply fine-tuning a pretrained generative language model with DP enables the model to generate useful synthetic text with strong privacy protection. Through extensive empirical analyses on both benchmark and private customer data, we demonstrate that our method produces synthetic text that is competitive in utility with its non-private counterpart, while providing strong protection against potential privacy leakage.
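
A common way to fine-tune a model "with DP" is DP-SGD (or a DP variant of an adaptive optimizer): each example's gradient is clipped to a fixed L2 norm, and Gaussian noise scaled to that norm is added before averaging. The NumPy sketch below shows only that aggregation step on precomputed per-example gradients. The function name and parameters (`clip_norm`, `noise_multiplier`) are illustrative assumptions; a real setup would use a DP training library with a privacy accountant to translate the noise level into an (ε, δ) guarantee.

```python
import numpy as np


def dp_aggregate(per_example_grads: np.ndarray, clip_norm: float,
                 noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
    """DP-SGD-style aggregation: clip each example's gradient, sum, add noise, average.

    per_example_grads has shape (batch_size, num_params).
    """
    batch_size = per_example_grads.shape[0]
    # Scale each row so its L2 norm is at most clip_norm (rows already inside are unchanged).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with std noise_multiplier * clip_norm, added once to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / batch_size


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # L2 norms 5.0 and 0.5
update = dp_aggregate(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(update)
```

Clipping bounds how much any single training example can move the update, which is what lets the added noise mask each example's contribution. With `noise_multiplier=0` the function reduces to averaging clipped gradients, a handy sanity check that provides no privacy on its own.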

artificial intelligence, machine learning, natural language

arXiv: 2210.14348

Country:

Europe (0.93)
North America > United States > California > Santa Clara County (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Consumer Products & Services > Restaurants (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Bootstrapping a User-Centered Task-Oriented Dialogue System

Chen, Shijie, Chen, Ziru, Deng, Xiang, Lewis, Ashley, Mo, Lingbo, Stevens, Samuel, Wang, Zhen, Yue, Xiang, Zhang, Tianshu, Su, Yu, Sun, Huan

arXiv.org Artificial Intelligence, Jul-21-2022

We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it is equipped with accurate language understanding, flexible dialogue management, and engaging response generation. Furthermore, TacoBot is backed by a strong search engine and an automated end-to-end test suite. In bootstrapping the development of TacoBot, we explore a series of data augmentation strategies to train advanced neural language processing models and continuously improve the dialogue experience with collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.
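
One simple form of data augmentation for training an intent classifier is to expand hand-written templates with slot values into many labeled utterances. The Python sketch below illustrates that idea for cooking and home-improvement requests; the templates, intent labels, task list, and the `augment` helper are hypothetical, and this page does not say which specific augmentation strategies TacoBot used.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical templates: each intent maps to utterance patterns with a {task} slot.
TEMPLATES: Dict[str, List[str]] = {
    "search_task": ["how do I {task}", "show me how to {task}", "help me {task}"],
    "start_task": ["let's {task}", "I want to {task} now"],
}
TASKS = ["bake bread", "fix a leaky faucet"]


def augment(templates: Dict[str, List[str]], slot_values: List[str]) -> List[Tuple[str, str]]:
    """Expand every template with every slot value into (utterance, intent) pairs."""
    examples = []
    for intent, patterns in templates.items():
        for pattern, value in product(patterns, slot_values):
            examples.append((pattern.format(task=value), intent))
    return examples


data = augment(TEMPLATES, TASKS)
print(len(data))  # 10
print(data[0])
```

Template expansion grows quadratically with templates and slot values, so even a small seed set yields varied training data; real conversations collected later can then replace or supplement these synthetic examples.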

machine learning, natural language, tacobot

arXiv: 2207.05223

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.46)
