AITopics | Yang, Xiaocong

Collaborating Authors

Yang, Xiaocong

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval

Yang, Xiaocong, Lin, Jiacheng, Wang, Ziqi, Zhai, Chengxiang

arXiv.org Artificial IntelligenceNov-25-2024

Large language models (LLMs) are known to struggle with complicated reasoning tasks such as math word problems (MWPs). In this paper, we present how analogy from similarly structured questions can improve LLMs' problem-solving capabilities for MWPs. Specifically, we rely on the retrieval of problems with similar computational graphs to the given question to serve as exemplars in the prompt, providing the correct reasoning path for the generation model to refer to. Empirical results across six math word problem datasets demonstrate the effectiveness of our proposed method, which achieves a significant improvement of up to 6.7 percent on average in absolute value, compared to baseline methods. These results highlight our method's potential in addressing the reasoning challenges in current LLMs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.16454

Country:

Europe (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Education (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Cascade Speculative Drafting for Even Faster LLM Inference

Chen, Ziyi, Yang, Xiaocong, Lin, Jiacheng, Sun, Chenkai, Huang, Jie, Chang, Kevin Chen-Chuan

arXiv.org Artificial IntelligenceDec-21-2023

Speculative decoding enhances the efficiency of large language models (LLMs) by leveraging a draft model to draft for a larger target model to review. However, drafting in speculative decoding involves slow autoregressive generation and generating tokens of different importance with the same time allocation. These two inefficiencies lead to its suboptimal performance. To address this issue, we introduce Cascade Speculative Drafting (CS. Drafting), a novel approach that employs two types of cascades. The Vertical Cascade eliminates autoregressive generation from neural models. The Horizontal Cascade constitutes efficient time allocation in drafting with its optimality supported by our theoretical analysis. Combining both cascades, our CS. Drafting algorithm has achieved up to 72 percent additional speedup over speculative decoding in our experiments while keeping the same output distribution.

draft model, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2312.11462

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Parameter-Efficient Tuning with Special Token Adaptation

Yang, Xiaocong, Huang, James Y., Zhou, Wenxuan, Chen, Muhao

arXiv.org Artificial IntelligenceFeb-14-2023

Parameter-efficient tuning aims at updating only a small subset of parameters when adapting a pretrained model to downstream tasks. In this work, we introduce PASTA, in which we only modify the special token representations (e.g., [SEP] and [CLS] in BERT) before the self-attention module at each layer in Transformer-based models. PASTA achieves comparable performance to full finetuning in natural language understanding tasks including text classification and NER with up to only 0.029% of total parameters trained. Our work not only provides a simple yet effective way of parameter-efficient tuning, which has a wide range of practical applications when deploying finetuned models for multiple tasks, but also demonstrates the pivotal role of special tokens in pretrained language models

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.04382

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training

Zhou, Hao, Ke, Pei, Zhang, Zheng, Gu, Yuxian, Zheng, Yinhe, Zheng, Chujie, Wang, Yida, Wu, Chen Henry, Sun, Hao, Yang, Xiaocong, Wen, Bosi, Zhu, Xiaoyan, Huang, Minlie, Tang, Jie

arXiv.org Artificial IntelligenceAug-3-2021

Although pre-trained language models have remarkably enhanced the generation ability of dialogue systems, open-domain Chinese dialogue systems are still limited by the dialogue data and the model size compared with English ones. In this paper, we propose EVA, a Chinese dialogue system that contains the largest Chinese pre-trained dialogue model with 2.8B parameters. To build this model, we collect the largest Chinese dialogue dataset named WDC-Dialogue from various public social media. This dataset contains 1.4B context-response pairs and is used as the pre-training corpus of EVA. Extensive experiments on automatic and human evaluation show that EVA outperforms other Chinese pre-trained dialogue models especially in the multi-turn interaction of human-bot conversations.

artificial intelligence, dialogue, natural language, (19 more...)

arXiv.org Artificial Intelligence

2108.01547

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > California (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback