Xiong, Wayne
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Cheng, Yi, Liang, Xiao, Gong, Yeyun, Xiao, Wen, Wang, Song, Zhang, Yuji, Hou, Wenjun, Xu, Kaishuai, Liu, Wenge, Li, Wenjie, Jiao, Jian, Chen, Qi, Cheng, Peng, Xiong, Wayne
Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processing them concurrently, with the next token at each decoding step selected by aggregating all of their corresponding predictions. In essence, this simple approach implicitly incorporates self-consistency into the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
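For concreteness, the following is a minimal sketch of the decoding loop described above, assuming a generic next-token log-probability interface; the stubbed model call and the simple averaging of log-probabilities used for aggregation are illustrative assumptions rather than the authors' implementation.

    # Minimal sketch of Integrative Decoding (ID): one input per sampled
    # response, concurrent decoding, aggregation of next-token predictions.
    import numpy as np

    VOCAB_SIZE = 16
    EOS_ID = 0
    rng = np.random.default_rng(0)

    def next_token_logprobs(token_ids):
        """Placeholder for a real LM call: returns log p(. | token_ids)."""
        logits = rng.normal(size=VOCAB_SIZE)
        return logits - np.log(np.exp(logits).sum())

    def integrative_decode(prompt_ids, sampled_responses, max_new_tokens=32):
        # One input per sampled response: the response prepended to the prompt.
        contexts = [list(resp) + list(prompt_ids) for resp in sampled_responses]
        output = []
        for _ in range(max_new_tokens):
            # Aggregate (here: average) the next-token log-probs over all contexts.
            logprob_sum = np.zeros(VOCAB_SIZE)
            for ctx in contexts:
                logprob_sum += next_token_logprobs(ctx + output)
            next_id = int(np.argmax(logprob_sum / len(contexts)))
            if next_id == EOS_ID:
                break
            output.append(next_id)  # the chosen token is appended to every input
        return output

    print(integrative_decode([3, 5, 7], sampled_responses=[[2, 4], [6, 8], [9]]))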
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Fu, Yu, Cai, Zefan, Asi, Abedelkadir, Xiong, Wayne, Dong, Yue, Xiao, Wen
Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length. Prior work has shown that not all tokens are equally important for text generation, proposing layer-level KV cache compression to selectively retain key information. Recognizing the distinct roles of attention heads in generation, we propose HeadKV, a head-level KV cache compression method, and HeadKV-R2, which leverages a novel contextual reasoning ability estimation for compression. Our approach operates at the level of individual heads, estimating their importance for contextual QA tasks that require both retrieval and reasoning capabilities. Extensive experiments across diverse benchmarks (LongBench, LooGLE), model architectures (e.g., Llama-3-8B-Instruct, Mistral-7B-Instruct), and long-context ability tests demonstrate that our head-level KV cache compression significantly outperforms strong baselines, particularly in low-resource settings (KV size = 64 & 128). Notably, our method retains just 1.5% of the KV cache while achieving 97% of the performance of the full KV cache on the contextual question answering benchmark.

Modern Large Language Models (LLMs) increasingly support extremely long inputs: GPT-4 (Achiam et al., 2023), Llama-3 (Dubey et al., 2024), and Qwen-2 (Yang et al., 2024) handle up to 128K tokens, while Claude (Anthropic, 2024) supports up to 1 million tokens. These extended capacities improve performance on tasks like dialogue generation (Li et al., 2024a; Yi et al., 2024), question answering (Ho et al., 2020; Xu et al., 2023), and summarization (Xiao & Carenini, 2019; Koh et al., 2022). As input lengths increase, memory usage and latency grow significantly due to self-attention in transformers.
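As a rough illustration of head-level (rather than layer-level) cache allocation, the sketch below splits a global KV budget across heads in proportion to precomputed importance scores and keeps only the most-attended tokens per head; the scoring rule and function names are hypothetical, not the exact HeadKV procedure.

    # Illustrative head-level KV budget allocation and per-head compression.
    import numpy as np

    def allocate_head_budgets(head_importance, total_budget, min_per_head=4):
        """Split a total KV-token budget across heads proportionally to importance."""
        w = head_importance / head_importance.sum()
        # Flooring plus the minimum clamp makes the sum only approximately total_budget.
        return np.maximum(min_per_head, np.floor(w * total_budget)).astype(int)

    def compress_head_cache(keys, values, attn_scores, budget):
        """Keep the `budget` tokens this head attends to most strongly."""
        keep = np.argsort(attn_scores)[-budget:]
        keep.sort()                      # preserve original token order
        return keys[keep], values[keep]

    # Toy example: 8 heads, 128 cached tokens, head dim 4, global budget of 64.
    rng = np.random.default_rng(0)
    num_heads, seq_len, head_dim = 8, 128, 4
    keys = rng.normal(size=(num_heads, seq_len, head_dim))
    values = rng.normal(size=(num_heads, seq_len, head_dim))
    attn = rng.random(size=(num_heads, seq_len))
    importance = rng.random(size=num_heads)   # stand-in for retrieval-reasoning scores

    budgets = allocate_head_budgets(importance, total_budget=64)
    compressed = [compress_head_cache(keys[h], values[h], attn[h], budgets[h])
                  for h in range(num_heads)]
    print(budgets, [k.shape for k, _ in compressed])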
Can we only use guideline instead of shot in prompt?
Chen, Jiaxiang, Wang, Song, Li, Zhucong, Xiong, Wayne, Qu, Lizhen, Xu, Zenglin, Qi, Yuan
Currently, prompting techniques can be mainly divided into two categories: 1) the shot method, which implicitly inspires the model to answer the question by mimicking the steps in the given examples, e.g., few-shot CoT; and 2) the guideline method, which explicitly instructs the model to reason by following guidelines that contain succinct, task-specific knowledge. The shot method is prone to difficulties in selecting the type of shots, the number of shots, and the design of the reasoning steps, so a question arises: can we only use guidelines instead of shots in the prompt? To this end, we propose the FGT framework, consisting of Feedback, Guideline, and Tree-gather agents, to automatically learn task-specific guidelines from a dataset. First, the feedback agent is designed to evaluate the outcomes, both right and wrong, of each Q&A pair to gather insights that guide more effective optimization strategies. Next, the guideline agent derives guidelines from each piece of feedback and stores them in local memory. Lastly, the tree-gather agent aggregates all guidelines hierarchically through a tree structure, ultimately obtaining a deduplicated set of guidelines from a global perspective. In addition, we induce the model to generate intermediate reasoning steps to ensure the reasoning is consistent with the guidelines. Experimental results demonstrate that our approach achieves superior performance across multiple tasks, highlighting the effectiveness of using guidelines in the prompt.
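The agent pipeline can be pictured roughly as follows; the prompts and the stubbed call_llm function are placeholders, and the sketch only mirrors the feedback-to-guideline-to-tree flow described above rather than the actual FGT implementation.

    # Rough pipeline sketch of the FGT idea (Feedback -> Guideline -> Tree-gather).
    def call_llm(prompt):
        """Stand-in for a real LLM API call."""
        return f"[model output for: {prompt[:40]}...]"

    def feedback_agent(question, model_answer, gold_answer):
        verdict = "correct" if model_answer == gold_answer else "wrong"
        return call_llm(f"The answer was {verdict}. Explain why and give feedback.\n"
                        f"Q: {question}\nA: {model_answer}\nGold: {gold_answer}")

    def guideline_agent(feedback, memory):
        guideline = call_llm("Turn this feedback into one concise, task-specific "
                             f"guideline:\n{feedback}")
        memory.append(guideline)          # local memory of per-example guidelines

    def tree_gather(memory, fanout=4):
        # Merge guidelines bottom-up in small groups until one deduplicated list remains.
        level = memory
        while len(level) > 1:
            groups = (level[i:i + fanout] for i in range(0, len(level), fanout))
            level = [call_llm("Merge and deduplicate these guidelines:\n" + "\n".join(g))
                     for g in groups]
        return level[0] if level else ""

    memory = []
    for q, a, gold in [("2+2?", "4", "4"), ("Capital of France?", "Lyon", "Paris")]:
        guideline_agent(feedback_agent(q, a, gold), memory)
    print(tree_gather(memory))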
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Cai, Zefan, Zhang, Yichi, Gao, Bofei, Liu, Yuliang, Liu, Tianyu, Lu, Keming, Xiong, Wayne, Dong, Yue, Chang, Baobao, Hu, Junjie, Xiao, Wen
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling, where attention scatters widely in lower layers, progressively consolidates within specific contexts, and ultimately focuses on critical tokens (a.k.a. massive activations or attention sinks) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across layers, allocating more cache in lower layers and less in higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage. In scenarios emphasizing memory efficiency, where only 0.7% of the KV cache is maintained, PyramidKV surpasses other KV cache compression techniques, achieving up to a 20.5-point absolute accuracy improvement on TREC.
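A minimal sketch of the pyramid-shaped allocation idea is shown below; the linear schedule is an illustrative assumption, not necessarily PyramidKV's exact budgeting rule.

    # Pyramid-shaped per-layer KV budgets: lower layers get larger caches,
    # higher layers smaller ones, under a fixed overall budget.
    import numpy as np

    def pyramidal_budgets(num_layers, avg_budget_per_layer, min_budget=8):
        """Linearly decreasing per-layer KV budgets with a (roughly) fixed total."""
        total = num_layers * avg_budget_per_layer
        weights = np.linspace(2.0, 1.0, num_layers)   # largest at layer 0
        budgets = np.maximum(min_budget,
                             np.round(total * weights / weights.sum())).astype(int)
        # Rounding makes the sum only approximately equal to `total`.
        return budgets

    budgets = pyramidal_budgets(num_layers=32, avg_budget_per_layer=128)
    print(budgets[:4], budgets[-4:], budgets.sum())   # largest first, smallest last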
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
He, Pengcheng, Peng, Baolin, Lu, Liyang, Wang, Song, Mei, Jie, Liu, Yang, Xu, Ruochen, Awadalla, Hany Hassan, Shi, Yu, Zhu, Chenguang, Xiong, Wayne, Zeng, Michael, Gao, Jianfeng, Huang, Xuedong
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding, and then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ creates a new state of the art on 9 out of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.
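To illustrate the hierarchical idea behind fusion-in-encoder, the toy sketch below encodes a long input in local chunks and then fuses the chunk representations in a second pass; the stand-in linear "encoder" is an assumption for illustration only, not the Z-Code++ architecture.

    # Hierarchical (local-then-global) encoding in the spirit of fusion-in-encoder.
    import numpy as np

    HIDDEN = 8
    rng = np.random.default_rng(0)
    W_local = rng.normal(size=(HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
    W_fuse = rng.normal(size=(HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)

    def local_encode(chunk_embeddings):
        """Stand-in local encoder: one linear layer + tanh over a chunk."""
        return np.tanh(chunk_embeddings @ W_local)

    def fusion_in_encoder(token_embeddings, chunk_size=64):
        # 1) Local phase: encode fixed-size chunks independently.
        chunks = [token_embeddings[i:i + chunk_size]
                  for i in range(0, len(token_embeddings), chunk_size)]
        local = np.concatenate([local_encode(c) for c in chunks], axis=0)
        # 2) Fusion phase: a second (global) pass over the concatenated states.
        return np.tanh(local @ W_fuse)

    long_input = rng.normal(size=(500, HIDDEN))       # 500 token embeddings
    print(fusion_in_encoder(long_input).shape)        # (500, 8)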
Interactive Editing for Text Summarization
Xie, Yujia, Wang, Xun, Chen, Si-Qing, Xiong, Wayne, He, Pengcheng
Summarizing lengthy documents is a common and essential task in our daily lives. Although recent advancements in neural summarization models can assist in crafting general-purpose summaries, human writers often have specific requirements that call for a more customized approach. To address this need, we introduce REVISE (Refinement and Editing via Iterative Summarization Enhancement), an innovative framework designed to facilitate iterative editing and refinement of draft summaries by human writers. Within our framework, writers can effortlessly modify unsatisfactory segments of any length at any location and provide optional starting phrases -- our system will generate coherent alternatives that seamlessly integrate with the existing summary. At its core, REVISE incorporates a modified fill-in-the-middle model built on an encoder-decoder architecture, along with novel evaluation metrics tailored to the summarization task. In essence, our framework empowers users to create high-quality, personalized summaries by effectively harnessing both human expertise and AI capabilities, ultimately transforming the summarization process into a truly collaborative and adaptive experience.
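One way the fill-in-the-middle interaction could be wired up is sketched below; the sentinel format and the stubbed generate() call are assumptions, not the paper's actual interface.

    # Constructing a fill-in-the-middle request for an encoder-decoder model.
    def build_fim_input(summary, span_start, span_end, starting_phrase=""):
        prefix = summary[:span_start]
        suffix = summary[span_end:]
        # Encoder sees the context around the deleted span; the decoder is
        # (optionally) primed with the writer's starting phrase.
        encoder_input = f"{prefix} <extra_id_0> {suffix}"
        decoder_prefix = starting_phrase
        return encoder_input, decoder_prefix

    def generate(encoder_input, decoder_prefix):
        """Stand-in for a real seq2seq model call."""
        return decoder_prefix + " [regenerated span]"

    summary = "The report covers Q3 revenue. Sales dropped sharply. Outlook is unclear."
    start = summary.index("Sales")
    end = summary.index(" Outlook")
    enc, dec = build_fim_input(summary, start, end, starting_phrase="Sales grew")
    new_span = generate(enc, dec)
    revised = summary[:start] + new_span + summary[end:]
    print(revised)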
Momentum Calibration for Text Generation
Zhang, Xingxing, Liu, Yiran, Wang, Xun, He, Pengcheng, Yu, Yang, Chen, Si-Qing, Xiong, Wayne, Wei, Furu
The input and output of most text generation tasks can be transformed into two sequences of tokens, which can be modeled using sequence-to-sequence learning tools such as Transformers. These models are usually trained by maximizing the likelihood of the output text sequence, assuming that the input sequence and all gold preceding tokens are given during training, while during inference the model suffers from the exposure bias problem (i.e., it only has access to its previously predicted tokens rather than gold tokens during beam search). In this paper, we propose MoCa (Momentum Calibration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving-average generator with beam search, and it learns to align its model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently improves strong pre-trained transformers using vanilla fine-tuning, and we achieve state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
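The two ingredients above, a momentum (EMA) copy of the model that generates candidates and a calibration objective that aligns model scores with sample quality, can be sketched as follows; the pairwise margin ranking loss is a common choice for such calibration and is an assumption here, not necessarily MoCa's exact objective.

    # Momentum generator update plus a score-calibration loss over candidates.
    import torch

    def ema_update(online_params, momentum_params, beta=0.999):
        # Slowly evolving generator: momentum <- beta * momentum + (1 - beta) * online
        with torch.no_grad():
            for p_o, p_m in zip(online_params, momentum_params):
                p_m.mul_(beta).add_(p_o, alpha=1.0 - beta)

    def calibration_loss(model_scores, qualities, margin=0.01):
        """Pairwise ranking loss: if quality_i > quality_j, want score_i > score_j."""
        order = torch.argsort(qualities, descending=True)
        s = model_scores[order]
        loss = torch.zeros(())
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                loss = loss + torch.relu(margin * (j - i) - (s[i] - s[j]))
        return loss / max(1, len(s) * (len(s) - 1) // 2)

    # Toy example: 4 beam-search candidates with their (e.g., ROUGE) qualities.
    scores = torch.tensor([-2.1, -1.8, -2.5, -1.9], requires_grad=True)
    qualities = torch.tensor([0.42, 0.31, 0.48, 0.22])
    loss = calibration_loss(scores, qualities)
    loss.backward()
    print(float(loss), scores.grad)

    online = [torch.nn.Parameter(torch.ones(3))]
    momentum = [torch.nn.Parameter(torch.zeros(3))]
    ema_update(online, momentum)          # momentum copy drifts slowly toward online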
Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition
Chen, Zhehuai, Droppo, Jasha, Li, Jinyu, Xiong, Wayne
Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing outputs of the system. Experiments are conducted on the artificial overlapped Switchboard and hub5e-swb datasets. The proposed framework achieves over 30% relative improvement in WER over both a strong jointly trained system, PIT for ASR, and a separately optimized system, PIT for speech separation with a clean-speech ASR model. The improvement comes from better model generalization, training efficiency, and sequence-level linguistic knowledge integration.
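For reference, the core PIT criterion mentioned above computes the loss under every output-to-reference speaker assignment and keeps only the best one; the toy MSE version below illustrates the idea, whereas the paper applies this principle to cross-entropy and sequence-discriminative ASR objectives.

    # Permutation-invariant training (PIT) criterion for overlapped speakers.
    from itertools import permutations
    import torch

    def pit_loss(outputs, references):
        """outputs, references: lists of tensors, one per speaker stream."""
        best = None
        for perm in permutations(range(len(references))):
            # Loss of assigning output stream i to reference stream perm[i].
            loss = sum(torch.mean((outputs[i] - references[p]) ** 2)
                       for i, p in enumerate(perm))
            best = loss if best is None or loss < best else best
        return best    # only the best assignment is back-propagated

    # Toy example: 2 speakers, 5 frames, 3-dim targets.
    torch.manual_seed(0)
    outputs = [torch.randn(5, 3, requires_grad=True) for _ in range(2)]
    references = [torch.randn(5, 3) for _ in range(2)]
    loss = pit_loss(outputs, references)
    loss.backward()
    print(float(loss))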