Wan, Guanglu
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Wang, Haoyu, Liu, Bei, Shao, Hang, Xiao, Bo, Zeng, Ke, Wan, Guanglu, Qian, Yanmin
Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently as a way of reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original floating-point precision, in exchange for boosted model performance. Experiments on various mainstream open-source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code is available at https://github.com/fayuge/CLAQ.
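A minimal sketch of the column-level K-Means centroid idea, assuming a NumPy weight matrix and scikit-learn's KMeans; the function names and the fixed per-column bit-width are illustrative, not the authors' implementation (CLAQ additionally searches bit-widths per column and reserves outliers):

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_column_kmeans(col: np.ndarray, n_bits: int) -> np.ndarray:
    """Quantize one weight column to 2**n_bits centroids found by K-Means."""
    km = KMeans(n_clusters=2 ** n_bits, n_init=10).fit(col.reshape(-1, 1))
    # Replace each weight with its nearest centroid.
    return km.cluster_centers_[km.labels_].reshape(col.shape)

def quantize_matrix(w: np.ndarray, n_bits: int = 2) -> np.ndarray:
    """Apply column-level quantization independently to every column."""
    return np.stack([quantize_column_kmeans(w[:, j], n_bits)
                     for j in range(w.shape[1])], axis=1)

w = np.random.randn(128, 64).astype(np.float32)
w_q = quantize_matrix(w, n_bits=2)  # each column keeps 4 distinct values
```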
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Ren, Mengjie, Cao, Boxi, Lin, Hongyu, Liu, Cao, Han, Xianpei, Zeng, Ke, Wan, Guanglu, Cai, Xunliang, Sun, Le
Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs). Previous works mainly focus on IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of different factors. Surprisingly, our experiments reveal that attempting to learn additional world knowledge through IFT often struggles to yield positive impacts and can even lead to markedly negative effects. Further, we discover that maintaining internal knowledge consistency before and after IFT is a critical factor for achieving successful IFT. Our findings reveal the underlying mechanisms of IFT and provide robust support for some very recent and potential future work.
A Task-oriented Dialog Model with Task-progressive and Policy-aware Pre-training
Zhong, Lucen, Lu, Hengtong, Yuan, Caixia, Wang, Xiaojie, Sun, Jiashen, Zeng, Ke, Wan, Guanglu
Pre-trained conversation models (PCMs) have achieved promising progress in recent years. However, existing PCMs for task-oriented dialog (TOD) are insufficient for capturing the sequential nature of TOD-related tasks, as well as for learning dialog policy information. To alleviate these problems, this paper proposes a task-progressive PCM with two policy-aware pre-training tasks. The model is pre-trained through three stages in which TOD-related tasks are progressively employed according to the task logic of the TOD system. A global policy consistency task is designed to capture the sequential relations among multi-turn dialog policies, and an act-based contrastive learning task is designed to capture similarities among samples with the same dialog policy. Our model achieves better results on both the MultiWOZ and In-Car end-to-end dialog modeling benchmarks with only 18% of the parameters and 25% of the pre-training data compared to the previous state-of-the-art PCM, GALAXY. We make our code and data publicly available.
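As a rough illustration of an act-based contrastive objective, the sketch below pulls together batch samples that share a dialog-act label and pushes apart the rest; the tensor shapes, temperature, and loss form are assumptions, not this paper's actual training code:

```python
import torch
import torch.nn.functional as F

def act_contrastive_loss(reps: torch.Tensor, act_ids: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss: samples sharing a dialog-act label are
    treated as positives, all other batch samples as negatives."""
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.T / temperature                      # (B, B) similarities
    self_mask = torch.eye(len(reps), dtype=torch.bool)
    pos_mask = (act_ids.unsqueeze(0) == act_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    # Average log-probability of the positives for each anchor.
    pos_counts = pos_mask.sum(1).clamp(min=1)
    return -(log_prob * pos_mask).sum(1).div(pos_counts).mean()

# Usage: 8 utterance embeddings with their dialog-act labels.
loss = act_contrastive_loss(torch.randn(8, 128),
                            torch.tensor([0, 1, 0, 2, 1, 2, 0, 1]))
```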
CPPF: A contextual and post-processing-free model for automatic speech recognition
Zhang, Lei, Tian, Zhengkun, Chen, Xiang, Sun, Jiaming, Xiang, Hongyu, Ding, Ke, Wan, Guanglu
ASR systems have become increasingly widespread in recent years. However, their textual outputs often require post-processing before they can be practically utilized. To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model. This integration not only shortens the multi-stage pipeline but also prevents the propagation of cascading errors, enabling the direct generation of post-processed text. In this study, we focus on ASR-related processing tasks, including contextual ASR and multiple ASR post-processing tasks. To achieve this objective, we introduce the CPPF model, which offers a versatile and highly effective alternative to ASR processing. CPPF seamlessly integrates these tasks without any significant loss in recognition performance.
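One common way to fold several text-processing tasks into a single recognizer is to condition the decoder on a task token. The sketch below is purely hypothetical: the token names, task set, and prompt layout are invented for illustration and are not CPPF's actual interface:

```python
# Hypothetical task prompts prepended to the decoder input so one model
# can emit raw, punctuated, or normalized transcripts on demand.
TASK_TOKENS = {"transcribe": "<|asr|>", "punctuate": "<|punct|>",
               "itn": "<|itn|>", "contextual": "<|ctx|>"}

def build_decoder_prompt(task: str, bias_words: list[str] | None = None) -> str:
    """Compose the decoder prefix: a task token plus optional biasing words
    for contextual ASR (all token names here are illustrative)."""
    prompt = TASK_TOKENS[task]
    if bias_words:
        prompt += " " + " ".join(bias_words)
    return prompt

print(build_decoder_prompt("contextual", ["LibriSpeech", "AISHELL"]))
```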
Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter
Li, Song, You, Yongbin, Wang, Xuezhi, Ding, Ke, Wan, Guanglu
Multilingual intelligent assistants, such as ChatGPT, have recently gained popularity. To further expand the applications of multilingual artificial intelligence (AI) assistants and facilitate international communication, it is essential to enhance the performance of multilingual speech recognition, which is a crucial component of speech interaction. In this paper, we propose two simple and parameter-efficient methods: language prompt tuning and frame-level language adapter, to respectively enhance language-configurable and language-agnostic multilingual speech recognition.
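A minimal sketch of what a frame-level language adapter could look like: a residual bottleneck applied to every encoder frame, conditioned on per-frame language information. The module name, shapes, and wiring are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FrameLevelLanguageAdapter(nn.Module):
    """Bottleneck adapter added to each encoder frame, conditioned on a
    per-frame language posterior (a sketch under assumed shapes)."""
    def __init__(self, d_model: int, n_langs: int, bottleneck: int = 64):
        super().__init__()
        self.lang_proj = nn.Linear(n_langs, d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, frames: torch.Tensor, lang_post: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, d_model); lang_post: (batch, time, n_langs)
        h = frames + self.lang_proj(lang_post)             # inject language info per frame
        return frames + self.up(torch.relu(self.down(h)))  # residual bottleneck
```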
Exploiting Pseudo Future Contexts for Emotion Recognition in Conversations
Wei, Yinyi, Liu, Shuaipeng, Yan, Hailei, Ye, Wei, Mo, Tong, Wan, Guanglu
With the extensive accumulation of conversational data on the Internet, emotion recognition in conversations (ERC) has received increasing attention. Previous efforts on this task mainly focus on leveraging contextual and speaker-specific features, or on integrating heterogeneous external commonsense knowledge. Among them, some heavily rely on future contexts, which, however, are not always available in real-life scenarios. This fact inspires us to generate pseudo future contexts to improve ERC. Specifically, for an utterance, we generate its future context with pre-trained language models, potentially containing extra beneficial knowledge in a conversational form homogeneous with the historical contexts. These characteristics make pseudo future contexts easy to fuse with historical contexts and historical speaker-specific contexts, yielding a conceptually simple framework that systematically integrates multiple contexts. Experimental results on four ERC datasets demonstrate our method's superiority. Further in-depth analyses reveal that pseudo future contexts can rival real ones to some extent, especially in relatively context-independent conversations.
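A minimal sketch of fabricating pseudo future turns by rolling a pre-trained dialogue LM forward from the history; DialoGPT and the decoding settings are illustrative choices, not necessarily the generator used in the paper:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
lm = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

def pseudo_future(history: list[str], n_turns: int = 1) -> list[str]:
    """Generate hypothetical future turns conditioned on the dialogue history."""
    future = []
    for _ in range(n_turns):
        ids = tok.encode(tok.eos_token.join(history + future) + tok.eos_token,
                         return_tensors="pt")
        out = lm.generate(ids, max_new_tokens=40, do_sample=True, top_p=0.9,
                          pad_token_id=tok.eos_token_id)
        # Keep only the newly generated continuation as the next turn.
        future.append(tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True))
    return future

print(pseudo_future(["I failed my driving test again.", "Oh no, what happened?"]))
```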
Dialog-to-Actions: Building Task-Oriented Dialogue System via Action-Level Generation
Hua, Yuncheng, Xi, Xiangyu, Jiang, Zheng, Zhang, Guanwei, Sun, Chaobo, Wan, Guanglu, Ye, Wei
End-to-end generation-based approaches have been investigated and applied in task-oriented dialogue systems. However, in industrial scenarios, existing methods face bottlenecks in controllability (e.g., domain-inconsistent responses, the repetition problem) and efficiency (e.g., long computation time). In this paper, we propose a task-oriented dialogue system based on action-level generation. Specifically, we first construct dialogue actions from large-scale dialogues and represent each natural language (NL) response as a sequence of dialogue actions. We then train a Sequence-to-Sequence model that takes the dialogue history as input and outputs a sequence of dialogue actions. The generated dialogue actions are transformed into verbal responses. Experimental results show that our lightweight method achieves competitive performance and has advantages in controllability and efficiency.
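To make the final step concrete, here is a toy sketch of turning a generated action sequence back into a verbal response; the action names, templates, and slot scheme are hypothetical, invented only to illustrate the action-to-response transformation:

```python
# Hypothetical mapping from dialogue actions to verbal fragments.
ACTION_TEMPLATES = {
    "greet": "Hello! How can I help you today?",
    "ask_cuisine": "What kind of food are you looking for?",
    "inform_restaurant": "I found {name}, a {cuisine} place in {area}.",
}

def actions_to_response(actions: list[str], slots: dict[str, str]) -> str:
    """Transform a generated action sequence into a natural-language reply."""
    return " ".join(ACTION_TEMPLATES[a].format(**slots) for a in actions)

print(actions_to_response(
    ["greet", "inform_restaurant"],
    {"name": "Golden Wok", "cuisine": "Sichuan", "area": "downtown"}))
```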
MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts
Xi, Xiangyu, Lv, Jianwei, Liu, Shuaipeng, Ye, Wei, Yang, Fan, Wan, Guanglu
Event detection (ED) identifies and classifies event triggers from unstructured texts, serving as a fundamental task for information extraction. Despite the remarkable progress achieved in the past several years, most research efforts focus on detecting events from formal texts (e.g., news articles, Wikipedia documents, financial announcements). Moreover, the texts in each dataset come from either a single source or multiple yet relatively homogeneous sources. With massive amounts of user-generated text accumulating on the Web and inside enterprises, identifying meaningful events in these informal texts, usually from multiple heterogeneous sources, has become a problem of significant practical value. As a pioneering exploration that expands event detection to scenarios involving informal and heterogeneous texts, we propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations from a leading e-commerce platform for food service. We carefully investigate the proposed dataset's textual informality and multi-source heterogeneity by inspecting data samples quantitatively and qualitatively. Extensive experiments with state-of-the-art event detection methods verify the unique challenges posed by these characteristics, indicating that multi-source informal event detection remains an open problem and requires further effort. Our benchmark and code are released at https://github.com/myeclipse/MUSIED.
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Zheng, Huahuan, An, Keyu, Ou, Zhijian, Huang, Chen, Ding, Ke, Wan, Guanglu
Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the implicitly learned internal language model (ILM) prior should first be subtracted from the RNN-T posterior in order to integrate the ELM. While recent studies suggest that RNN-T learns only some low-order language model information, the DR method uses a well-trained neural language model with full context, which may be inappropriate for estimating the ILM and may deteriorate integration performance. Building on the DR method, we propose a low-order density ratio method (LODR) that replaces this estimate with a low-order weak language model. Extensive empirical experiments are conducted in both in-domain and cross-domain scenarios on the English LibriSpeech & Tedlium-2 and Chinese WenetSpeech & AISHELL-1 datasets. It is shown that LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.
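In score form, the idea above amounts to subtracting a low-order LM (standing in for the ILM) from the RNN-T log-posterior before adding the external LM. A minimal sketch with illustrative interpolation weights (the paper's tuned values may differ):

```python
def lodr_score(log_p_rnnt: float, log_p_elm: float, log_p_lodr: float,
               lam_elm: float = 0.4, lam_ilm: float = 0.4) -> float:
    """Hypothesis score under a density-ratio-style fusion: subtract a
    low-order LM as the internal-LM estimate, then add the external LM."""
    return log_p_rnnt - lam_ilm * log_p_lodr + lam_elm * log_p_elm

# Rescore n-best hypotheses; each is (log_p_rnnt, log_p_elm, log_p_bigram).
nbest = [(-12.3, -20.1, -18.7), (-12.9, -17.4, -19.2)]
best = max(nbest, key=lambda h: lodr_score(*h))
```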
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
An, Keyu, Zheng, Huahuan, Ou, Zhijian, Xiang, Hongyu, Ding, Ke, Wan, Guanglu
History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context introduces latency in streaming ASR. In this paper, we propose a new framework, Chunking, Simulating Future Context and Decoding (CUSIDE), for streaming speech recognition. A new simulation module is introduced to recursively simulate future contextual frames without waiting for the future context itself. The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments. Experiments show that, compared to using real future frames as right context, using simulated future context can drastically reduce latency while maintaining recognition accuracy. With CUSIDE, we obtain new state-of-the-art streaming ASR results on the AISHELL-1 dataset.
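A minimal sketch of the simulation idea: predict a few future frames from the current chunk so the encoder gets "right context" without waiting, and regress toward the real future frames (available only at training time). The module design and layer sizes are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FutureSimulator(nn.Module):
    """Predict the next few context frames from the current chunk."""
    def __init__(self, feat_dim: int, hidden: int = 256, n_future: int = 4):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim * n_future)
        self.n_future = n_future

    def forward(self, chunk: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, time, feat_dim) -> simulated (batch, n_future, feat_dim)
        _, h = self.rnn(chunk)
        return self.proj(h[-1]).view(chunk.size(0), self.n_future, -1)

def simulation_loss(sim: torch.Tensor, real_future: torch.Tensor) -> torch.Tensor:
    """Self-supervised L1 regression toward the real future frames."""
    return (sim - real_future).abs().mean()
```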