AITopics | Qin, Chengwei

Collaborating Authors

Qin, Chengwei

Improving In-context Learning via Bidirectional Alignment

Qin, Chengwei, Xia, Wenhan, Jiao, Fangkai, Joty, Shafiq

arXiv.org Artificial Intelligence, Dec-28-2023

Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL). Despite their success in showing such emergent abilities, the scale and complexity of larger models also lead to unprecedentedly high computational demands and deployment challenges. In response, researchers explore transferring the powerful capabilities of larger models to more efficient and compact models, typically by aligning the output of smaller models with that of larger models. Existing methods either train smaller models on the generated outputs of larger models or train them to imitate the larger models' token-level probability distributions. However, these distillation methods pay little to no attention to the input part, which also plays a crucial role in ICL. Based on the finding that the performance of ICL is highly sensitive to the selection of demonstration examples, we propose Bidirectional Alignment (BiAlign) to fully leverage the models' preferences for ICL examples to improve the ICL abilities of smaller models. Specifically, we introduce the alignment of input preferences between smaller and larger models by incorporating a novel ranking loss, in addition to aligning the token-level output distribution. With extensive experiments and analysis, we demonstrate that BiAlign can consistently outperform existing baselines on a variety of tasks including language understanding, reasoning, and coding.
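The abstract mentions a ranking loss that aligns the smaller model's preferences over ICL examples with the larger model's, but does not give its form. The sketch below is only an illustration that assumes a generic pairwise margin ranking loss; the function name `pairwise_ranking_loss`, the `margin` parameter, and the idea of one scalar preference score per candidate demonstration set are all hypothetical and not taken from the paper.

```python
from typing import Sequence


def pairwise_ranking_loss(
    small_scores: Sequence[float],
    large_scores: Sequence[float],
    margin: float = 0.1,
) -> float:
    """Generic pairwise margin ranking loss (an assumed form, not BiAlign's).

    Each index is one candidate set of demonstrations. Whenever the larger
    model prefers candidate i over candidate j, the smaller model is
    penalized unless it also scores i above j by at least `margin`.
    """
    if len(small_scores) != len(large_scores):
        raise ValueError("score lists must have the same length")

    total = 0.0
    pairs = 0
    n = len(large_scores)
    for i in range(n):
        for j in range(n):
            if large_scores[i] > large_scores[j]:
                # Hinge penalty on the smaller model's score gap.
                gap = small_scores[i] - small_scores[j]
                total += max(0.0, margin - gap)
                pairs += 1
    # Average over ordered pairs; zero if the larger model has no preference.
    return total / pairs if pairs else 0.0
```

In a setup like the one the abstract describes, a term of this kind would presumably be added to the usual token-level output alignment objective; how the two are weighted is not stated in the abstract.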

large language model, machine learning, natural language, (19 more...)

2312.17055

Country:

North America > United States (0.47)
Asia > Middle East > UAE (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Retrieving Multimodal Information for Augmented Generation: A Survey

Zhao, Ruochen, Chen, Hailin, Wang, Weishi, Jiao, Fangkai, Do, Xuan Long, Qin, Chengwei, Ding, Bosheng, Guo, Xiaobao, Li, Minzhi, Li, Xingxuan, Joty, Shafiq

arXiv.org Artificial Intelligence, Nov-30-2023

As Large Language Models (LLMs) become popular, an important trend has emerged of using multimodality to augment the LLMs' generation ability, which enables LLMs to better interact with the world. However, there is no unified view of at which stage and how to incorporate different modalities. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, whose formats range from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey is expected to provide scholars with a deeper understanding of the methods' applications and encourage them to adapt existing techniques to the fast-growing field of LLMs.

computational linguistic, large language model, machine learning, (18 more...)

2303.10868

Country:

Europe (1.00)
Asia > Middle East (0.69)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

Qin, Chengwei, Chen, Chen, Joty, Shafiq

arXiv.org Artificial Intelligence, Nov-22-2023

Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.

large language model, machine learning, natural language, (19 more...)

2310.09886

Country:

Europe (1.00)
North America > United States > California (0.14)
North America > United States > Texas (0.14)
North America > United States > Minnesota (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning?

Qin, Chengwei, Li, Qian, Zhao, Ruochen, Joty, Shafiq

arXiv.org Artificial Intelligence, Nov-19-2023

Prompt tuning (PT), which only tunes the embeddings of an additional sequence of tokens per task while keeping the pre-trained language model (PLM) frozen, has shown remarkable performance in few-shot learning. Despite this, PT has been shown to rely heavily on good initialization of the prompt embeddings. In this work, we study meta prompt tuning (MPT) to systematically explore whether and how meta-learning can improve cross-task generalization in PT by learning to initialize the prompt embeddings from other relevant tasks. We empirically analyze a representative set of meta-learning algorithms in a wide range of adaptation settings with different source/target task configurations on a large set of few-shot tasks. With extensive experiments and analysis, we demonstrate the effectiveness of MPT. We find the improvement to be significant, particularly on classification tasks. For other kinds of tasks such as question answering, we observe that while MPT can outperform PT in most cases, it does not always outperform multi-task learning. We further provide an in-depth analysis from the perspective of task similarity.

computational linguistic, machine learning, natural language, (17 more...)

2302.08143

Country:

Europe (1.00)
Asia (0.94)
North America > Canada (0.68)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

Qin, Chengwei, Zhang, Aston, Zhang, Zhuosheng, Chen, Jiaao, Yasunaga, Michihiro, Yang, Diyi

arXiv.org Artificial Intelligence, Nov-19-2023

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning) while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.

large language model, machine learning, natural language, (16 more...)

2302.06476

Country:

Asia (1.00)
Europe > United Kingdom > England (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Personal (1.00)
Research Report (0.81)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Consumer Health (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

In-Context Learning with Iterative Demonstration Selection

Qin, Chengwei, Zhang, Aston, Dagar, Anirudh, Ye, Wenming

arXiv.org Artificial Intelligence, Oct-22-2023

Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. Selecting the most suitable examples as context remains an ongoing challenge and an open problem. Existing literature has highlighted the importance of selecting examples that are diverse or semantically similar to the test sample while ignoring the fact that the optimal selection dimension, i.e., diversity or similarity, is task-specific. Leveraging the merits of both dimensions, we propose Iterative Demonstration Selection (IDS). Using zero-shot chain-of-thought reasoning (Zero-shot-CoT), IDS iteratively selects examples that are diverse but still strongly correlated with the test sample as ICL demonstrations. Specifically, IDS applies Zero-shot-CoT to the test sample before demonstration selection. The output reasoning path is then used to choose demonstrations that are prepended to the test sample for inference. The generated answer is accompanied by its corresponding reasoning path for extracting a new set of demonstrations in the next iteration. After several iterations, IDS adopts majority voting to obtain the final result. Through extensive experiments on tasks including commonsense reasoning, question answering, topic classification, and sentiment analysis, we demonstrate that IDS can consistently outperform existing ICL demonstration selection methods.
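The loop described in this abstract can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation: the three callables (`zero_shot_cot`, `select_demos`, `answer_with_demos`) are hypothetical placeholders for the LLM calls and the demonstration retrieval step, and the default of three iterations is an assumption.

```python
from collections import Counter
from typing import Callable, List, Tuple


def ids_predict(
    test_sample: str,
    zero_shot_cot: Callable[[str], str],
    select_demos: Callable[[str], List[str]],
    answer_with_demos: Callable[[List[str], str], Tuple[str, str]],
    num_iterations: int = 3,
) -> str:
    """Sketch of the IDS loop; all three callables are hypothetical stand-ins.

    zero_shot_cot(test_sample)       -> reasoning path
    select_demos(reasoning_path)     -> demonstrations chosen from that path
    answer_with_demos(demos, sample) -> (answer, new reasoning path)
    """
    # Apply Zero-shot-CoT to the test sample before any selection.
    reasoning_path = zero_shot_cot(test_sample)

    answers: List[str] = []
    for _ in range(num_iterations):
        # Use the current reasoning path to choose demonstrations.
        demos = select_demos(reasoning_path)
        # Prepend them to the test sample and run inference; the answer
        # comes with a reasoning path that drives the next selection.
        answer, reasoning_path = answer_with_demos(demos, test_sample)
        answers.append(answer)

    # Majority vote over the per-iteration answers.
    return Counter(answers).most_common(1)[0][0]
```

Passing the model calls in as functions keeps the sketch independent of any particular LLM API or retrieval method, neither of which the abstract specifies.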

demonstration, large language model, machine learning, (16 more...)

2310.09881

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Ravaut, Mathieu, Chen, Hailin, Zhao, Ruochen, Qin, Chengwei, Joty, Shafiq, Chen, Nancy

arXiv.org Artificial Intelligence, Aug-6-2023

Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, summarization guided through instructions (discrete prompts) can achieve a desirable double objective of high quality and controllability in summary generation. Towards a goal of strong summarization performance under the triple conditions of parameter-efficiency, data-efficiency, and controllability, we introduce PromptSum, a method combining PT with a multi-task objective and discrete entity prompts for abstractive summarization. Our model achieves competitive ROUGE results on popular abstractive summarization benchmarks coupled with a strong level of controllability through entities, all while tuning several orders of magnitude fewer parameters.

artificial intelligence, machine learning, natural language, (17 more...)

2308.03117

Country: North America > United States > Michigan (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Motorsports > Formula One (1.00)
Government > Regional Government > North America Government > United States Government (0.92)
Automobiles & Trucks > Manufacturer (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Is GPT-3 a Good Data Annotator?

Ding, Bosheng, Qin, Chengwei, Liu, Linlin, Chia, Yew Ken, Joty, Shafiq, Li, Boyang, Bing, Lidong

arXiv.org Artificial Intelligence, Jun-14-2023

Evaluations show that GPT-3 has gained through pretraining a surprisingly wide range of knowledge, which can be transferred to downstream tasks through knowledge distillation (Kim et al., 2022). We present some examples in Appendix A.12. Due to the model architecture and pretraining tasks designed for auto-regressive generation, GPT-3 is capable of generating human-like text and performing a broad array of NLP tasks, such as machine translation, summarization, and question-answering.

The democratization of artificial intelligence (AI) (Garvey, 2018; Rubeis et al., 2022) aims to provide access to AI technologies to all members of society, including individuals, small- and medium-sized enterprises (SMEs), academic research labs, and nonprofit organizations. Achieving this goal is crucial for the promotion of innovation, economic growth, and fairness and equality. As typical AI models are usually data-hungry, one significant obstacle of AI democratization is the preparation of …

gpt-3, machine learning, natural language, (17 more...)

2212.10450

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.93)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

Zhao, Ruochen, Li, Xingxuan, Joty, Shafiq, Qin, Chengwei, Bing, Lidong

arXiv.org Artificial Intelligence, May-4-2023

As large language models (LLMs) have become the norm in NLP, demonstrating good performance in generation and reasoning tasks, one of their most serious weaknesses is the lack of factual correctness. Generating non-factual text not only leads to lower performance but also degrades the trust in and validity of their applications. Chain-of-Thought (CoT) prompting improves trust and model performance on complex reasoning tasks by generating interpretable reasoning chains, but still suffers from factuality concerns in knowledge-intensive tasks. In this paper, we propose the Verify-and-Edit framework for CoT prompting, which seeks to increase prediction factuality by post-editing reasoning chains according to external knowledge. Building on top of GPT-3, our framework leads to accuracy improvements in multiple open-domain question-answering tasks.

deep learning, knowledge-enhanced chain-of-thought framework, machine learning, (4 more...)

2305.03268

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)