Cui, Leyang
Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment
Wan, Fanqi, Huang, Xinting, Cui, Leyang, Quan, Xiaojun, Bi, Wei, Shi, Shuming
While Large Language Models (LLMs) have proven to be exceptional on a variety of tasks after alignment, they may still confidently produce responses that contradict the context or world knowledge, a phenomenon known as ``hallucination''. In this paper, we demonstrate that reducing the inconsistency between the external knowledge encapsulated in the training data and the intrinsic knowledge inherited from the pretraining corpus can mitigate hallucination during alignment. Specifically, we introduce a novel knowledge consistent alignment (KCA) approach, which automatically formulates examinations based on external knowledge to assess the comprehension of LLMs. For data exhibiting knowledge inconsistency, KCA applies several simple yet efficient processing strategies. We show the superior performance of the proposed KCA approach in mitigating hallucinations across six benchmarks, using LLMs of different backbones and scales. Furthermore, we confirm the correlation between knowledge inconsistency and hallucination, signifying the effectiveness of reducing knowledge inconsistency in alleviating hallucinations. Our code, model weights, and data are publicly available at \url{https://github.com/fanqiwan/KCA}.
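A minimal sketch of a discard-style knowledge-consistency filter in the spirit of KCA. The `exam_accuracy` field is assumed to be precomputed by quizzing the base LLM on auto-generated questions about each example's external knowledge; the field names, threshold, and the choice of a pure discard strategy are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: keep only training examples whose external knowledge the
# base model already appears to hold (measured by an exam-style probe).
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingExample:
    instruction: str
    response: str
    exam_accuracy: float  # fraction of knowledge-probe questions the base LLM answered correctly


def filter_consistent(examples: List[TrainingExample], threshold: float = 0.8) -> List[TrainingExample]:
    """Discard examples whose knowledge is inconsistent with the base model."""
    return [ex for ex in examples if ex.exam_accuracy >= threshold]


if __name__ == "__main__":
    data = [
        TrainingExample("Who wrote Hamlet?", "William Shakespeare.", 1.0),
        TrainingExample("Summarize the 2031 budget.", "(unknown to the base model)", 0.2),
    ]
    print(len(filter_consistent(data)))  # -> 1; the inconsistent example is dropped
```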
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
Shi, Shuming, Zhao, Enbo, Cai, Deng, Cui, Leyang, Huang, Xinting, Li, Huayang
With Inferflow, users can serve most common transformer models by simply modifying a few lines in the corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has several key features. First, by implementing a modular framework of atomic building blocks and technologies, Inferflow is compositionally generalizable to new models. Second, 3.5-bit quantization is introduced in Inferflow as a tradeoff between 3-bit and 4-bit quantization. Third, hybrid model partitioning for multi-GPU inference is introduced to better balance inference speed and throughput than the commonly adopted partition-by-layer and partition-by-tensor strategies.
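A hedged illustration of what a 3.5-bit code can look like: two 11-level values share one 7-bit code (11 x 11 = 121 <= 128), giving 3.5 bits per value on average. This is only one possible realization and is not claimed to be Inferflow's actual packing scheme; the per-tensor scale and level count are assumptions for the sketch.

```python
# Hedged sketch of a 3.5-bit quantization code: pack two 11-level values per 7-bit code.
import numpy as np

LEVELS = 11  # 11 * 11 = 121 <= 2**7, so a pair of values fits in one byte-sized code


def quantize_pair(x: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Quantize a float array of even length to paired codes with a per-tensor scale."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (LEVELS - 1) or 1.0
    q = np.clip(np.round((x - lo) / scale), 0, LEVELS - 1).astype(np.uint8)
    codes = q[0::2] * LEVELS + q[1::2]  # two ~3.5-bit values per code
    return codes, lo, scale


def dequantize_pair(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    q = np.empty(codes.size * 2, dtype=np.uint8)
    q[0::2], q[1::2] = codes // LEVELS, codes % LEVELS
    return lo + q.astype(np.float32) * scale


if __name__ == "__main__":
    w = np.random.randn(8).astype(np.float32)
    codes, lo, scale = quantize_pair(w)
    print(np.abs(w - dequantize_pair(codes, lo, scale)).max())  # max quantization error
```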
FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition
Yang, Linyi, Yuan, Lifan, Cui, Leyang, Gao, Wenyang, Zhang, Yue
Few-shot Named Entity Recognition (NER) is imperative for entity tagging in limited-resource domains and has thus received considerable attention in recent years. Existing approaches for few-shot NER are evaluated mainly under in-domain settings. In contrast, little is known about how these models perform in cross-domain NER when only a few labeled in-domain examples are available. This paper proposes a two-step, rationale-centric data augmentation method to improve the model's generalization ability. Results on several datasets show that our model-agnostic method significantly improves the performance of cross-domain NER tasks compared to previous state-of-the-art methods, including data augmentation and prompt-tuning methods. Our code is available at https://github.com/lifan-yuan/FactMix.
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Zhang, Yue, Cui, Leyang, Bi, Wei, Shi, Shuming
Despite their impressive capabilities, large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information, a phenomenon commonly known as ``hallucination''. In this work, we propose a simple \textit{Induce-then-Contrast} Decoding (ICD) strategy to alleviate hallucinations. We first construct a factually weak LLM by inducing hallucinations from the original LLM. Then, we penalize these induced hallucinations during decoding to enhance the factuality of the generated content. Concretely, we determine the final next-token predictions by amplifying the predictions from the original model and downplaying the induced untruthful predictions via contrastive decoding. Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and \textsc{FActScore}, demonstrate that our proposed ICD method can effectively enhance the factuality of LLMs across various model sizes and families. For example, when equipped with ICD, Llama2-7B-Chat and Mistral-7B-Instruct achieve performance comparable to ChatGPT and GPT-4 on TruthfulQA, respectively.
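A minimal sketch of the contrastive step described above: amplify the original model's next-token distribution and penalize the distribution of a factually weakened model. The combination rule and the `alpha` weight are illustrative and may differ from the paper's exact formulation.

```python
# Hedged sketch of contrastive next-token selection in the spirit of ICD.
import numpy as np


def contrastive_next_token(logits_orig: np.ndarray, logits_weak: np.ndarray, alpha: float = 1.0) -> int:
    """Pick the next token by contrasting original and hallucination-induced logits."""
    log_p_orig = logits_orig - np.logaddexp.reduce(logits_orig)  # log-softmax
    log_p_weak = logits_weak - np.logaddexp.reduce(logits_weak)
    scores = (1 + alpha) * log_p_orig - alpha * log_p_weak      # reward truthful, punish induced
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    orig, weak = rng.normal(size=32000), rng.normal(size=32000)  # toy vocabulary-sized logits
    print(contrastive_next_token(orig, weak))
```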
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
Yang, Sen, Li, Xin, Cui, Leyang, Bing, Lidong, Lam, Wai
Though prompting LLMs with various reasoning structures produces reasoning proofs along with answers, these proofs are not guaranteed to be causal and reliable due to the inherent defects of LLMs. To address such deficiencies, we present a neuro-symbolic integration method, in which a neural LLM is used to represent the knowledge of the problem while an LLM-free symbolic solver performs deliberative reasoning over that knowledge. Specifically, our customized meta-interpreters allow the production of reasoning proofs and support flexible search strategies. These reasoning proofs are guaranteed to be causal and reliable because of the deterministic execution of the symbolic solvers. Empirically, on ProofWriter, our method achieves nearly double the accuracy of the CoT baseline and more than triple its proof similarity. On GSM8K, our method also shows accuracy improvements and nearly doubled proof similarity. Our code is released at https://github.com/DAMO-NLP-SG/CaRing.
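A toy, LLM-free forward-chaining solver over Horn rules, standing in for the symbolic back end described above; the actual system uses Prolog-style meta-interpreters, so this is only an illustrative sketch of deterministic proof construction from facts and rules that an LLM would extract from the problem text.

```python
# Hedged sketch: deterministic forward chaining that also records a proof trace.
from typing import List, Set, Tuple

Rule = Tuple[List[str], str]  # (premises, conclusion)


def prove(facts: Set[str], rules: List[Rule], goal: str) -> List[str]:
    """Return a proof trace (one line per derivation) if `goal` is derivable, else []."""
    derived, trace = set(facts), [f"fact: {f}" for f in sorted(facts)]
    changed = True
    while changed and goal not in derived:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                trace.append(f"{' & '.join(premises)} => {conclusion}")
                changed = True
    return trace if goal in derived else []


if __name__ == "__main__":
    # Facts and rules here would be produced by the LLM from the problem statement.
    facts = {"cat(tom)"}
    rules = [(["cat(tom)"], "mammal(tom)"), (["mammal(tom)"], "animal(tom)")]
    print("\n".join(prove(facts, rules, "animal(tom)")))
```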
Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation
Li, Qintong, Cui, Leyang, Kong, Lingpeng, Bi, Wei
Humans are widely involved in evaluating open-ended natural language generation (NLG) tasks that demand creativity, since automatic metrics often correlate weakly with human judgments. Large language models (LLMs) have recently emerged as a scalable and cost-effective alternative to human evaluation. However, both humans and LLMs have limitations, namely inherent subjectivity and unreliable judgments, particularly for open-ended tasks that require adaptable metrics tailored to diverse task requirements. To explore the synergy between humans and LLM-based evaluators, and to address the inconsistent evaluation criteria used in open-ended NLG tasks, we propose CoEval, a collaborative evaluation pipeline covering the design of a checklist of task-specific criteria and the detailed evaluation of texts: the LLM generates initial ideation, and humans then engage in scrutiny. We conduct a series of experiments to investigate the mutual effects between LLMs and humans in CoEval. Results show that, by utilizing LLMs, CoEval effectively evaluates lengthy texts, saving significant time and reducing the number of human evaluation outliers. Human scrutiny still plays an important role, revising around 20% of LLM evaluation scores for ultimate reliability.
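A hedged sketch of the human-in-the-loop revision step: per-criterion LLM scores are taken as defaults, and human revisions override them where the human disagrees. The criterion names and the merging rule are illustrative assumptions, not CoEval's exact design.

```python
# Hedged sketch: merge human revisions over LLM-proposed checklist scores.
from typing import Dict


def merge_scores(llm_scores: Dict[str, int], human_revisions: Dict[str, int]) -> Dict[str, int]:
    """Humans revise only the criteria they disagree with; the rest keep the LLM scores."""
    return {criterion: human_revisions.get(criterion, score) for criterion, score in llm_scores.items()}


if __name__ == "__main__":
    llm = {"coherence": 4, "creativity": 5, "factuality": 4}
    human = {"factuality": 2}  # the human overrides a minority of LLM judgments
    print(merge_scores(llm, human))  # {'coherence': 4, 'creativity': 5, 'factuality': 2}
```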
LogiCoT: Logical Chain-of-Thought Instruction-Tuning
Liu, Hanmeng, Teng, Zhiyang, Cui, Leyang, Zhang, Chaoli, Zhou, Qiji, Zhang, Yue
Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing, but they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models logical reasoning and eliciting general reasoning skills.
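A hedged sketch of the harvesting step: a seed logical-reasoning instance is turned into a prompt that asks a strong model for a step-by-step rationale. The template below is an illustrative assumption, not LogiCoT's exact prompt.

```python
# Hedged sketch: build a rationale-elicitation prompt from a seed instance.
def build_cot_prompt(context: str, question: str, answer: str) -> str:
    return (
        "Given the context and question below, explain step by step how the "
        "answer follows, then state the answer.\n\n"
        f"Context: {context}\nQuestion: {question}\nReference answer: {answer}\n"
        "Reasoning:"
    )


if __name__ == "__main__":
    print(build_cot_prompt("All squares are rectangles. ABCD is a square.",
                           "Is ABCD a rectangle?", "Yes"))
```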
RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation
Zhang, Yue, Cui, Leyang, Zhao, Enbo, Bi, Wei, Shi, Shuming
Grammatical Error Correction (GEC) systems play a vital role in assisting people with their daily writing tasks. However, users may sometimes encounter a GEC system that initially performs well but fails to correct errors when the inputs are slightly modified. To ensure an ideal user experience, a reliable GEC system should provide consistent and accurate suggestions under irrelevant context perturbations, a property we refer to as context robustness. In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems. RobustGEC comprises 5,000 GEC cases, each with one original error-correction sentence pair and five variants carefully devised by human annotators. Utilizing RobustGEC, we reveal that state-of-the-art GEC systems still lack sufficient robustness against context perturbations. In addition, we propose a simple yet effective method for mitigating this issue.
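A hedged sketch of a context-robustness check: a GEC system is counted as robust on a case only if it fixes the targeted error identically in the original sentence and in every perturbed variant. The case format and the `gec_system` callable are illustrative placeholders, not the benchmark's exact evaluation code.

```python
# Hedged sketch: per-case context-robustness check for a GEC system.
from typing import Callable, List


def is_context_robust(gec_system: Callable[[str], str], original: str, variants: List[str],
                      erroneous_span: str, corrected_span: str) -> bool:
    """Check that the targeted error is corrected consistently across all inputs."""
    for sentence in [original] + variants:
        output = gec_system(sentence)
        if corrected_span not in output or erroneous_span in output:
            return False
    return True


if __name__ == "__main__":
    def toy_gec(sentence: str) -> str:
        return sentence.replace("go to school yesterday", "went to school yesterday")

    original = "I go to school yesterday."
    variants = ["Sadly, I go to school yesterday.", "I go to school yesterday with Tom."]
    print(is_context_robust(toy_gec, original, variants,
                            "go to school yesterday", "went to school yesterday"))  # -> True
```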
Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance
Zhang, Yue, Cui, Leyang, Cai, Deng, Huang, Xinting, Fang, Tao, Bi, Wei
Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks. Recent studies demonstrate that smaller open-source foundation models, such as the 7B LLaMA, can also display remarkable proficiency in tackling diverse tasks when fine-tuned on instruction-driven data. In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following, and explore whether LLMs can be beneficial and further improved for such targeted scenarios. We choose the writing-assistant scenario as the testbed, which includes seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently refine the LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks. We also conduct further experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion regarding the necessity of employing LLMs for only one targeted task, taking into account the effort required for tuning and the resources consumed during deployment.
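A hedged sketch of reframing a task-specific example into an instruction-following record for fine-tuning; the field names, task keys, and templates are illustrative assumptions rather than the paper's exact data format.

```python
# Hedged sketch: convert a (source, target) writing example into an instruction record.
import json


def to_instruction_record(task: str, source_text: str, target_text: str) -> dict:
    templates = {
        "grammaticality": "Fix the grammatical errors in the following text.",
        "paraphrasing": "Paraphrase the following text while preserving its meaning.",
    }
    return {"instruction": templates[task], "input": source_text, "output": target_text}


if __name__ == "__main__":
    record = to_instruction_record("grammaticality", "She go to school.", "She goes to school.")
    print(json.dumps(record, indent=2))
```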
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Zhang, Yue, Li, Yafu, Cui, Leyang, Cai, Deng, Liu, Lemao, Fu, Tingchen, Huang, Xinting, Zhao, Enbo, Zhang, Yu, Chen, Yulong, Wang, Longyue, Luu, Anh Tuan, Bi, Wei, Shi, Freda, Shi, Shuming
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.