FactCHD: Benchmarking Fact-Conflicting Hallucination Detection
Chen, Xiang, Song, Duanzheng, Gui, Honghao, Wang, Chenxi, Zhang, Ningyu, Yong, Jiang, Huang, Fei, Lv, Chengfei, Zhang, Dan, Chen, Huajun
Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. Accurately identifying hallucinations in texts generated by LLMs, especially in complex inferential scenarios, remains a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark for detecting fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset spanning various factuality patterns, including vanilla, multi-hop, comparison, and set-operation queries. A distinctive element of FactCHD is its integration of fact-based evidence chains, which significantly deepens the evaluation of detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce Truth-Triangulator, which synthesizes reflective considerations from tool-enhanced ChatGPT and a LoRA-tuned Llama 2, aiming to yield more credible detection by combining predictive results and evidence. The benchmark dataset is available at https://github.com/zjunlp/FactCHD.
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > New York (0.04)
- Africa > Zambia (0.04)
- Personal > Obituary (0.46)
- Research Report > New Finding (0.46)
- Media (1.00)
- Leisure & Entertainment (0.94)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
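The detection task FactCHD poses can be framed as binary classification over claims, scored with accuracy and F1. A minimal sketch of such scoring follows; the field names (`prediction`, `label`) and the `NON-FACTUAL`/`FACTUAL` label strings are illustrative, not the benchmark's actual schema.

```python
# Score a fact-conflicting hallucination detector on labeled examples.
# Field names and label strings are illustrative, not FactCHD's schema.

def score_detector(examples):
    """Compute accuracy and F1, treating NON-FACTUAL as the positive class."""
    tp = fp = fn = correct = 0
    for ex in examples:
        pred, gold = ex["prediction"], ex["label"]
        correct += pred == gold
        if pred == "NON-FACTUAL" and gold == "NON-FACTUAL":
            tp += 1
        elif pred == "NON-FACTUAL":
            fp += 1  # detector flagged a factual claim
        elif gold == "NON-FACTUAL":
            fn += 1  # detector missed a hallucination
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(examples), "f1": f1}
```

Note that this scores only the binary verdict; evaluating the quality of a detector's evidence-chain explanation, as FactCHD does, requires a separate rubric.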
Automatic Instruction Optimization for Open-source LLM Instruction Tuning
Liu, Yilun, Tao, Shimin, Zhao, Xiaofeng, Zhu, Ming, Ma, Wenbing, Zhu, Junhao, Su, Chang, Hou, Yutai, Zhang, Miao, Zhang, Min, Ma, Hongxia, Zhang, Li, Yang, Hao, Jiang, Yanfei
Instruction tuning is crucial for enabling Large Language Models (LLMs) to respond to human instructions. The quality of the instruction pairs used for tuning greatly affects the performance of LLMs. However, manually creating high-quality instruction datasets is costly, so automatic generation of instruction pairs by LLMs has become a popular alternative in the training of open-source LLMs. Several approaches have been proposed to ensure the high quality of LLM-generated instruction datasets. Nevertheless, existing methods either compromise dataset integrity by filtering out a large proportion of samples or are unsuitable for industrial applications. In this paper, instead of discarding low-quality samples, we propose CoachLM, a novel approach that enhances the quality of instruction datasets through automatic revision of samples in the dataset. CoachLM is trained on samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%. The effectiveness of CoachLM is further assessed on various real-world instruction test sets. The results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%, even surpassing larger LLMs with nearly twice the number of parameters. Furthermore, CoachLM has been successfully deployed in a data management system for LLMs at Huawei, resulting in an efficiency improvement of up to 20% in the cleaning of 40k real-world instruction pairs. We release the training data and code of CoachLM (https://github.com/lunyiliu/CoachLM).
- Research Report > New Finding (0.66)
- Instructional Material > Course Syllabus & Notes (0.64)
- Instructional Material > Online (0.40)
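The key design choice in the abstract above is revising low-quality samples rather than filtering them, so the dataset keeps its original size. A minimal sketch of that pipeline shape follows; `quality` and `revise` are stand-ins for a learned quality scorer and the trained CoachLM revision model, not the paper's actual interfaces.

```python
# Sketch of revision-based dataset cleaning: keep good instruction pairs,
# revise poor ones instead of discarding them. `quality` and `revise` are
# placeholders for a learned scorer and a trained revision model.

def clean_dataset(pairs, quality, revise, threshold=0.5):
    """Return a dataset of the same size: good pairs kept, poor ones revised."""
    cleaned = []
    for instruction, response in pairs:
        if quality(instruction, response) >= threshold:
            cleaned.append((instruction, response))
        else:
            cleaned.append(revise(instruction, response))
    return cleaned
```

The contrast with filter-based cleaning is that `len(cleaned) == len(pairs)` always holds, preserving dataset integrity.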
A Novel Demand Response Model and Method for Peak Reduction in Smart Grids -- PowerTAC
Chandlekar, Sanjay, Boroju, Arthik, Jain, Shweta, Gujar, Sujit
One of the widely used peak reduction methods in smart grids is demand response, in which one analyzes the shift in customers' (agents') usage patterns in response to signals from the distribution company. Often, these signals take the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function depicting the probability of an agent reducing its load as a function of the discount offered to it. We call this function the reduction probability (RP). The RP function is further parametrized by a rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS--ExpResponse, that outputs discounts for each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, MJSUCB--ExpResponse, to learn them. Experimentally, we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system, using the PowerTAC simulator as a test bed.
- Asia > India > Telangana > Hyderabad (0.04)
- South America > Brazil (0.04)
- North America > United States > California (0.04)
- North America > Canada (0.04)
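The budgeted allocation problem the abstract describes can be illustrated with a simple sketch. The exponential response curve p(d) = 1 - exp(-r * d) below is an assumption suggested by the "ExpResponse" name, and the greedy allocator is a generic heuristic, not the paper's MJS--ExpResponse algorithm, which is an exact method for its specific model.

```python
import math

# Illustrative budgeted discount allocation under an assumed exponential
# reduction-probability curve p(d) = 1 - exp(-r * d), where r is the
# agent's rate of reduction (RR) and d the discount offered.

def reduction_prob(rate, discount):
    """Probability an agent with rate `rate` reduces load given `discount`."""
    return 1.0 - math.exp(-rate * discount)

def greedy_allocate(rates, budget, step=1.0):
    """Spend the budget in `step`-sized increments, each going to the agent
    with the largest marginal gain in expected reduction."""
    discounts = [0.0] * len(rates)
    spent = 0.0
    while spent + step <= budget:
        gains = [
            reduction_prob(r, d + step) - reduction_prob(r, d)
            for r, d in zip(rates, discounts)
        ]
        best = max(range(len(rates)), key=gains.__getitem__)
        discounts[best] += step
        spent += step
    return discounts
```

Because the assumed curve has diminishing marginal gains, the greedy rule concentrates early budget on high-RR agents and then spills over to lower-RR ones; learning the unknown rates online, as MJSUCB--ExpResponse does, would replace the known `rates` with bandit-style upper-confidence estimates.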