Large Language Model
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Chern, I-Chun, Chern, Steffi, Chen, Shiqi, Yuan, Weizhe, Feng, Kehua, Zhou, Chunting, He, Junxian, Neubig, Graham, Liu, Pengfei
The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool .
MediaGPT: A Large Language Model For Chinese Media
Wang, Zhonghao, Lu, Zijia, Jin, Bo, Deng, Haiying
Large language models (LLMs) have shown remarkable capabilities in generating high-quality text and making predictions based on large amounts of data, including in the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially for Chinese. This paper examines the unique characteristics of media-domain-specific LLMs compared to general LLMs, designs a diverse set of task instruction types to cater to the specific requirements of the domain, and constructs datasets tailored to the media domain. Based on these, we propose MediaGPT, a domain-specific LLM for the Chinese media domain, trained on domain-specific data and expert SFT data. Through human expert evaluation and strong-model evaluation on a validation set, this paper demonstrates that MediaGPT outperforms mainstream models on various Chinese media domain tasks and verifies the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.
Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community
Ai, Qingyao, Bai, Ting, Cao, Zhao, Chang, Yi, Chen, Jiawei, Chen, Zhumin, Cheng, Zhiyong, Dong, Shoubin, Dou, Zhicheng, Feng, Fuli, Gao, Shen, Guo, Jiafeng, He, Xiangnan, Lan, Yanyan, Li, Chenliang, Liu, Yiqun, Lyu, Ziyu, Ma, Weizhi, Ma, Jun, Ren, Zhaochun, Ren, Pengjie, Wang, Zhiqiang, Wang, Mingwen, Wen, Ji-Rong, Wu, Le, Xin, Xin, Xu, Jun, Yin, Dawei, Zhang, Peng, Zhang, Fan, Zhang, Weinan, Zhang, Min, Zhu, Xiaofei
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders of information services and evaluators of their reliability. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Liu, Peiyu, Liu, Zikang, Gao, Ze-Feng, Gao, Dawei, Zhao, Wayne Xin, Li, Yaliang, Ding, Bolin, Wen, Ji-Rong
Despite their superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs as well as to increase the inference rate. However, a major challenge is that low-bit quantization methods often lead to performance degradation. It is important to understand how quantization impacts the capacity of LLMs. Different from previous studies focused on overall performance, this work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction-following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models encounter severe performance degradation on tests of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings to understand the impact of quantization on emergent abilities, and sheds light on the possibilities of extremely low-bit quantization for LLMs.
Sam Altman's Worldcoin Token Soars on First Day of Trading
Worldcoin, the token of the crypto project co-founded by OpenAI Chief Executive Officer Sam Altman, rallied on its first day of trading on Monday as investors piled into the hype around artificial intelligence. Worldcoin jumped to as high as $3.58 from the initial price of $1.70 before falling back to $2.52 as of 11:12 a.m. in London, data compiled by CoinMarketCap showed. By then, roughly $145 million worth of the token had been traded, after exchanges like Binance listed it. Worldcoin, Altman's eyeball-scanning crypto project which officially launched on Monday, uses a small device called an "orb" to scan people's eyeballs in order to generate a unique digital identity. That identity, or World ID, grants its holder "proof of personhood" in the Worldcoin parlance.
TechScape: Will Meta's open-source LLM make AI safer, or put it into the wrong hands?
The AI summer is well and truly upon us. Whether we call this period the peak of the "hype cycle" or simply the moment the curve goes vertical will only be obvious in hindsight, but the cadence of big news in the field has gone from weekly to almost daily. Let's catch up with what the biggest players in AI (Meta, Microsoft, Apple and OpenAI) are doing. Apple is always one to keep its cards close to its chest, so don't expect to hear of many R&D breakthroughs from Cupertino. Even the AI work that has made it into shipping products is hidden rather than shouted from the rooftops, with the company talking about "machine learning" and "transformers" at its annual worldwide developer conference (WWDC) last month, but conspicuously steering clear of saying "AI".
"Open" alternatives to ChatGPT are on the rise, but how open is AI really?
OpenAI's ChatGPT seems ubiquitous, but open source versions of instruction-tuned text generators are gaining the upper hand. In just 6 months, at least 15 serious alternatives have emerged, all of which have at least one important advantage over ChatGPT: they are a lot more transparent. Insight into training data and algorithms is key for responsible use of generative AI, a team of linguists and language technology researchers at Radboud University claim. The researchers have mapped this rapidly evolving landscape in a paper and a live-updated website. This shows there are many working alternative "open source" text generators, but also that openness comes in degrees and that many models inherit legal restrictions.
How Can Large Language Models Help Humans in Design and Manufacturing?
Makatura, Liane, Foshey, Michael, Wang, Bohan, Hähnlein, Felix, Ma, Pingchuan, Deng, Bolei, Tjandrasuwita, Megan, Spielberg, Andrew, Owens, Crystal Elaine, Chen, Peter Yichen, Zhao, Allan, Zhu, Amy, Norton, Wil J, Gu, Edward, Jacob, Joshua, Li, Yifei, Schulz, Adriana, Matusik, Wojciech
Advances in computational design and manufacturing (CDaM) have already permeated and transformed numerous industries, including aerospace, architecture, electronics, dental, and digital media, among others. Nevertheless, the full potential of the CDaM workflow is still limited by a number of barriers, such as the extensive domain-specific knowledge that is often required to use CDaM software packages or integrate CDaM solutions into existing workflows. Generative AI tools such as Large Language Models (LLMs) have the potential to remove these barriers, by expediting the CDaM process and providing an intuitive, unified, and user-friendly interface that connects each stage of the pipeline. However, to date, generative AI and LLMs have predominantly been applied to non-engineering domains. In this study, we show how these tools can also be used to develop new design and manufacturing workflows.
Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering
We propose a framework - Prompt, Generate, Train (PGT) - to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents. The framework adapts a retriever augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This, we hypothesize, will yield an aligned, uncertainty calibrated model that is competitive with GPT-4 based in-context retrieval augmented generation in generating relevant answers at lower serving costs. The framework's synthetic generation pipeline will generate synthetic training data comprising
A large language model-assisted education tool to provide feedback on open-ended responses
Matelsky, Jordan K., Parodi, Felipe, Liu, Tony, Lange, Richard D., Kording, Konrad P.
Open-ended questions are a favored tool among instructors for assessing student understanding and encouraging critical exploration of course material. Providing feedback for such responses is a time-consuming task that can lead to overwhelmed instructors and decreased feedback quality. Many instructors resort to simpler question formats, like multiple-choice questions, which provide immediate feedback but at the expense of personalized and insightful comments. Here, we present a tool that uses large language models (LLMs), guided by instructor-defined criteria, to automate responses to open-ended questions. Our tool delivers rapid personalized feedback, enabling students to quickly test their knowledge and identify areas for improvement. We provide open-source reference implementations both as a web application and as a Jupyter Notebook widget that can be used with instructional coding or math notebooks. With instructor guidance, LLMs hold promise to enhance student learning outcomes and elevate instructional methodologies.