Large Language Model
Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System
Mo, Lingbo, Chen, Shijie, Chen, Ziru, Deng, Xiang, Lewis, Ashley, Singh, Sunit, Stevens, Samuel, Tai, Chang-You, Wang, Zhen, Yue, Xiang, Zhang, Tianshu, Su, Yu, Sun, Huan
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems.
RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Wang, Ming, Wu, Wenfang, Gao, Chongyun, Wang, Daling, Feng, Shi, Zhang, Yifei
Pre-trained models have become the dominant approach in the field of deep learning since the Transformer [1]. By now, the Large Language Models (LLMs) represented by ChatGPT [2] have received the widest attention from researchers in the field of Artificial Intelligence (AI), especially Natural Language Processing (NLP). Like LLaMA [3], many open-source LLMs [4, 5, 6, 3, 7, 8] have been published. Due to the strong reasoning, generative and memory abilities acquired by LLMs during training, they are able to perform a variety of traditional tasks based on specific prompts and achieve strong performance. As a result, LLMs have gained widespread interest and applications, such as in the financial [9], emotional [10, 11], legal [12], medical [13, 14, 15] and educational [16] fields. To evaluate the capability of LLMs and to guide the selection of more appropriate LLMs in applications, many evaluation approaches [17] for LLMs have been proposed by researchers. C-Eval [18] constructed a reasoning test set of 13,948 questions in 52 subjects ranging from junior school to postgraduate university and vocational exams to evaluate LLMs' problem-solving skills. Gaokao-Bench [19] collected questions from the 2010-2022 Chinese national college entrance examination papers, including 1,781 objective questions and 1,030 subjective questions, and constructed a framework for assessing the language comprehension and logical reasoning ability of LLMs. Microsoft has released a new benchmark test, AGIEval [20], by selecting 20 official, public, high-standard exams, including general university entrance exams (Chinese national college entrance examination and the U.S. SAT), law school entrance exams, maths competitions, bar exams, national civil service exams, and more.
ChatGPT is Good but Bing Chat is Better for Vietnamese Students
This study examines the efficacy of two SOTA large language models (LLMs), namely ChatGPT and Microsoft Bing Chat (BingChat), in catering to the needs of Vietnamese students. Although ChatGPT exhibits proficiency in multiple disciplines, BingChat emerges as the more advantageous option. We conduct a comparative analysis of their academic achievements in various disciplines, encompassing mathematics, literature, English language, physics, chemistry, biology, history, geography, and civic education. The results of our study suggest that BingChat demonstrates superior performance compared to ChatGPT across a wide range of subjects, with the exception of literature, where ChatGPT exhibits better performance. Additionally, BingChat utilizes the more advanced GPT-4 technology in contrast to ChatGPT, which is built upon GPT-3.5. This allows BingChat to improve its comprehension, reasoning, and generation of creative and informative text. Moreover, the fact that BingChat is accessible in Vietnam and its integration of hyperlinks and citations within responses serve to reinforce its superiority. In our analysis, it is evident that while ChatGPT exhibits praiseworthy qualities, BingChat presents a better-adapted solution for Vietnamese students.
Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond
Since the release of popular large language models (LLMs) such as ChatGPT, the transformative impact of artificial intelligence (AI) on broader society has been unprecedented. This is particularly alarming for science and its conquest of truth (Chomsky et al., 2023). Generative AI and, particularly, conversational AI based on language models have set new ethical dilemmas for knowledge, epistemology and research practice. From authorship to misinformation, biases, fairness and safety of interactions with human subjects, research ethics boards need to adapt to this new era in order to protect research integrity and set high-quality ethical standards for research conduct (van Dis et al., 2023). This paper focuses on reviewing these challenges with the aim of laying foundations for a timely and effective response. ChatGPT is an AI chatbot released in November 2022 by OpenAI. It is a Generative Pre-trained Transformer (GPT), a type of artificial deep neural network with a number of parameters on the order of billions. It is designed to process sequential input data, i.e. natural language, without labeling (self-supervised learning), but with remarkable capabilities for parallelization that significantly reduce training time. The model is further enhanced by a combination of supervised and reinforcement learning based on past conversations as well as human feedback to fine-tune the model and its responses (Stiennon et al., 2020; Gao,
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Thangarasa, Vithursan, Gupta, Abhay, Marshall, William, Li, Tianda, Leong, Kevin, DeCoste, Dennis, Lie, Sean, Saxena, Shreyas
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Scaling the model and dataset size has helped improve the performance of LLMs, but unfortunately, this has also led to highly prohibitive computational costs. Pre-training LLMs often requires orders of magnitude more FLOPs than fine-tuning, and the model capacity often remains the same between the two phases. To achieve training efficiency w.r.t. training FLOPs, we propose to decouple the model capacity between the two phases and introduce Sparse Pre-training and Dense Fine-tuning (SPDF). In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning). We demonstrate that we can induce up to 75% sparsity into a 1.3B parameter GPT-3 XL model, resulting in a 2.5x reduction in pre-training FLOPs, without a significant loss in accuracy on the downstream tasks relative to the dense baseline. By rigorously evaluating multiple downstream tasks, we also establish a relationship between sparsity, task complexity and dataset size. Our work presents a promising direction to train large GPT models at a fraction of the training FLOPs using weight sparsity, while retaining the benefits of pre-trained textual representations for downstream tasks.
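The two phases described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`make_mask`, `sgd_step`), the random choice of which weights to mask, the plain SGD update, and the constant gradients are all assumptions made only to show the idea of a fixed unstructured mask that is applied during pre-training and dropped during fine-tuning.

```python
import random


def make_mask(n_weights, sparsity, rng):
    """Build a fixed 0/1 mask in which a `sparsity` fraction of entries is 0."""
    n_zero = int(round(n_weights * sparsity))
    mask = [0] * n_zero + [1] * (n_weights - n_zero)
    rng.shuffle(mask)  # unstructured: zeros land at arbitrary positions
    return mask


def sgd_step(weights, grads, lr, mask=None):
    """One SGD update. With a mask, masked weights are held at zero."""
    if mask is None:
        return [w - lr * g for w, g in zip(weights, grads)]
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


rng = random.Random(0)
n = 8
mask = make_mask(n, sparsity=0.75, rng=rng)

# Masked weights start at zero; only the remaining 25% carry values.
weights = [rng.uniform(-1.0, 1.0) * m for m in mask]
grads = [0.5] * n  # placeholder gradients for illustration

# Phase 1 (sparse pre-training): the mask is applied on every update.
sparse_weights = sgd_step(weights, grads, lr=0.1, mask=mask)

# Phase 2 (dense fine-tuning): the mask is dropped, so zeroed weights can learn.
dense_weights = sgd_step(sparse_weights, grads, lr=0.1)

print("mask:          ", mask)
print("after sparse:  ", sparse_weights)
print("after dense:   ", dense_weights)
```

In this sketch the previously zeroed positions move away from zero only in the second phase, which mirrors the abstract's description of recovering representational capacity during fine-tuning.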
Tech Companies' Friendly New Strategy to Destroy One Another
More than a decade ago, in a prescient essay for Scientific American, the inventor of the World Wide Web denounced what Facebook and other tech giants were doing to his signature invention. "Why should you care?" Tim Berners-Lee wrote at the time. "Because the Web is yours." These companies, he warned, were restructuring the web itself, turning an expanse of interconnected websites all built on the same open infrastructure into a series of "fragmented islands" where users were kept hostage. On Facebook's island, he wrote, people give over their entire digital life for the chance to connect with their friends, but have no way to transfer their information to any other platform.
A new crypto firm wants to scan your eyeballs – should you look away?
Worldcoin wants to prove I am "actually human". At least that is the explanation a staff member gives for a cryptocurrency venture scanning my eyeball in a London office building. Without the optical scan, Worldcoin will not verify your "humanness" – you could be a robot and you won't get any crypto. Welcome to financial security in the age of artificial intelligence. Concerns have been voiced about the privacy implications of Worldcoin, which was co-founded by Sam Altman, the chief executive of the ChatGPT developer OpenAI.
AI news recap: While Hollywood strikes, is ChatGPT getting worse?
Artificial intelligence can now create images, novels and source code from scratch. Except it isn't really from scratch, because a vast amount of human-generated examples are needed to train these AI models – something that has angered artists, programmers and writers and led to a series of lawsuits. Hollywood actors are the latest group of creatives to turn against AI. They fear that film studios could take control of their likeness and have them "star" in films without ever being on set, perhaps taking on roles they would rather avoid and uttering lines or acting out scenes they would find distasteful. Worse still, they might not get paid for it.
Tutorials on Stance Detection using Pre-trained Language Models: Fine-tuning BERT and Prompting Large Language Models
This paper presents two self-contained tutorials on stance detection in Twitter data using BERT fine-tuning and prompting large language models (LLMs). The first tutorial explains BERT architecture and tokenization, guiding users through training, tuning, and evaluating standard and domain-specific BERT models with HuggingFace transformers. The second focuses on constructing prompts and few-shot examples to elicit stances from ChatGPT and open-source FLAN-T5 without fine-tuning. Various prompting strategies are implemented and evaluated using confusion matrices and macro F1 scores. The tutorials provide code, visualizations, and insights revealing the strengths of few-shot ChatGPT and FLAN-T5 which outperform fine-tuned BERTs. By covering both model fine-tuning and prompting-based techniques in an accessible, hands-on manner, these tutorials enable learners to gain applied experience with cutting-edge methods for stance detection.
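The evaluation described above, a confusion matrix plus macro F1, can be sketched in a few lines. This is a generic illustration rather than the tutorials' own code: the function name `macro_f1`, the three stance labels, and the toy predictions are assumptions, and a per-class F1 of 0.0 is used when a class has no true or predicted instances.

```python
def macro_f1(y_true, y_pred, labels):
    """Return (confusion, score), where confusion[t][p] counts items with
    true label t predicted as p, and score is the unweighted mean of
    per-class F1."""
    confusion = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    f1_scores = []
    for label in labels:
        tp = confusion[label][label]
        fp = sum(confusion[t][label] for t in labels if t != label)
        fn = sum(confusion[label][p] for p in labels if p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is absent.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return confusion, sum(f1_scores) / len(f1_scores)


# Hypothetical stance labels and predictions, for illustration only.
labels = ["favor", "against", "none"]
y_true = ["favor", "favor", "against", "against", "none", "none"]
y_pred = ["favor", "against", "against", "against", "none", "favor"]

confusion, score = macro_f1(y_true, y_pred, labels)
for t in labels:
    print(t, [confusion[t][p] for p in labels])
print("macro F1:", round(score, 4))
```

Macro averaging weights every stance class equally regardless of how often it occurs, which is why it is a common choice when stance labels are imbalanced.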
WC-SBERT: Zero-Shot Text Classification via SBERT with Self-Training for Wikipedia Categories
Chi, Te-Yu, Tang, Yu-Meng, Lu, Chia-Wen, Zhang, Qiu-Xia, Jang, Jyh-Shing Roger
Our research focuses on solving the zero-shot text classification problem in NLP, with a particular emphasis on innovative self-training strategies. To achieve this objective, we propose a novel self-training strategy that uses labels rather than text for training, significantly reducing the model's training time. Specifically, we use categories from Wikipedia as our training set and leverage the SBERT pre-trained model to establish positive correlations between pairs of categories within the same text, facilitating associative training. For new test datasets, we have improved the original self-training approach, eliminating the need for prior training and testing data from each target dataset. Instead, we adopt Wikipedia as a unified training dataset to better approximate the zero-shot scenario. This modification allows for rapid fine-tuning and inference across different datasets, greatly reducing the time required for self-training. Our experimental results demonstrate that this method can adapt the model to the target dataset within minutes. Compared to other BERT-based transformer models, our approach significantly reduces the amount of training data by training only on labels, not the actual text, and greatly improves training efficiency by utilizing a unified training set. Additionally, our method achieves state-of-the-art results on both the Yahoo Topic and AG News datasets.