Large Language Model
Guess What? This Mystery Story Written by Robots Is Kind of Good!
In his afterword to the short murder mystery Death of an Author, the writer Stephen Marche invokes a concept called Moravec's paradox. Hans Moravec, a robotics scientist, observed that tasks human beings find challenging, such as playing chess, are easy for computers, while many of the actions human beings effortlessly perform without conscious thought, such as perception or oriented movement through space, are extremely difficult for the machines. Moravec's paradox is a useful way to think about the surprising ways that Death of an Author, described by its publisher as a "groundbreaking experiment" in artificial intelligence, succeeds. Jacob Weisberg, the head of podcast production company Pushkin Industries (and a former Slate editor in chief), asked Marche, a journalist who writes about artificial intelligence, to make Death of an Author earlier this year. The goal was a novella whose text was to be 95 percent computer-generated.
The UK will spend £100 million to develop its own 'sovereign' AI
The UK government doesn't want to sit idle while foundational AI models like ChatGPT flourish. Prime Minister Rishi Sunak and Technology Secretary Michelle Donelan have pledged an initial £100 million (about $124.5 million) to establish a Foundation Model Taskforce. The team will develop AI that ideally makes the country "globally competitive," and will work with the industry to make these systems safer and more reliable. The taskforce is inspired by the COVID-19 vaccine unit from the height of the pandemic. The group will report directly to both the Prime Minister and Technology Secretary, and have a chairperson announced this summer.
ChatGPT Can Help Doctors--and Hurt Patients
Robert Pearl, a professor at Stanford medical school, was previously CEO of Kaiser Permanente, a US medical group with more than 12 million patients. If he were still in charge, he'd insist that all of its 24,000 physicians start using ChatGPT in their practice now. "I think it will be more important to doctors than the stethoscope was in the past," Pearl says. "No physician who practices high-quality medicine will do so without accessing ChatGPT or other forms of generative AI." Pearl no longer practices medicine but says he knows physicians using ChatGPT to summarize patient care, write letters, and even--when stumped--ask for ideas on how to diagnose patients. He suspects doctors will discover hundreds of thousands of useful applications of the bot for the betterment of human health.
Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-tuned GPT
Zhang, Ruohong, Wang, Yau-Shian, Yang, Yiming
Moreover, GPT-based zero-shot classification models tend to make independent predictions over test instances, which can be sub-optimal as the instance correlations and the decision boundaries in the target space are ignored. To address these difficulties and limitations, we propose a new approach to zero-shot text classification, namely GenCo, which leverages the strong generative power of GPT to assist in training a smaller, more adaptable, and efficient sentence encoder classifier with contrastive self-training. Specifically, GenCo applies GPT in two ways: firstly, it generates multiple augmented texts for each input instance to enhance the semantic embedding of the instance and improve the mapping to relevant labels; secondly, it generates augmented texts conditioned on the predicted label during self-training, which makes the generative process tailored to the decision boundaries in the target space. In our experiments, GenCo outperforms previous state-of-the-art methods on multiple benchmark datasets, even when only limited in-domain text data is available.
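The first use of GPT described above, enriching an instance's embedding with generated texts before mapping it to a label, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not say how the augmented texts are combined, so the sketch simply averages the embeddings, and the toy two-dimensional vectors and the label names stand in for the output of a real sentence encoder.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_label(instance: Vector, augmented: List[Vector],
                  labels: Dict[str, Vector]) -> str:
    """Pick the label whose embedding is closest to the enhanced instance.

    Assumption: the enhanced embedding is the plain average of the instance
    embedding and the embeddings of its generated augmentations.
    """
    enhanced = mean_vector([instance] + augmented)
    return max(labels, key=lambda name: cosine(enhanced, labels[name]))


# Hypothetical label and text embeddings (not taken from the paper).
LABELS = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
ambiguous = [0.6, 0.5]
generated = [[0.1, 0.9], [0.2, 0.8]]

alone = predict_label(ambiguous, [], LABELS)
with_augmentation = predict_label(ambiguous, generated, LABELS)
```

In this toy case the instance on its own sits closest to one label, while the averaged embedding moves toward the other, which is the kind of shift the augmentation step is meant to produce.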
Better Question-Answering Models on a Budget
Wijeratne, Yudhanjaya, Marikar, Ishan
Low-rank adaptation (LoRA) and question-answer datasets from large language models have made it much easier for much smaller models to be finetuned to the point where they display sophisticated conversational abilities. In this paper, we present Eluwa, a family of LoRA models that use the Stanford Alpaca dataset and massively improve the capabilities of Facebook's OPT 1.3B, 2.7B and 6.7B models. We benchmark these models in multiple ways, including letting GPT-4 judge their answers to prompts that span general knowledge, writing, programming and other tasks. We show that smaller models here can be fine-tuned to be as performant as models 3x larger - all for as little as 40 USD in compute.
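To make the low-rank idea concrete, here is a minimal sketch of a LoRA-adapted linear layer in plain Python. It is not the Eluwa training code: the frozen weight W is left untouched and a trainable update B·A, scaled by alpha/rank as in the original LoRA formulation, is added to its output. The matrix sizes, the rank, and the alpha value are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Compute y = W x + (alpha / rank) * B (A x).

    W is the frozen d_out x d_in weight; A (rank x d_in) and
    B (d_out x rank) are the only trainable parameters.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of parameters trained by LoRA relative to the full matrix."""
    return rank * (d_out + d_in) / (d_out * d_in)


# Tiny worked example with a rank-1 update (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
y = lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, rank=1)

# For a hypothetical 2048 x 2048 projection and rank 8, under 1% is trained.
fraction = trainable_fraction(2048, 2048, 8)
```

The small trainable fraction is the reason this style of fine-tuning can fit a modest compute budget.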
KInITVeraAI at SemEval-2023 Task 3: Simple yet Powerful Multilingual Fine-Tuning for Persuasion Techniques Detection
Hromadka, Timo, Smolen, Timotej, Remis, Tomas, Pecher, Branislav, Srba, Ivan
This paper presents the best-performing solution to SemEval-2023 Task 3, subtask 3, which is dedicated to persuasion techniques detection. Because of the highly multilingual character of the input data and the large number of predicted labels (23, causing a lack of labelled data for some language-label combinations), we opted for fine-tuning pre-trained transformer-based language models. Conducting multiple experiments, we find the best configuration, which consists of a large multilingual model (XLM-RoBERTa large) trained jointly on all input data, with carefully calibrated confidence thresholds for seen and surprise languages separately. Our final system performed the best on 6 out of 9 languages (including two surprise languages) and achieved highly competitive results on the remaining three languages.
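The separate confidence thresholds mentioned above can be pictured as a small post-processing step on the model's per-label probabilities. The sketch below is an assumption-laden illustration, not the authors' system: the threshold values, the language codes, and the label names are hypothetical, and the abstract does not state how the thresholds were calibrated.

```python
from typing import Dict, List, Set


def threshold_for(language: str, seen_languages: Set[str],
                  seen_threshold: float, surprise_threshold: float) -> float:
    """Choose the decision threshold by whether the language was seen in training."""
    return seen_threshold if language in seen_languages else surprise_threshold


def predict_labels(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Multi-label decision: keep every label whose probability meets the threshold."""
    return sorted(label for label, p in probabilities.items() if p >= threshold)


# Hypothetical configuration and model output (illustrative values only).
SEEN = {"en", "fr", "de"}
SEEN_THRESHOLD = 0.30
SURPRISE_THRESHOLD = 0.20

probabilities = {"loaded_language": 0.55, "doubt": 0.25, "repetition": 0.10}

seen_prediction = predict_labels(
    probabilities, threshold_for("en", SEEN, SEEN_THRESHOLD, SURPRISE_THRESHOLD))
surprise_prediction = predict_labels(
    probabilities, threshold_for("ka", SEEN, SEEN_THRESHOLD, SURPRISE_THRESHOLD))
```

With these made-up numbers, the lower threshold for an unseen language admits one extra label, which shows how the two settings trade precision against recall differently.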
ChatLLM Network: More brains, More intelligence
Hao, Rui, Hu, Linmei, Qi, Weijian, Wu, Qingliu, Zhang, Yirui, Nie, Liqiang
Dialogue-based language models mark a huge milestone in the field of artificial intelligence, owing to their impressive ability to interact with users and to handle a series of challenging tasks prompted by customized instructions. However, the prevalent large-scale dialogue-based language models like ChatGPT still have room for improvement, such as unstable responses to questions and the inability to think cooperatively like humans. Considering the ability of dialogue-based language models in conversation and their inherent randomness in thinking, we propose the ChatLLM network, which allows multiple dialogue-based language models to interact, provide feedback, and think together. We design the network of ChatLLMs based on ChatGPT. Specifically, individual instances of ChatGPT may possess distinct perspectives towards the same problem, and by consolidating these diverse viewpoints via a separate ChatGPT, the ChatLLM network system can conduct decision-making more objectively and comprehensively. In addition, a language-based feedback mechanism comparable to backpropagation is devised to update the ChatGPTs within the network. Experiments on two datasets demonstrate that our network attains significant improvements in problem-solving, leading to observable progress amongst each member.
SocialDial: A Benchmark for Socially-Aware Dialogue Systems
Zhan, Haolan, Li, Zhuang, Wang, Yufei, Luo, Linhao, Feng, Tao, Kang, Xiaoxi, Hua, Yuncheng, Qu, Lizhen, Soon, Lay-Ki, Sharma, Suraj, Zukerman, Ingrid, Semnani-Azad, Zhaleh, Haffari, Gholamreza
Dialogue systems have been widely applied in many scenarios and are now more powerful and ubiquitous than ever before. With large neural models and massive available data, current dialogue systems have access to more knowledge than any person could acquire in a lifetime. However, current dialogue systems still do not perform at a human level. One major gap between conversational agents and humans lies in their abilities to be aware of social norms. The development of socially-aware dialogue systems is impeded by the lack of resources. In this paper, we present the first socially-aware dialogue corpus, SocialDial, based on Chinese social culture. SocialDial consists of two parts: 1,563 multi-turn dialogues between two human speakers with fine-grained labels, and 4,870 synthetic conversations generated by ChatGPT. The human corpus covers five categories of social norms, which have 14 sub-categories in total. Specifically, it contains social factor annotations including social relation, context, social distance, and social norms. However, collecting sufficient socially-aware dialogues is costly. Thus, we harness the power of ChatGPT and devise an ontology-based synthetic data generation framework. This framework is able to generate synthetic data at scale. To ensure the quality of synthetic dialogues, we design several mechanisms for quality control during data collection. Finally, we evaluate our dataset using several pre-trained models, such as BERT and RoBERTa. Comprehensive empirical results based on state-of-the-art neural models demonstrate that modeling of social norms for dialogue systems is a promising research direction. To the best of our knowledge, SocialDial is the first socially-aware dialogue dataset that covers multiple social factors and has fine-grained labels.
Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study
van Dam, Tim, Izadi, Maliheh, van Deursen, Arie
Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks, including automatic code completion, which is a staple in a developer's toolkit. While many have striven to improve the code-understanding abilities of such models, the opposite -- making the code easier to understand -- has not been properly investigated. In this study, we aim to answer whether making code easier to understand through using contextual data improves the performance of pre-trained code language models for the task of code completion. We consider type annotations and comments as two common forms of additional contextual information that often help developers understand code better. For the experiments, we study code completion at two granularity levels, token and line completion, and evaluate three recent large-scale language models for source code (UniXcoder, CodeGPT, and InCoder) with five evaluation metrics. Finally, we perform the Wilcoxon Signed Rank test to gauge significance and measure the effect size. Contrary to our expectations, all models perform better if type annotations are removed (albeit the effect sizes are small). For comments, we find that the models perform better in the presence of multi-line comments (again with small effect sizes). Based on our observations, we recommend making proper design choices when training, fine-tuning, or simply selecting such models given the intended data and application. Better evaluations and multi-modal techniques can also be further investigated to improve the practicality and accuracy of auto-completions.
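For readers unfamiliar with the Wilcoxon signed-rank test used above, the following sketch computes its rank sums for paired scores, for example one model evaluated on the same files with and without type annotations. It is a simplified illustration rather than the study's analysis code: it returns only the rank sums (no p-value), and the effect size shown, the matched-pairs rank-biserial correlation, is an assumption, since the abstract does not name the measure that was used. The scores are made up.

```python
from typing import List, Tuple


def signed_rank_sums(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (W+, W-) for paired samples a and b.

    Zero differences are dropped and tied absolute differences receive
    the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def rank_biserial(w_plus: float, w_minus: float) -> float:
    """Matched-pairs rank-biserial correlation, ranging from -1 to 1."""
    total = w_plus + w_minus
    return (w_plus - w_minus) / total if total else 0.0


# Hypothetical paired scores for four files (not data from the study).
without_types = [5.0, 3.0, 8.0, 6.0]
with_types = [4.0, 5.0, 5.0, 6.0]

w_plus, w_minus = signed_rank_sums(without_types, with_types)
effect = rank_biserial(w_plus, w_minus)
```

In practice a statistics library such as SciPy would also supply the p-value; the hand-written version is only meant to show what the test ranks and sums.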
AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays
Herbold, Steffen, Hautli-Janisz, Annette, Heuer, Ute, Kikteva, Zlata, Trautsch, Alexander
Background: Recently, ChatGPT and similar generative AI models have attracted hundreds of millions of users and become part of the public discourse. Many believe that such models will disrupt society and will result in a significant change in the education system and information generation in the future. So far, this belief is based on either colloquial evidence or benchmarks from the owners of the models -- both lack scientific rigour. Objective: Through a large-scale study comparing human-written versus ChatGPT-generated argumentative student essays, we systematically assess the quality of the AI-generated content. Methods: A large corpus of essays was rated using standard criteria by a large number of human experts (teachers). We augment the analysis with a consideration of the linguistic characteristics of the generated essays. Results: Our results demonstrate that ChatGPT generates essays that are rated higher for quality than human-written essays. The writing style of the AI models exhibits linguistic characteristics that are different from those of the human-written essays, e.g., it is characterized by fewer discourse and epistemic markers, but more nominalizations and greater lexical diversity. Conclusions: Our results clearly demonstrate that models like ChatGPT outperform humans in generating argumentative essays. Since the technology is readily available for anyone to use, educators must act immediately. We must re-invent homework and develop teaching concepts that utilize these AI models in the same way as math utilized the calculator: teach the general concepts first and then use AI tools to free up time for other learning objectives.