Large Language Model
Short Answer Grading Using One-shot Prompting and Text Similarity Scoring Model
In this study, we developed an automated short answer grading (ASAG) model that provided both analytic scores and final holistic scores. Short answer items typically consist of multiple sub-questions, and providing an analytic score and the text span relevant to each sub-question can increase the interpretability of the automated scores. Furthermore, they can be used to generate actionable feedback for students. Despite these advantages, most studies have focused on predicting only holistic scores due to the difficulty in constructing dataset with manual annotations. To address this difficulty, we used large language model (LLM)-based one-shot prompting and a text similarity scoring model with domain adaptation using small manually annotated dataset. The accuracy and quadratic weighted kappa of our model were 0.67 and 0.71 on a subset of the publicly available ASAG dataset. The model achieved a substantial improvement over the majority baseline.
Controllable Text-to-Image Generation with GPT-4
Zhang, Tianjun, Zhang, Yi, Vineet, Vibhav, Joshi, Neel, Wang, Xin
Current text-to-image generation models often struggle to follow textual instructions, especially the ones requiring spatial reasoning. On the other hand, Large Language Models (LLMs), such as GPT-4, have shown remarkable precision in generating code snippets for sketching out text inputs graphically, e.g., via TikZ. In this work, we introduce Control-GPT to guide the diffusion-based text-to-image pipelines with programmatic sketches generated by GPT-4, enhancing their abilities for instruction following. Control-GPT works by querying GPT-4 to write TikZ code, and the generated sketches are used as references alongside the text instructions for diffusion models (e.g., ControlNet) to generate photo-realistic images. One major challenge to training our pipeline is the lack of a dataset containing aligned text, images, and sketches. We address the issue by converting instance masks in existing datasets into polygons to mimic the sketches used at test time. As a result, Control-GPT greatly boosts the controllability of image generation. It establishes a new state-of-art on the spatial arrangement and object positioning generation and enhances users' control of object positions, sizes, etc., nearly doubling the accuracy of prior models. Our work, as a first attempt, shows the potential for employing LLMs to enhance the performance in computer vision tasks.
Self Information Update for Large Language Models through Mitigating Exposure Bias
Current LLMs have demonstrated remarkable capabilities in addressing users' requests for various types of information. However, these models are limited by the most recent data available in their pretraining corpora, rendering them incapable of providing up-to-date information. Retraining LLMs from scratch is cost-prohibitive, and the effectiveness of continual fine-tuning on new corpora has not been thoroughly examined. Additionally, current update procedures typically demand significant human input to prepare the information into more structured format, such as knowledge triples, conversational data or responses with human feedback. In this study, we conduct a comprehensive examination of a novel self information update task in LLMs, which only requires the provision of informative text corpora. For instance, we can use the latest news articles to update the LLMs' existing knowledge. We define the self information update task and assess the continual fine-tuning approach for this purpose. We observe that the naive method of continual fine-tuning can be problematic due to LLMs' exposure bias, which prioritizes existing information over new information we aim to integrate and leads to incorrect reasoning chains that ultimately diminish the efficacy of information updates. Based on our analysis, we propose an effective method to mitigate exposure bias by incorporating the selection of relevant facts into training losses. Furthermore, we develop a dataset to evaluate information updates, derived from news articles published after March 2023. Experimental results demonstrate that our proposed approach significantly increases the factual consistency score (0 to 1) by 0.16 while having minimal impact on performance for instructions not directly related to the new information.
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Huang, Jiawei, Ren, Yi, Huang, Rongjie, Yang, Dongchao, Ye, Zhenhui, Zhang, Chen, Liu, Jinglin, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Large diffusion models have been successful in text-to-audio (T2A) synthesis tasks, but they often suffer from common issues such as semantic misalignment and poor temporal consistency due to limited natural language understanding and data scarcity. Additionally, 2D spatial structures widely used in T2A works lead to unsatisfactory audio quality when generating variable-length audio samples since they do not adequately prioritize temporal information. To address these challenges, we propose Make-an-Audio 2, a latent diffusion-based T2A method that builds on the success of Make-an-Audio. Our approach includes several techniques to improve semantic alignment and temporal consistency: Firstly, we use pre-trained large language models (LLMs) to parse the text into structured
Transformer Language Models Handle Word Frequency in Prediction Head
Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, Inui, Kentaro
Prediction head is a crucial component of Transformer language models. Despite its direct impact on prediction, this component has often been overlooked in analyzing Transformers. In this study, we investigate the inner workings of the prediction head, specifically focusing on bias parameters. Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a significant role in the models' ability to reflect word frequency in a corpus, aligning with the logit adjustment method commonly used in long-tailed learning. We also quantify the effect of controlling the biases in practical auto-regressive text generation scenarios; under a particular setting, more diverse text can be generated without compromising text quality.
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Cheng, Myra, Durmus, Esin, Jurafsky, Dan
To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e., natural language descriptions, of the target demographic group alongside personas of unmarked, default groups; 2) identifying the words that significantly distinguish personas of the target group from corresponding unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. The words distinguishing personas of marked (non-white, non-male) groups reflect patterns of othering and exoticizing these demographics. An intersectional lens further reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women. These representational harms have concerning implications for downstream applications like story generation.
Writing user personas with Large Language Models: Testing phase 6 of a Thematic Analysis of semi-structured interviews
The goal of this paper is establishing if we can satisfactorily perform a Thematic Analysis (TA) of semi-structured interviews using a Large Language Model (more precisely GPT3.5-Turbo). Building on previous work by the author, which established an embryonal process for conducting a TA with the model, this paper will perform a further analysis and then cover the last phase of a TA (phase 6), which entails the writing up of the result. This phase was not covered by the previous work. In particular, the focus will be on using the results of a TA done with the LLM on a dataset of user interviews, for writing user personas, with the model building on the TA to produce the personas narratives. User personas are models of real users, usually built from a data analysis like interviews with a sample of users. User personas are tools often used in User Centered Design processes. The paper shows that the model can build basic user personas with an acceptable quality deriving them from themes, and that the model can serve for the generation of ideas for user personas.
A Corpus for Sentence-level Subjectivity Detection on English News Articles
Antici, Francesco, Galassi, Andrea, Ruggeri, Federico, Korre, Katerina, Muti, Arianna, Bardi, Alessandra, Fedotova, Alice, Barrón-Cedeño, Alberto
We present a novel corpus for subjectivity detection at the sentence level. We develop new annotation guidelines for the task, which are not limited to language-specific cues, and apply them to produce a new corpus in English. The corpus consists of 411 subjective and 638 objective sentences extracted from ongoing coverage of political affairs from online news outlets. This new resource paves the way for the development of models for subjectivity detection in English and across other languages, without relying on language-specific tools like lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task, both in mono- and cross-lingual settings, the latter with a similar existing corpus in Italian language. We observe that enriching our corpus with resources in other languages improves the results on the task.
A lawyer faces sanctions after he used ChatGPT to write a brief riddled with fake citations
With the hype around AI reaching a fever pitch in recent months, many people fear programs like ChatGPT will one day put them out of a job. For one New York lawyer, that nightmare could become a reality sooner than expected, but not for the reasons you might think. As reported by The New York Times, attorney Steven Schwartz of the law firm Levidow, Levidow and Oberman recently turned to OpenAI's chatbot for assistance with writing a legal brief, with predictably disastrous results. A lawyer used ChatGPT to do "legal research" and cited a number of nonexistent cases in a filing, and is now in a lot of trouble with the judge pic.twitter.com/AJSE7Ts7W7 Schwartz's firm has been suing the Columbian airline Avianca on behalf of Roberto Mata, who claims he was injured on a flight to John F. Kennedy International Airport in New York City.
'They're afraid their AIs will come for them': Doug Rushkoff on why tech billionaires are in escape mode
It was a tough week in tech. The top US health official warned about the risks of social media to young people; tech billionaire Elon Musk further trashed his reputation with the disastrous Twitter launch of a presidential campaign; and senior executives at OpenAI, makers of ChatGPT, called for the urgent regulation of "super intelligence". But to Doug Rushkoff – a leading digital age theorist, early cyberpunk and professor at City University of New York – the triple whammy of rough events represented some timely corrective justice for the tech barons of Silicon Valley. And more may be to come as new developments in tech come ever thicker and faster. "They're torturing themselves now, which is kind of fun to see. They're afraid that their little AIs are going to come for them. They're apocalyptic, and so existential, because they have no connection to real life and how things work. They're afraid the AIs are going to be as mean to them as they've been to us," Rushkoff told The Guardian in an interview.