Generative AI
Equity in the Use of ChatGPT for the Classroom: A Comparison of the Accuracy and Precision of ChatGPT 3.5 vs. ChatGPT4 with Respect to Statistics and Data Science Exams
The association of social mobility with a college education has been studied since the early 1950's [1]. Although there are some indications that a college education is not as effective as it once was in helping graduates climb the social ladder [2], it is still the most reliable way of doing so. US News & World Report updated its rankings in 2023 to include social mobility [3], and many institutions of higher education are paying more attention to recruitment of first-generation college students and talented students from disadvantaged backgrounds. With the inclusion of such students in the typical college class comes some important considerations. For example, a student from difficult financial circumstances with an academic background to match the profile of any student an elite institution will have more difficulty paying for textbooks, a laptop, a smartphone, and other items that are almost essential to current college life [2]. As of November 2022, one such item that students from advantaged backgrounds will have access to that those from lower income brackets will not is ChatGPT4 [4]. It currently costs $20 per month for a subscription and has been called a "significant leap forward" compared to ChatGPT3.5 [5], which is free [6]. While use of generative AI is prohibited in some college classrooms, this is hard to police, and many students use it regardless of classroom restrictions [7]. When generative AI is allowed, there is a wide array of platforms from which students can choose.
ChatGPT is getting ready to roll its Search tool out to everyone
If you've been waiting patiently to try ChatGPT Search, you won't have to wait much longer. After rolling out to paid subscribers this fall, OpenAI announced Monday it would make the tool available to everyone, no Plus or Pro membership necessary, "over the coming months." At that point, all you need before you can start using ChatGPT Search is an OpenAI account. Once you're logged in, and if your query calls for it, ChatGPT will automatically search the web for the latest information to answer your question. You can also force it to search the web, thanks to a handy new icon located right in the prompt bar.
From Tesla to Trump, Elon Musk had a very busy 2024
It has been a typically chaotic 12 months for Elon Musk. The world's richest person has been pulled from one direction to another in the running of his many companies โ with mixed results. How this moment for AI will change society forever (and how it won't) There is no doubt that the latest advances in artificial intelligence from OpenAI, Google, Baidu and others are more impressive than what came before, but are we in just another bubble of AI hype?
Large Language Models in Politics and Democracy: A Comprehensive Survey
The advancement of generative AI, particularly large language models (LLMs), has a significant impact on politics and democracy, offering potential across various domains, including policymaking, political communication, analysis, and governance. This paper surveys the recent and potential applications of LLMs in politics, examining both their promises and the associated challenges. This paper examines the ways in which LLMs are being employed in legislative processes, political communication, and political analysis. Moreover, we investigate the potential of LLMs in diplomatic and national security contexts, economic and social modeling, and legal applications. While LLMs offer opportunities to enhance efficiency, inclusivity, and decision-making in political processes, they also present challenges related to bias, transparency, and accountability. The paper underscores the necessity for responsible development, ethical considerations, and governance frameworks to ensure that the integration of LLMs into politics aligns with democratic values and promotes a more just and equitable society.
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Kazdan, Joshua, Schaeffer, Rylan, Dey, Apratim, Gerstgrasser, Matthias, Rafailov, Rafael, Donoho, David L., Koyejo, Sanmi
The increasing presence of AI-generated content on the internet raises a critical question: What happens when generative machine learning models are pretrained on web-scale datasets containing data created by earlier models? Some authors prophesy \textit{model collapse} under a `{\it replace}' scenario: a sequence of models, the first trained with real data and each later one trained {\it only on} synthetic data from its preceding model. In this scenario, models successively degrade. Others see collapse as avoidable; in an `{\it accumulate}' scenario, a sequence of models is trained, but each training uses all real and synthetic data generated so far. In this work, we deepen and extend the study of these contrasting scenarios. First, collapse versus avoidance of collapse is studied by comparing the replace and accumulate scenarios on each of three prominent generative modeling settings; we find the same contrast emerges in all three settings. Second, we study a compromise scenario; the available data remains the same as in the {\it accumulate} scenario -- but unlike {\it accumulate} and like {\it replace}, each model is trained using a fixed compute budget; we demonstrate that model test loss on real data is larger than in the {\it accumulate} scenario, but apparently plateaus, unlike the divergence seen with {\it replace}. Third, we study the relative importance of cardinality and proportion of real data for avoiding model collapse. Surprisingly, we find a non-trivial interaction between real and synthetic data, where the value of synthetic data for reducing test loss depends on the absolute quantity of real data. Our insights are particularly important when forecasting whether future frontier generative models will collapse or thrive, and our results open avenues for empirically and mathematically studying the context-dependent value of synthetic data.
Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming
Gao, Ziqi, Huang, Weikai, Zhang, Jieyu, Kembhavi, Aniruddha, Krishna, Ranjay
DALL-E and Sora have gained attention by producing implausible images, such as "astronauts riding a horse in space." Despite the proliferation of text-to-vision models that have inundated the internet with synthetic visuals, from images to 3D assets, current benchmarks predominantly evaluate these models on real-world scenes paired with captions. We introduce Generate Any Scene, a framework that systematically enumerates scene graphs representing a vast array of visual scenes, spanning realistic to imaginative compositions. Generate Any Scene leverages 'scene graph programming', a method for dynamically constructing scene graphs of varying complexity from a structured taxonomy of visual elements. This taxonomy includes numerous objects, attributes, and relations, enabling the synthesis of an almost infinite variety of scene graphs. Using these structured representations, Generate Any Scene translates each scene graph into a caption, enabling scalable evaluation of text-to-vision models through standard metrics. We conduct extensive evaluations across multiple text-to-image, text-to-video, and text-to-3D models, presenting key findings on model performance. We find that DiT-backbone text-to-image models align more closely with input captions than UNet-backbone models. Text-to-video models struggle with balancing dynamics and consistency, while both text-to-video and text-to-3D models show notable gaps in human preference alignment. We demonstrate the effectiveness of Generate Any Scene by conducting three practical applications leveraging captions generated by Generate Any Scene: 1) a self-improving framework where models iteratively enhance their performance using generated data, 2) a distillation process to transfer specific strengths from proprietary models to open-source counterparts, and 3) improvements in content moderation by identifying and generating challenging synthetic data.
Specifications: The missing link to making the development of LLM systems an engineering discipline
Stoica, Ion, Zaharia, Matei, Gonzalez, Joseph, Goldberg, Ken, Sen, Koushik, Zhang, Hao, Angelopoulos, Anastasios, Patil, Shishir G., Chen, Lingjiao, Chiang, Wei-Lin, Davis, Jared Q.
Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. This capability has been a cornerstone of past technological revolutions, which relied on combining components to create increasingly sophisticated and reliable systems. Cars, airplanes, computers, and software consist of components-such as engines, wheels, CPUs, and libraries-that can be assembled, debugged, and replaced. A key tool for building such reliable and modular systems is specification: the precise description of the expected behavior, inputs, and outputs of each component. However, the generality of LLMs and the inherent ambiguity of natural language make defining specifications for LLM-based components (e.g., agents) both a challenging and urgent problem. In this paper, we discuss the progress the field has made so far-through advances like structured outputs, process supervision, and test-time compute-and outline several future directions for research to enable the development of modular and reliable LLM-based systems through improved specifications.
Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction
McGlinchey, Andrea Cristina, Barclay, Peter J
Following the universal availability of generative AI systems with the release of ChatGPT, automatic detection of deceptive text created by Large Language Models has focused on domains such as academic plagiarism and "fake news". However, generative AI also poses a threat to the livelihood of creative writers, and perhaps to literary culture in general, through reduction in quality of published material. Training a Large Language Model on writers' output to generate "sham books" in a particular style seems to constitute a new form of plagiarism. This problem has been little researched. In this study, we trained Machine Learning classifier models to distinguish short samples of human-written from machine-generated creative fiction, focusing on classic detective novels. Our results show that a Naive Bayes and a Multi-Layer Perceptron classifier achieved a high degree of success (accuracy > 95%), significantly outperforming human judges (accuracy < 55%). This approach worked well with short text samples (around 100 words), which previous research has shown to be difficult to classify. We have deployed an online proof-of-concept classifier tool, AI Detective, as a first step towards developing lightweight and reliable applications for use by editors and publishers, with the aim of protecting the economic and cultural contribution of human authors.
Do Tutors Learn from Equity Training and Can Generative AI Assess It?
Thomas, Danielle R., Borchers, Conrad, Kakarla, Sanjit, Lin, Jionghao, Bhushan, Shambhavi, Guo, Boyuan, Gatz, Erin, Koedinger, Kenneth R.
Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, with this present work assessing its use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains with increases in tutors' self-reported confidence in their knowledge in responding to middle school students experiencing possible inequities from pretest to posttest. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that few-shot learning using GPT-4o is the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.
Look Ahead Text Understanding and LLM Stitching
This paper proposes a look ahead text understanding problem with look ahead section identification (LASI) as an example. This problem may appear in generative AI as well as human interactions, where we want to understand the direction of a developing text or conversation. We tackle the problem using transformer-based LLMs. We show that LASI is more challenging than classic section identification (SI). We argue that both bidirectional contextual information (e.g., BERT) and unidirectional predictive ability (e.g., GPT) will benefit the task. We propose two approaches to stitch together BERT and GPT. Experiments show that our approach outperforms the established models, especially when there is noise in the text (which is often the case for developing text in generative AI). Our paper sheds light on other look ahead text understanding tasks that are important to social media, such as look ahead sentiment classification, and points out the opportunities to leverage pre-trained LLMs through stitching.