Generative AI
Hot AI Jesus Is Huge on Facebook
Jesus is punching the devil on Facebook. The two are in a boxing ring. Jesus is wearing a pair of white boxing shorts with his name embroidered on the waistband. He is ripped beyond belief; not only does he have six-pack abs, but every muscle on his body is bulging. Jesus is hitting the devil directly on the chin, a knockout blow.
OpenAI, Microsoft sued by news nonprofit for copyright infringement
The Center for Investigative Reporting (CIR), which publishes Mother Jones and Reveal, said on Thursday that it had filed the lawsuit accusing the tech firms of using its content without permission in a "rebuke to artificial intelligence and its exploitative practices". "OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organisations that license our material," Monika Bauerlein, CEO of the Center for Investigative Reporting, said in a statement. "The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it." OpenAI and Microsoft did not immediately respond to requests for comment. OpenAI's ChatGPT chatbot relies on vast quantities of information scraped from the internet, including news sites, to respond to users' queries.
Please don't get your news from AI chatbots
This is your periodic reminder that AI-powered chatbots still make up things and lie with all the confidence of a GPS system telling you that the shortest way home is to drive through the lake. My reminder comes courtesy of Nieman Lab, which ran an experiment to see if ChatGPT would provide correct links to articles from news publications it pays millions of dollars to. It turns out that ChatGPT does not. Instead, it confidently makes up entire URLs, a phenomenon that the AI industry calls "hallucinating," a term that seems more apt for a real person high on their own bullshit. Nieman Lab's Andrew Deck asked the service to provide links to high-profile, exclusive stories published by 10 publishers that OpenAI has struck deals worth millions of dollars with. These included the Associated Press, The Wall Street Journal, the Financial Times, The Times (UK), Le Monde, El País, The Atlantic, The Verge, Vox, and Politico.
SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison
Rawal, Anjali, Wang, Hui, Zheng, Youjia, Lin, Yu-Hsuan, Sushmita, Shanu
Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (tweets from Election, FIFA, and Game of Thrones), (2) medium (Wikipedia introductions and PubMed abstracts), and (3) large (OpenAI web text dataset). Our results indicate that LLMs with very large parameters (such as the XL-1542 variant of GPT2 with 1542 million parameters) were harder (74%) to detect using traditional machine learning methods. However, detecting texts of varying lengths from LLMs with smaller parameters (762 million or less) can be done with high accuracy (96% and above). We examine the characteristics of human and machine-generated texts across multiple dimensions, including linguistics, personality, sentiment, bias, and morality. Our findings indicate that machine-generated texts generally have higher readability and closely mimic human moral judgments but differ in personality traits. SVM and Voting Classifier (VC) models consistently achieve high performance across most datasets, while Decision Tree (DT) models show the lowest performance. Model performance drops when dealing with rephrased texts, particularly shorter texts like tweets. This study underscores the challenges and importance of detecting LLM-generated texts and suggests directions for future research to improve detection methods and understand the nuanced capabilities of LLMs.
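The Voting Classifier mentioned in the abstract above combines the predictions of several base models. A minimal sketch of hard (majority) voting follows. It is an illustration only, not the paper's pipeline: the three base "classifiers" are hypothetical hand-written heuristics standing in for trained models such as an SVM or a decision tree, and their thresholds are arbitrary assumptions.

```python
from collections import Counter
from typing import Callable, Sequence

Label = str
Classifier = Callable[[str], Label]


def hard_vote(classifiers: Sequence[Classifier], text: str) -> Label:
    """Return the label predicted by the largest number of base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical placeholder heuristics (assumptions, not features from the paper).
def by_word_length(text: str) -> Label:
    words = text.split()
    average = sum(len(w) for w in words) / max(len(words), 1)
    return "machine" if average > 5.0 else "human"


def by_vocabulary_variety(text: str) -> Label:
    words = text.lower().split()
    ratio = len(set(words)) / max(len(words), 1)
    return "machine" if ratio < 0.6 else "human"


def by_exclamation(text: str) -> Label:
    return "human" if "!" in text else "machine"


ensemble = [by_word_length, by_vocabulary_variety, by_exclamation]
prediction = hard_vote(ensemble, "omg what a goal!!! best match ever")
print(prediction)
```

Using an odd number of base models with two labels avoids ties; in practice the base models would be trained on labeled human and machine text rather than written by hand.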
The Pitfalls of Publishing in the Age of LLMs: Strange and Surprising Adventures with a High-Impact NLP Journal
Verma, Rakesh M., Dershowitz, Nachum
In the dawn of the age of Large Language Models (LLMs), already much has been said about how researchers are making use of LLMs to author articles. For example, according to an article in Scientific American [1], "One percent of scientific articles published in 2023 showed signs of generative AI's potential involvement, according to a recent analysis." However, far less has been said about how reviewers are now abusing their role, sometimes with the editor's collusion. Here is our report of a case in point. We submitted a manuscript on domain-independent deception detection to a highly respected journal. As a consequence of a reviewer's use of an LLM, we both received a most peculiar review and also lost the promised confidentiality regarding our submission.
Deceptive Diffusion: Generating Synthetic Adversarial Examples
Beerens, Lucas, Higham, Catherine F., Higham, Desmond J.
We introduce the concept of deceptive diffusion -- training a generative AI model to produce adversarial images. Whereas a traditional adversarial attack algorithm aims to perturb an existing image to induce a misclassification, the deceptive diffusion model can create an arbitrary number of new, misclassified images that are not directly associated with training or test images. Deceptive diffusion offers the possibility of strengthening defence algorithms by providing adversarial training data at scale, including types of misclassification that are otherwise difficult to find. In our experiments, we also investigate the effect of training on a partially attacked data set. This highlights a new type of vulnerability for generative diffusion models: if an attacker is able to stealthily poison a portion of the training data, then the resulting diffusion model will generate a similar proportion of misleading outputs.
Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach
Vuruma, Sai Krishna Revanth, Wu, Dezhi, Gupta, Saborny Sen, Aust, Lucas, Lookingbill, Valerie, Bellamy, Wyatt, Ren, Yang, Kasson, Erin, Chen, Li-Shiun, Cavazos-Rehg, Patricia, Hu, Dian, Huang, Ming
In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.
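The prompting strategies named in the abstract above (zero-shot, one-shot, few-shot, chain-of-thought) differ mainly in how many labeled examples and how much reasoning guidance the prompt contains. A minimal sketch of assembling such prompts is shown below. The task wording, the example sentences, and the `build_prompt` helper are all hypothetical; they are not the study's 8 prompts, and no model is called here.

```python
from typing import Sequence, Tuple

# Hypothetical task description (an assumption, not the study's wording).
TASK = (
    "Decide whether the sentence expresses an intention to quit vaping. "
    "Answer 'yes' or 'no'."
)


def build_prompt(
    sentence: str,
    examples: Sequence[Tuple[str, str]] = (),
    chain_of_thought: bool = False,
) -> str:
    """Assemble a prompt: 0 examples = zero-shot, 1 = one-shot, more = few-shot."""
    parts = [TASK]
    for example_sentence, example_label in examples:
        parts.append(f"Sentence: {example_sentence}\nAnswer: {example_label}")
    if chain_of_thought:
        parts.append("Think step by step before giving the final answer.")
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


# Invented example sentences for illustration only.
demo_examples = [
    ("I threw my vape away this morning.", "yes"),
    ("Which flavor should I try next?", "no"),
]

zero_shot = build_prompt("I want to stop but it is hard.")
few_shot = build_prompt("I want to stop but it is hard.", demo_examples)
few_shot_cot = build_prompt(
    "I want to stop but it is hard.", demo_examples, chain_of_thought=True
)
print(few_shot_cot)
```

Each resulting string would be sent to the model as a prompt, and the returned label compared against the human annotations.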
Bringing Generative AI to Adaptive Learning in Education
Li, Hang, Xu, Tianlong, Zhang, Chaoli, Chen, Eason, Liang, Jing, Fan, Xing, Li, Haoyang, Tang, Jiliang, Wen, Qingsong
The recent surge in generative AI technologies, such as large language models and diffusion models, has boosted the development of AI applications in various domains, including science, finance, and education. Concurrently, adaptive learning, a concept that has gained substantial interest in the educational sphere, has proven its efficacy in enhancing students' learning efficiency. In this position paper, we aim to shed light on the intersectional studies of these two methods, which combine generative AI with adaptive learning concepts. By presenting discussions about the benefits, challenges, and potentials in this field, we argue that this union will contribute significantly to the development of the next-stage learning format in education.
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models
Zhu, Mengdan, Kanjiani, Raasikh, Lu, Jiahui, Choi, Andrew, Ye, Qirui, Zhao, Liang
Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.
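The perturb-and-compare step described in the abstract above can be sketched on a toy model. This is an illustration under stated assumptions, not LatentExplainer itself: `toy_decoder` is a hypothetical two-dimensional linear decoder with named output attributes, whereas the framework works with real generative models and interprets the changes in the generated data.

```python
from typing import Callable, Dict, List

Decoder = Callable[[List[float]], Dict[str, float]]


def toy_decoder(z: List[float]) -> Dict[str, float]:
    """Hypothetical decoder mapping a 2-D latent vector to named attributes."""
    return {
        "brightness": 0.5 + 0.4 * z[0],
        "width": 1.0 + 0.1 * z[0] + 0.8 * z[1],
    }


def perturb_effect(
    decoder: Decoder, z: List[float], dim: int, delta: float = 1.0
) -> Dict[str, float]:
    """Shift one latent dimension by delta and return the change per attribute."""
    base = decoder(z)
    shifted = list(z)
    shifted[dim] += delta
    moved = decoder(shifted)
    return {name: moved[name] - base[name] for name in base}


def most_affected(
    decoder: Decoder, z: List[float], dim: int, delta: float = 1.0
) -> str:
    """Name the attribute that changes most when one latent dimension moves."""
    effect = perturb_effect(decoder, z, dim, delta)
    return max(effect, key=lambda name: abs(effect[name]))


for dim in range(2):
    print(dim, most_affected(toy_decoder, [0.0, 0.0], dim))
```

In this toy setting the attribute with the largest change serves as a crude label for the latent dimension; with image outputs, the comparison between the original and perturbed samples would need a richer description than a single number.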
The nation's oldest nonprofit newsroom is suing OpenAI and Microsoft
The Center for Investigative Reporting, the nation's oldest nonprofit newsroom that produces Mother Jones and Reveal, sued OpenAI and Microsoft in federal court on Thursday for allegedly using its content to train AI models without consent or compensation. "OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material," said Monika Bauerlein, CEO of the Center for Investigative Reporting, in a statement. "The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it." Bauerlein said that OpenAI and Microsoft treat the work of nonprofit and independent publishers "as free raw material for their products," and added that such moves by generative AI companies hurt the public's access to truthful information in a "disappearing news landscape." OpenAI and Microsoft did not respond to a request for comment by Engadget.