Generative AI
Why GPT-4o's sudden shutdown left people grieving
OpenAI's decision to replace 4o with the more straightforward GPT-5 follows a steady drumbeat of news about the potentially harmful effects of extensive chatbot use. Reports of incidents in which ChatGPT sparked psychosis in users have been everywhere for the past few months, and in a blog post last week, OpenAI acknowledged 4o's failure to recognize when users were experiencing delusions. The company's internal evaluations indicate that GPT-5 blindly affirms users much less than 4o did. AI companionship is new, and there's still a great deal of uncertainty about how it affects people. Yet the experts we consulted warned that while emotionally intense relationships with large language models may or may not be harmful, ripping those models away with no warning almost certainly is.
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals Lang Liu 1 Krishna Pillutla 2 Sean Welleck 2,3 Sewoong Oh
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their ability to measure the quality-diversity trade-off inherent to deep generative modeling.
div-frontier
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their ability to measure the quality-diversity trade-off inherent to deep generative modeling.
Researchers asked AI to show a typical Australian dad: he was white and had an iguana Tama Leaver and Suzanne Srdarov for the Conversation
Big tech company hype sells generative artificial intelligence (AI) as intelligent, creative, desirable, inevitable and about to radically reshape the future in many ways. Published by Oxford University Press, our new research on how generative AI depicts Australian themes directly challenges this perception. We found when generative AIs produce images of Australia and Australians, these outputs are riddled with bias. They reproduce sexist and racist caricatures more at home in the country's imagined monocultural past. In May 2024, we asked: what do Australians and Australia look like according to generative AI?
Echoes of Automation: The Increasing Use of LLMs in Newsmaking
Ansari, Abolfazl, Zhang, Delvin Ce, Tripto, Nafis Irtiza, Lee, Dongwon
The rapid rise of Generative AI (GenAI), particularly LLMs, poses concerns for journalistic integrity and authorship. This study examines AI-generated content across over 40,000 news articles from major, local, and college news media, in various media formats. Using three advanced AI-text detectors (e.g., Binoculars, Fast-Detect GPT, and GPTZero), we find substantial increase of GenAI use in recent years, especially in local and college news. Sentence-level analysis reveals LLMs are often used in the introduction of news, while conclusions usually written manually. Linguistic analysis shows GenAI boosts word richness and readability but lowers formality, leading to more uniform writing styles, particularly in local media.
Geospatial Diffusion for Land Cover Imperviousness Change Forecasting
Varshney, Debvrat, Vats, Vibhas, Pandey, Bhartendu, Brelsford, Christa, Dias, Philipe
Land cover, both present and future, has a significant effect on several important Earth system processes. For example, impervious surfaces heat up and speed up surface water runoff and reduce groundwater infiltration, with concomitant effects on regional hydrology and flood risk. While regional Earth System models have increasing skill at forecasting hydrologic and atmospheric processes at high resolution in future climate scenarios, our ability to forecast land-use and land-cover change (LULC), a critical input to risk and consequences assessment for these scenarios, has lagged behind. In this paper, we propose a new paradigm exploiting Generative AI (GenAI) for land cover change forecasting by framing LULC forecasting as a data synthesis problem conditioned on historical and auxiliary data-sources. We discuss desirable properties of generative models that fundament our research premise, and demonstrate the feasibility of our methodology through experiments on imperviousness forecasting using historical data covering the entire conterminous United States. Specifically, we train a diffusion model for decadal forecasting of imperviousness and compare its performance to a baseline that assumes no change at all. Evaluation across 12 metropolitan areas for a year held-out during training indicate that for average resolutions $\geq 0.7\times0.7km^2$ our model yields MAE lower than such a baseline. This finding corroborates that such a generative model can capture spatiotemporal patterns from historical data that are significant for projecting future change. Finally, we discuss future research to incorporate auxiliary information on physical properties about the Earth, as well as supporting simulation of different scenarios by means of driver variables.
Mathematical Computation and Reasoning Errors by Large Language Models
Zhang, Liang, Graf, Edith Aurora
Large Language Models (LLMs) are increasingly utilized in AI-driven educational instruction and assessment, particularly within mathematics education. The capability of LLMs to generate accurate answers and detailed solutions for math problem-solving tasks is foundational for ensuring reliable and precise feedback and assessment in math education practices. Our study focuses on evaluating the accuracy of four LLMs (OpenAI GPT-4o and o1, DeepSeek-V3 and DeepSeek-R1) solving three categories of math tasks, including arithmetic, algebra, and number theory, and identifies step-level reasoning errors within their solutions. Instead of relying on standard benchmarks, we intentionally build math tasks (via item models) that are challenging for LLMs and prone to errors. The accuracy of final answers and the presence of errors in individual solution steps were systematically analyzed and coded. Both single-agent and dual-agent configurations were tested. It is observed that the reasoning-enhanced OpenAI o1 model consistently achieved higher or nearly perfect accuracy across all three math task categories. Analysis of errors revealed that procedural slips were the most frequent and significantly impacted overall performance, while conceptual misunderstandings were less frequent. Deploying dual-agent configurations substantially improved overall performance. These findings offer actionable insights into enhancing LLM performance and underscore effective strategies for integrating LLMs into mathematics education, thereby advancing AI-driven instructional practices and assessment precision.
Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition
Vesterbacka, Leonora, Rekathati, Faton, Kurtz, Robin, Sikora, Justyna, Toftgård, Agnes
This work presents a suite of fine-tuned Whisper models for Swedish, trained on a dataset of unprecedented size and variability for this mid-resourced language. As languages of smaller sizes are often underrepresented in multilingual training datasets, substantial improvements in performance can be achieved by fine-tuning existing multilingual models, as shown in this work. This work reports an overall improvement across model sizes compared to OpenAI's Whisper evaluated on Swedish. Most notably, we report an average 47% reduction in WER comparing our best performing model to OpenAI's whisper-large-v3, in evaluations across FLEURS, Common V oice, and NST.
Improving watermelon (Citrullus lanatus) disease classification with generative artificial intelligence (GenAI)-based synthetic and real-field images via a custom EfficientNetV2-L model
Rai, Nitin, Boyd, Nathan S., Vallad, Gary E., Schumann, Arnold W.
The current advancements in generative artificial intelligence (GenAI) models have paved the way for new possibilities for generating high-resolution synthetic images, thereby offering a promising alternative to traditional image acquisition for training computer vision models in agriculture. In the context of crop disease diagnosis, GenAI models are being used to create synthetic images of various diseases, potentially facilitating model creation and reducing the dependency on resource-intensive in-field data collection. However, limited research has been conducted on evaluating the effectiveness of integrating real with synthetic images to improve disease classification performance. Therefore, this study aims to investigate whether combining a limited number of real images with synthetic images can enhance the prediction accuracy of an EfficientNetV2-L model for classifying watermelon \textit{(Citrullus lanatus)} diseases. The training dataset was divided into five treatments: H0 (only real images), H1 (only synthetic images), H2 (1:1 real-to-synthetic), H3 (1:10 real-to-synthetic), and H4 (H3 + random images to improve variability and model generalization). All treatments were trained using a custom EfficientNetV2-L architecture with enhanced fine-tuning and transfer learning techniques. Models trained on H2, H3, and H4 treatments demonstrated high precision, recall, and F1-score metrics. Additionally, the weighted F1-score increased from 0.65 (on H0) to 1.00 (on H3-H4) signifying that the addition of a small number of real images with a considerable volume of synthetic images improved model performance and generalizability. Overall, this validates the findings that synthetic images alone cannot adequately substitute for real images; instead, both must be used in a hybrid manner to maximize model performance for crop disease classification.
Detecting and explaining postpartum depression in real-time with generative artificial intelligence
García-Méndez, Silvia, de Arriba-Pérez, Francisco
Among the many challenges mothers undergo after childbirth, postpartum depression ( ppd) is a severe condition that significantly impacts their mental and physical well-being. Consequently, the rapid detection of ppd and their associated risk factors is critical for in-time assessment and intervention through specialized prevention procedures. Accordingly, this work addresses the need to help practitioners make decisions with the latest technological advancements to enable real-time screening and treatment recommendations. Mainly, our work contributes to an intelligent ppd screening system that combines Natural Language Processing, Machine Learning ( ml), and Large Language Models ( llm s) towards an affordable, real-time, and non-invasive free speech analysis. Moreover, it addresses the black box problem since the predictions are described to the end users thanks to the combination of llm s with interpretable ml models ( i.e., tree-based algorithms) using feature importance and natural language. The results obtained are 90 % on ppd detection for all evaluation metrics, outperforming the competing solutions in the literature. Ultimately, our solution contributes to the rapid detection of ppd and their associated risk factors, critical for in-time and proper assessment and intervention. Introduction Depression is a global public health concern that affects more than 150 million people, being more prevalent in women (Labaka et al., 2018; Moreira et al., 2019). Among the many challenges mothers undergo after childbirth, postpartum depression ( ppd) is a severe condition that usually requires medical intervention (Falana & Carrington, 2019). Mainly, ppd is a common non-psychotic mental disorder during the first year after childbirth that can lead to severe complications in the women's health (Abadiga, 2019). Current data indicates that between 10 % to 15 % of mothers worldwide are affected with ppd yearly (Fatima et al., 2019; Liu et al., 2023). Moreover, only 20% of the target population is diagnosed or even treated promptly (Mazumder & Baruah, 2021).