Generative AI
A Research Leader Behind ChatGPT's Mental Health Work Is Leaving OpenAI
A Research Leader Behind ChatGPT's Mental Health Work Is Leaving OpenAI The model policy team leads core parts of AI safety research, including how ChatGPT responds to users in crisis. An OpenAI safety research leader who helped shape ChatGPT's responses to users experiencing mental health crises announced her departure from the company internally last month, WIRED has learned. Andrea Vallone, the head of a safety research team known as model policy, is slated to leave OpenAI at the end of the year. Wood said OpenAI is actively looking for a replacement and that, in the interim, Vallone's team will report directly to Johannes Heidecke, the company's head of safety systems. Vallone's departure comes as OpenAI faces growing scrutiny over how its flagship product responds to users in distress .
Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding
Ignatev, Daniil, Santeer, Ayman, Gatt, Albert, Paperno, Denis
We propose a zero-shot method for Natural Language Inference (NLI) that leverages multimodal representations by grounding language in visual contexts. Our approach generates visual representations of premises using text-to-image models and performs inference by comparing these representations with textual hypotheses. We evaluate two inference techniques: cosine similarity and visual question answering. Our method achieves high accuracy without task-specific fine-tuning, demonstrating robustness against textual biases and surface heuristics. Additionally, we design a controlled adversarial dataset to validate the robustness of our approach. Our findings suggest that leveraging visual modality as a meaning representation provides a promising direction for robust natural language understanding.
Monte Carlo Expected Threat (MOCET) Scoring
Evaluating and measuring AI Safety Level (ASL) threats are crucial for guiding stakeholders to implement safeguards that keep risks within acceptable limits. ASL-3+ models present a unique risk in their ability to uplift novice non-state actors, especially in the realm of biosecurity. Existing evaluation metrics, such as LAB-Bench, BioLP-bench, and WMDP, can reliably assess model uplift and domain knowledge. However, metrics that better contextualize "real-world risks" are needed to inform the safety case for LLMs, along with scalable, open-ended metrics to keep pace with their rapid advancements. To address both gaps, we introduce MOCET, an interpretable and doubly-scalable metric (automatable and open-ended) that can quantify real-world risks.
Stable diffusion models reveal a persisting human and AI gap in visual creativity
Rondini, Silvia, Alvarez-Martin, Claudia, Angermair-Barkai, Paula, Penacchio, Olivier, Paz, M., Pelowski, Matthew, Dediu, Dan, Rodriguez-Fornells, Antoni, Cerda-Company, Xim
While recent research suggests Large Language Models match human creative performance in divergent thinking tasks, visual creativity remains underexplored. This study compared image generation in human participants (Visual Artists and Non Artists) and using an image generation AI model (two prompting conditions with varying human input: high for Human Inspired, low for Self Guided). Human raters (N=255) and GPT4o evaluated the creativity of the resulting images. We found a clear creativity gradient, with Visual Artists being the most creative, followed by Non Artists, then Human Inspired generative AI, and finally Self Guided generative AI. Increased human guidance strongly improved GenAI's creative output, bringing its productions close to those of Non Artists. Notably, human and AI raters also showed vastly different creativity judgment patterns. These results suggest that, in contrast to language centered tasks, GenAI models may face unique challenges in visual domains, where creativity depends on perceptual nuance and contextual sensitivity, distinctly human capacities that may not be readily transferable from language models.
Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
Urueña, Jaime Álvarez, Camacho, David, Tato, Javier Huertas
The rapid advancement of generative artificial intelligence has enabled the creation of synthetic images that are increasingly indistinguishable from authentic content, posing significant challenges for digital media integrity. This problem is compounded by the accelerated release cycle of novel generative models, which renders traditional detection approaches (reliant on periodic retraining) computationally infeasible and operationally impractical. This work proposes a novel two-stage detection framework designed to address the generalization challenge inherent in synthetic image detection. The first stage employs a vision deep learning model trained via supervised contrastive learning to extract discriminative embeddings from input imagery. Critically, this model was trained on a strategically partitioned subset of available generators, with specific architectures withheld from training to rigorously ablate cross-generator generalization capabilities. The second stage utilizes a k-nearest neighbors (k-NN) classifier operating on the learned embedding space, trained in a few-shot learning paradigm incorporating limited samples from previously unseen test generators. With merely 150 images per class in the few-shot learning regime, which are easily obtainable from current generation models, the proposed framework achieves an average detection accuracy of 91.3%, representing a 5.2 percentage point improvement over existing approaches . For the source attribution task, the proposed approach obtains improvements of of 14.70% and 4.27% in AUC and OSCR respectively on an open set classification context, marking a significant advancement toward robust, scalable forensic attribution systems capable of adapting to the evolving generative AI landscape without requiring exhaustive retraining protocols.
Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study
Automatic diacritic restoration is crucial for text processing in languages with rich diacritical marks, such as Romanian. This study evaluates the performance of several large language models (LLMs) in restoring diacritics in Romanian texts. Using a comprehensive corpus, we tested models including OpenAI's GPT-3.5, GPT-4, GPT-4o, Google's Gemini 1.0 Pro, Meta's Llama 2 and Llama 3, MistralAI's Mixtral 8x7B Instruct, airoboros 70B, and OpenLLM-Ro's RoLlama 2 7B, under multiple prompt templates ranging from zero-shot to complex multi-shot instructions. Results show that models such as GPT-4o achieve high diacritic restoration accuracy, consistently surpassing a neutral echo baseline, while others, including Meta's Llama family, exhibit wider variability. These findings highlight the impact of model architecture, training data, and prompt design on diacritic restoration performance and outline promising directions for improving NLP tools for diacritic-rich languages.
Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
Wu, Chao, Wang, Zhenyi, Xie, Kangxian, Devulapally, Naresh Kumar, Lokhande, Vishnu Suresh, Gao, Mingchen
Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a lightweight and model-agnostic framework for mitigating such bias in T2I generation. Unlike prior approaches that rely on CLIP-based filtering or prompt engineering, which often require model-specific adjustments and offer limited control, SAE Debias operates directly within the feature space without retraining or architectural modifications. By leveraging a k-sparse autoencoder pre-trained on a gender bias dataset, the method identifies gender-relevant directions within the sparse latent space, capturing professional stereotypes. Specifically, a biased direction per profession is constructed from sparse latents and suppressed during inference to steer generations toward more gender-balanced outputs. Trained only once, the sparse autoencoder provides a reusable debiasing direction, offering effective control and interpretable insight into biased subspaces. Extensive evaluations across multiple T2I models, including Stable Diffusion 1.4, 1.5, 2.1, and SDXL, demonstrate that SAE Debias substantially reduces gender bias while preserving generation quality. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models. These findings contribute toward building socially responsible generative AI, providing an interpretable and model-agnostic tool to support fairness in text-to-image generation.
OpenAI Locks Down San Francisco Offices Following Alleged Threat From Activist
A message on OpenAI's internal Slack claimed the activist in question had expressed interest in "causing physical harm to OpenAI employees." OpenAI employees in San Francisco were told to stay inside the office on Friday afternoon after the company purportedly received a threat from an individual who was previously associated with the Stop AI activist group. "Our information indicates that [name] from StopAI has expressed interest in causing physical harm to OpenAI employees," a member of the internal communications team wrote on Slack. "He has previously been on site at our San Francisco facilities." Just before 11 am, San Francisco police received a 911 call about a man allegedly making threats and intending to harm others at 550 Terry Francois Boulevard, which is near OpenAI's offices in the Mission Bay neighborhood, according to data tracked by the crime app Citizen.
The Download: the secrets of vitamin D, and an AI party in Africa
Plus: Google's new image generator has extremely loose guardrails We're learning more about what vitamin D does to our bodies At a checkup a few years ago, a doctor told me I was deficient in vitamin D. But he wouldn't write me a prescription for supplements, simply because, as he put it, everyone in the UK is deficient. Putting the entire population on vitamin D supplements would be too expensive for the country's national health service, he told me. But supplementation--whether covered by a health-care provider or not--can be important. As those of us living in the Northern Hemisphere spend fewer of our waking hours in sunlight, let's consider the importance of vitamin D. Read the full story . This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. Here's why we don't have a cold vaccine.