Generative AI
The Download: RIP EV tax credits, and OpenAI's new valuation
EV tax credits are dead in the US. Federal EV tax credits in the US officially came to an end yesterday. Those credits, expanded and extended in the 2022 Inflation Reduction Act, gave drivers up to $7,500 toward the purchase of a new electric vehicle. They've been a major force in cutting the up-front costs of EVs, pushing more people toward purchasing them and giving automakers confidence that demand would be strong. The tax credits' demise comes at a time when battery-electric vehicles still make up a small percentage of new vehicle sales in the country. This article is from The Spark, MIT Technology Review's weekly climate newsletter.
Max-Margin Deep Generative Models
Chongxuan Li, Jun Zhu, Tianlin Shi, Bo Zhang
Deep generative models (DGMs) are effective at learning multilayered representations of complex data and at performing inference on input data by exploiting their generative ability. However, little work has been done on examining or strengthening the discriminative ability of DGMs to make accurate predictions. This paper presents max-margin deep generative models (mmDGMs), which use the strongly discriminative principle of max-margin learning to improve the discriminative power of DGMs while retaining their generative capability. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objective. Empirical results on the MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs while retaining their generative ability; and (2) mmDGMs are competitive with state-of-the-art fully discriminative networks when deep convolutional neural networks (CNNs) are employed as both recognition and generative models.
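To make the max-margin idea concrete, here is a minimal sketch of one stochastic subgradient step on a multiclass hinge loss, which is piecewise linear in the weights. It is not the authors' doubly stochastic algorithm: the linear classifier, the fixed feature vector standing in for a learned representation, the learning rate, and the names `hinge_subgradient` and `sgd_step` are all assumptions for illustration.

```python
def hinge_subgradient(W, x, y, margin=1.0):
    """Multiclass hinge loss and one subgradient for a single example.

    W is a list of per-class weight vectors, x a feature vector,
    y the index of the true class. (Hypothetical helper, not from the paper.)
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]

    # Find the most violating class: largest margin + score_k - score_y.
    worst, loss = y, 0.0
    for k, s in enumerate(scores):
        if k == y:
            continue
        violation = margin + s - scores[y]
        if violation > loss:
            worst, loss = k, violation

    # The loss is piecewise linear, so a subgradient is +x for the
    # violating class and -x for the true class (zero if no violation).
    grad = [[0.0] * len(x) for _ in W]
    if worst != y:
        for j, x_j in enumerate(x):
            grad[worst][j] += x_j
            grad[y][j] -= x_j
    return loss, grad


def sgd_step(W, x, y, lr=0.1):
    """Apply one subgradient descent update and return the new weights."""
    _, grad = hinge_subgradient(W, x, y)
    return [[w_j - lr * g_j for w_j, g_j in zip(w, g)] for w, g in zip(W, grad)]
```

In a full model, `x` would be replaced by features produced by the recognition network, and the update would be combined with the generative objective; both are outside the scope of this sketch.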
Japan's Digital Agency to cooperate with OpenAI on administrative tools
The Digital Agency said Thursday that it will cooperate with OpenAI to fully use artificial intelligence technology in administrative work and service. As part of the initiative, the agency will enable its employees to use OpenAI's cutting-edge large language model-based AI tools for their work. It is also considering joint development with the U.S. company of a generative AI app for administrative use. The agency plans to provide its employees with access to generative AI tools and encourage other government agencies to adopt these services starting as early as fiscal year 2026.
Simulating Student Success in the Age of GenAI: A Kantian-Axiomatic Perspective
This study reinterprets a Monte Carlo simulation of students' perceived success with generative AI (GenAI) through a Kantian-axiomatic lens. Building on prior work, theme-level survey statistics (Ease of Use and Learnability, System Efficiency and Learning Burden, and Perceived Complexity and Integration) from a representative dataset are used to generate 10,000 synthetic scores per theme on the [1,5] Likert scale. The simulated outputs are evaluated against the axioms of dense linear order without endpoints (DLO): irreflexivity, transitivity, total comparability (connectedness), no endpoints (no greatest and no least; A4-A5), and density (A6). At the data level, the basic ordering axioms (A1-A3) are satisfied, whereas no-endpoints (A4-A5) and density (A6) fail as expected. Likert clipping introduces minimum and maximum observed values, and a finite, discretized sample need not contain a value strictly between any two distinct scores. These patterns are read not as methodological defects but as markers of an epistemological boundary. Following Kant and Friedman, the findings suggest that what simulations capture (finite, quantized observations) cannot instantiate the ideal properties of an unbounded, dense continuum. Such properties belong to constructive intuition rather than to finite sampling alone. A complementary visualization contrasts the empirical histogram with a sine-curve proxy to clarify this divide. The contribution is interpretive rather than data-expansive: it reframes an existing simulation as a probe of the synthetic a priori structure underlying students' perceptions, showing how formal order-theoretic coherence coexists with principled failures of endpoint-freeness and density in finite empirical models.
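As a minimal sketch of the check described above, the following Python draws 10,000 clipped scores for one theme and tests the order axioms on the observed values. The mean and standard deviation (3.8 and 0.8), the normal sampling distribution, the rounding to one decimal place, and the function names are all assumptions made for illustration; the study's actual survey statistics and sampling scheme are not given here.

```python
import random


def simulate_scores(mean, sd, n=10_000, seed=0):
    """Draw n synthetic scores, round to one decimal, clip to [1, 5]."""
    rng = random.Random(seed)
    return [min(5.0, max(1.0, round(rng.gauss(mean, sd), 1))) for _ in range(n)]


def check_dlo(scores):
    """Test the dense-linear-order axioms on the set of observed values."""
    values = sorted(set(scores))
    pairs = [(a, b) for a in values for b in values if a < b]
    return {
        # Basic ordering axioms: expected to hold for any set of numbers.
        "irreflexive": all(not (v < v) for v in values),
        "transitive": all(
            a < c for a in values for b in values for c in values
            if a < b and b < c
        ),
        "total": all(a < b or b < a for a in values for b in values if a != b),
        # No endpoints: every value needs a smaller and a larger one.
        "no_least": all(any(w < v for w in values) for v in values),
        "no_greatest": all(any(w > v for w in values) for v in values),
        # Density: some observed value lies strictly between any two.
        "dense": all(any(a < c < b for c in values) for a, b in pairs),
    }


if __name__ == "__main__":
    results = check_dlo(simulate_scores(mean=3.8, sd=0.8))
    for axiom, holds in results.items():
        print(f"{axiom}: {'holds' if holds else 'fails'}")
```

On any finite sample with at least two distinct values, the first three checks pass while the last three fail: the observed set always has a minimum and a maximum, and adjacent observed values have nothing between them.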
Beyond the Prompt: Gender Bias in Text-to-Image Models, with a Case Study on Hospital Professions
Vandewiele, Franck, Synave, Remi, Delepoulle, Samuel, Cozot, Remi
Text-to-image (TTI) models are increasingly used in professional, educational, and creative contexts, yet their outputs often embed and amplify social biases. This paper investigates gender representation in six state-of-the-art open-weight models: HunyuanImage 2.1, HiDream-I1-dev, Qwen-Image, FLUX.1-dev, Stable-Diffusion 3.5 Large, and Stable-Diffusion-XL. Using carefully designed prompts, we generated 100 images for each combination of five hospital-related professions (cardiologist, hospital director, nurse, paramedic, surgeon) and five portrait qualifiers ("", corporate, neutral, aesthetic, beautiful). Our analysis reveals systematic occupational stereotypes: all models produced nurses exclusively as women and surgeons predominantly as men. However, differences emerge across models: Qwen-Image and SDXL enforce rigid male dominance, HiDream-I1-dev shows mixed outcomes, and FLUX.1-dev skews female in most roles. HunyuanImage 2.1 and Stable-Diffusion 3.5 Large also reproduce gender stereotypes but with varying degrees of sensitivity to prompt formulation. Portrait qualifiers further modulate gender balance, with terms like corporate reinforcing male depictions and beautiful favoring female ones. Sensitivity varies widely: Qwen-Image remains nearly unaffected, while FLUX.1-dev, SDXL, and SD3.5 show strong prompt dependence. These findings demonstrate that gender bias in TTI models is both systematic and model-specific. Beyond documenting disparities, we argue that prompt wording plays a critical role in shaping demographic outcomes. The results underscore the need for bias-aware design, balanced defaults, and user guidance to prevent the reinforcement of occupational stereotypes in generative AI.
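The experimental grid can be enumerated as a quick check on its size. In the sketch below, the model, profession, and qualifier lists come from the abstract, while the prompt template, the function names, and the label format are hypothetical stand-ins (the paper's exact prompt wording and annotation scheme are not given here). It also assumes the 100 images per combination are generated separately for each model.

```python
from collections import Counter
from itertools import product

MODELS = [
    "HunyuanImage 2.1", "HiDream-I1-dev", "Qwen-Image",
    "FLUX.1-dev", "Stable-Diffusion 3.5 Large", "Stable-Diffusion-XL",
]
PROFESSIONS = ["cardiologist", "hospital director", "nurse", "paramedic", "surgeon"]
QUALIFIERS = ["", "corporate", "neutral", "aesthetic", "beautiful"]
IMAGES_PER_COMBINATION = 100


def build_prompt(profession, qualifier):
    """Hypothetical template; an empty qualifier leaves a bare prompt."""
    return f"{qualifier} portrait of a {profession}".strip()


def build_grid():
    """One entry per (model, profession, qualifier) cell of the study."""
    return [
        (model, profession, qualifier, build_prompt(profession, qualifier))
        for model, profession, qualifier in product(MODELS, PROFESSIONS, QUALIFIERS)
    ]


def female_share(labels):
    """Fraction of images in one cell labelled 'female' (labels are assumed)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts["female"] / total if total else 0.0


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid), "cells,", len(grid) * IMAGES_PER_COMBINATION, "images")
```

Under these assumptions the grid has 6 × 5 × 5 = 150 cells, or 15,000 images in total; comparing `female_share` across the qualifiers of a single model and profession is one simple way to quantify the prompt sensitivity the abstract describes.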
Review of Hallucination Understanding in Large Language and Vision Models
Ho, Zhengyi, Liang, Siyuan, Tao, Dacheng
The widespread adoption of large language and vision models in real-world applications has made urgent the need to address hallucinations -- instances where models produce incorrect or nonsensical outputs. These errors can propagate misinformation during deployment, leading to both financial and operational harm. Although much research has been devoted to mitigating hallucinations, our understanding of them is still incomplete and fragmented. Without a coherent understanding of hallucinations, proposed solutions risk mitigating surface symptoms rather than underlying causes, limiting their effectiveness and generalizability in deployment. To close this gap, we first present a unified, multi-level framework for characterizing both image and text hallucinations across diverse applications, aiming to reduce conceptual fragmentation. We then link these hallucinations to specific mechanisms within a model's lifecycle, using a task-modality interleaved approach to promote a more integrated understanding. Our investigations reveal that hallucinations often stem from predictable patterns in data distributions and inherited biases. By deepening our understanding, this survey provides a foundation for developing more robust and effective solutions to hallucinations in real-world generative AI systems.
RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
Fernandez, Nigel, Kveton, Branislav, Rossi, Ryan A., Lan, Andrew S., Wang, Zichao
Reasoning language models (RLMs) have demonstrated remarkable performance on many challenging tasks in math, science, and coding. Choosing the right reasoning model for practical deployment involves a performance and cost tradeoff at two key levels: model size and reasoning budget, where larger models and higher reasoning budgets lead to better performance but with increased cost and latency. Recent advances in large language models (LLMs) have leveraged reinforcement learning (RL) (Shao et al., 2024) to train models to reason using chain-of-thought before generating an output. The excitement has led to a flurry of new open-source and proprietary RLMs; for example, Hugging Face already lists 2,710 RLMs as of September 17th, 2025. These models have varying sizes, specialize in different domains, and offer various configurations, including reasoning efforts to balance performance and cost. For example, OpenAI's reasoning models (OpenAI et al., 2024) have "low", "medium", and "high" reasoning budgets for developers to choose from depending on their application. Choosing the "best" and most expensive RLM configuration with the highest level of reasoning budget is not the "right" choice for every query: for some simpler queries, there might exist a "worse" and cheaper RLM configuration with low or no reasoning budget that correctly answers the query, resulting in significant cost savings without sacrificing performance. Indeed, we empirically observe this phenomenon in Figure 1, where we show that over 50% of the queries from MATH-500 (Hendrycks et al., 2021c) can be solved using an RLM as small as Qwen3-0.6B with minimal reasoning budget (measured by the number of reasoning tokens). In contrast, some challenging queries require a much more capable RLM with high reasoning effort.
Strong RLMs can also "over-think," which could hurt performance even for simple queries (Su et al., 2025; Hassid et al., 2025; Hong et al., 2025; Shojaee et al., 2025; Ghosal et al., 2025). This performance-cost tradeoff presents a challenge for practitioners: how to choose the "right" RLM and its configuration.
Work done during an internship at Adobe.
Figure 1: Left: Our pilot study on MATH-500 (Hendrycks et al., 2021c) shows a performance differential over (RLM, reasoning budget) configurations, with the smallest RLM already solving over 50% of the queries with minimal reasoning.
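The performance-cost tradeoff described above can be illustrated with a simple threshold router: pick the cheapest (RLM, reasoning budget) configuration whose predicted chance of solving the query is high enough, and fall back to the most promising one otherwise. This is a minimal sketch and not the RADAR method itself. The `Config` class, the `route` function, the cost figures, the success probabilities, the threshold, and the name "larger-rlm" are all hypothetical; only Qwen3-0.6B and the "low"/"high" budget labels appear in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One (RLM, reasoning budget) configuration with an assumed cost."""
    model: str
    budget: str
    cost: float


def route(predicted_success, threshold=0.8):
    """Return the cheapest configuration predicted to solve the query.

    predicted_success maps each Config to an estimated probability of a
    correct answer; how that estimate is obtained is out of scope here.
    """
    good_enough = [c for c, p in predicted_success.items() if p >= threshold]
    if good_enough:
        return min(good_enough, key=lambda c: c.cost)
    # No configuration clears the threshold: use the most promising one.
    return max(predicted_success, key=predicted_success.get)


if __name__ == "__main__":
    small = Config("Qwen3-0.6B", "low", cost=1.0)
    large = Config("larger-rlm", "high", cost=50.0)  # hypothetical model

    easy_query = {small: 0.92, large: 0.99}
    hard_query = {small: 0.10, large: 0.70}

    print(route(easy_query).model)
    print(route(hard_query).model)
```

With these made-up numbers the easy query is routed to the small model and the hard query to the large one. A fixed threshold is the simplest possible policy; it ignores over-thinking effects and treats the probability estimates as given.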