Generative AI
Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
de Chillaz, Aymeric, Sotnikova, Anna, Jermann, Patrick, Bosselut, Antoine
Generative AI systems have rapidly advanced, with multimodal input capabilities enabling reasoning beyond text-based tasks. In education, these advancements could influence assessment design and question answering, presenting both opportunities and challenges. To investigate these effects, we introduce a high-quality dataset of 201 university-level STEM questions, manually annotated with features such as image type, role, problem complexity, and question format. Our study analyzes how these features affect generative AI performance compared to students. We evaluate four model families with five prompting strategies, comparing results to the average of 546 student responses per question. Although the best model correctly answers on average 58.5 % of the questions using majority vote aggregation, human participants consistently outperform AI on questions involving visual components. Interestingly, human performance remains stable across question features but varies by subject, whereas AI performance is susceptible to both subject matter and question features. Finally, we provide actionable insights for educators, demonstrating how question design can enhance academic integrity by leveraging features that challenge current AI systems without increasing the cognitive burden for students.
Left-leaning actress Natasha Lyonne leading efforts to lobby Trump admin on AI regulation
The'Outnumbered' panel discusses how celebrities have reportedly authored a letter to President Donald Trump seeking his protection against artificial intelligence after smearing his name for years. Left-leaning actress Natasha Lyonne is at the forefront of Hollywood efforts to get the government to address creators' concerns about AI infringing on their work. "My primary interest is that people get paid for their life's work," Lyonne said in a report in the Wall Street Journal. The story detailed Lyonne's efforts to lobby Hollywood heavyweights to sign onto her letter to the Trump administration in March, urging against the loosening of regulations around AI, which they deem a potential threat to their intellectual property without proper protections in place. REPUBLICANS SCRAP DEAL IN'BIG, BEAUTIFUL BILL' TO LOWER RESTRICTIONS ON STATES' AI REGULATIONS Natasha Lyonne is among Hollywood figures lobbying the Trump administration to rein in AI. (Photo by Araya Doheny/Getty Images) The Hollywood letter said the companies "are arguing for a special government exemption so they can freely exploit America's creative and knowledge industries, despite their substantial revenues and available funds. Lyonne and more than 400 others, including such figures as Paul McCartney, Ron Howard and Ben Stiller, signed the letter. Lyonne, known for her roles in the series "Poker Face" and "Russian Doll," is a partner in a new studio called Asteria, which describes itself as "an artist-led generative AI film and animation studio powered by the first clean and ethical AI model." Like many figures in Hollywood, Lyonne is not a fan of the president, endorsing Kamala Harris in 2024 and posting in a now-deleted X post in 2020 about turning Texas blue to defeat Trump. Earlier this year, she told The Hollywood Reporter she was concerned for marginalized communities, saying of Trump, "It's very weird to have like a showbiz guy in charge, is surreal.
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center
Wen, James, Nalawade, Sahil, Liang, Zhiwei, Bielick, Catherine, Boston, Marisa Ferrara, Chowdhury, Alexander, Collin, Adele, De Angelis, Luigi, Ellen, Jacob, Frase, Heather, Gameiro, Rodrigo R., Gutierrez, Juan Manuel, Kadam, Pooja, Keceli, Murat, Krishnamurthy, Srikanth, Kwok, Anne, Lu, Yanan Lance, Mattie, Heather, McCoy, Liam G., Miller, Katherine, Morgan, Allison C., Moerig, Marlene Louisa, Nguyen, Trang, Owen-Post, Alexander, Ruiz, Alex D., Puchala, Sreekar Reddy, Samineni, Soujanya, Tohyama, Takeshi, Ullanat, Varun, Valenza, Carmine, Velez, Camilo, Wang, Pengcheng, Wuest, Anna, Zhou, Yuxiang, Zhu, Yingde, Johnson, Jason M., Lenane, Naomi, Willcox, Jennifer, Vitiello, Francis J., Celi, Leo Anthony G., Umeton, Renato
Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research mission, and (3) the shared responsibility model required to benefit from Customer Copyright Commitment in Azure OpenAI Service products, we deemed rigorous copyright compliance testing necessary. Case Description: We conducted a structured red teaming exercise in Nov. 2024, with 42 participants from academic, industry, and government institutions. Four teams attempted to extract copyrighted content from GPT4DFCI across four domains: literary works, news articles, scientific publications, and access-restricted clinical notes. Teams successfully extracted verbatim book dedications and near-exact passages through various strategies. News article extraction failed despite jailbreak attempts. Scientific article reproduction yielded only high-level summaries. Clinical note testing revealed appropriate privacy safeguards. Discussion: The successful extraction of literary content indicates potential copyrighted material presence in training data, necessitating inference-time filtering. Differential success rates across content types suggest varying protective mechanisms. The event led to implementation of a copyright-specific meta-prompt in GPT4DFCI; this mitigation has been in production since Jan. 2025. Conclusion: Systematic red teaming revealed specific vulnerabilities in generative AI copyright compliance, leading to concrete mitigation strategies. Academic medical institutions deploying generative AI should implement continuous testing protocols to ensure legal and ethical compliance.
Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
Payne, Kenneth, Alloui-Cros, Baptiste
Are Large Language Models (LLMs) a new form of strategic intelligence, able to reason about goals in competitive settings? We present compelling supporting evidence. The Iterated Prisoner's Dilemma (IPD) has long served as a model for studying decision-making. We conduct the first ever series of evolutionary IPD tournaments, pitting canonical strategies (e.g., Tit-for-Tat, Grim Trigger) against agents from the leading frontier AI companies OpenAI, Google, and Anthropic. By varying the termination probability in each tournament (the "shadow of the future"), we introduce complexity and chance, confounding memorisation. Our results show that LLMs are highly competitive, consistently surviving and sometimes even proliferating in these complex ecosystems. Furthermore, they exhibit distinctive and persistent "strategic fingerprints": Google's Gemini models proved strategically ruthless, exploiting cooperative opponents and retaliating against defectors, while OpenAI's models remained highly cooperative, a trait that proved catastrophic in hostile environments. Anthropic's Claude emerged as the most forgiving reciprocator, showing remarkable willingness to restore cooperation even after being exploited or successfully defecting. Analysis of nearly 32,000 prose rationales provided by the models reveals that they actively reason about both the time horizon and their opponent's likely strategy, and we demonstrate that this reasoning is instrumental to their decisions. This work connects classic game theory with machine psychology, offering a rich and granular view of algorithmic decision-making under uncertainty.
Agentic Business Process Management: Practitioner Perspectives on Agent Governance in Business Processes
Vu, Hoang, Klievtsova, Nataliia, Leopold, Henrik, Rinderle-Ma, Stefanie, Kampik, Timotheus
With the rise of generative AI, industry interest in software agents is growing. Given the stochastic nature of generative AI-based agents, their effective and safe deployment in organizations requires robust governance, which can be facilitated by agentic business process management. However, given the nascence of this new-generation agent notion, it is not clear what BPM practitioners consider to be an agent, and what benefits, risks and governance challenges they associate with agent deployments. To investigate how organizations can effectively govern AI agents, we conducted a qualitative study involving semi-structured interviews with 22 BPM practitioners from diverse industries. They anticipate that agents will enhance efficiency, improve data quality, ensure better compliance, and boost scalability through automation, while also cautioning against risks such as bias, over-reliance, cybersecurity threats, job displacement, and ambiguous decision-making. To address these challenges, the study presents six key recommendations for the responsible adoption of AI agents: define clear business goals, set legal and ethical guardrails, establish human-agent collaboration, customize agent behavior, manage risks, and ensure safe integration with fallback options. Additionally, the paper outlines actions to align traditional BPM with agentic AI, including balancing human and agent roles, redefining human involvement, adapting process structures, and introducing performance metrics. These insights provide a practical foundation for integrating AI agents into business processes while preserving oversight, flexibility, and trust.
DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting
Chinchuthakun, Worameth, Phongthawee, Pakkapon, Raj, Amit, Jampani, Varun, Khungurn, Pramook, Suwajanakorn, Supasorn
We introduce a simple yet effective technique for estimating lighting from a single low-dynamic-range (LDR) image by reframing the task as a chrome ball inpainting problem. This approach leverages a pre-trained diffusion model, Stable Diffusion XL, to overcome the generalization failures of existing methods that rely on limited HDR panorama datasets. While conceptually simple, the task remains challenging because diffusion models often insert incorrect or inconsistent content and cannot readily generate chrome balls in HDR format. Our analysis reveals that the inpainting process is highly sensitive to the initial noise in the diffusion process, occasionally resulting in unrealistic outputs. To address this, we first introduce DiffusionLight, which uses iterative inpainting to compute a median chrome ball from multiple outputs to serve as a stable, low-frequency lighting prior that guides the generation of a high-quality final result. To generate high-dynamic-range (HDR) light probes, an Exposure LoRA is fine-tuned to create LDR images at multiple exposure values, which are then merged. While effective, DiffusionLight is time-intensive, requiring approximately 30 minutes per estimation. To reduce this overhead, we introduce DiffusionLight-Turbo, which reduces the runtime to about 30 seconds with minimal quality loss. This 60x speedup is achieved by training a Turbo LoRA to directly predict the averaged chrome balls from the iterative process. Inference is further streamlined into a single denoising pass using a LoRA swapping technique. Experimental results that show our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios. Our code is available at https://diffusionlight.github.io/turbo
AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing
Ren, Yinwang, Liu, Yangyang, Ji, Tang, Xu, Xun
AI agents are autonomous systems designed to perceive, reason, and act within dynamic environments. With the rapid advancements in generative AI (GenAI), large language models (LLMs) and multimodal large language models (MLLMs) have significantly improved AI agents' capabilities in semantic comprehension, complex reasoning, and autonomous decision-making. At the same time, the rise of Agentic AI highlights adaptability and goal-directed autonomy in dynamic and complex environments. LLMs-based AI Agents (LLM-Agents), MLLMs-based AI Agents (MLLM-Agents), and Agentic AI contribute to expanding AI's capabilities in information processing, environmental perception, and autonomous decision-making, opening new avenues for smart manufacturing. However, the definitions, capability boundaries, and practical applications of these emerging AI paradigms in smart manufacturing remain unclear. To address this gap, this study systematically reviews the evolution of AI and AI agent technologies, examines the core concepts and technological advancements of LLM-Agents, MLLM-Agents, and Agentic AI, and explores their potential applications in and integration into manufacturing, along with the potential challenges they may face. Preprint submitted to Journal of Manufacturing System July 3, 2025 1. Introduction As a complex and data-intensive domain, manufacturing faces increasing challenges due to the increasing demand for customization, shorter product life cycles, and intense global competition [1, 2]. Traditional automated systems, reliant on fixed rules, struggle to adapt to evolving customer needs.
Can AI be Consentful?
Pistilli, Giada, Trevelin, Bruna
The evolution of generative AI systems exposes the challenges of traditional legal and ethical frameworks built around consent. This chapter examines how the conventional notion of consent, while fundamental to data protection and privacy rights, proves insufficient in addressing the implications of AI-generated content derived from personal data. Through legal and ethical analysis, we show that while individuals can consent to the initial use of their data for AI training, they cannot meaningfully consent to the numerous potential outputs their data might enable or the extent to which the output is used or distributed. We identify three fundamental challenges: the scope problem, the temporality problem, and the autonomy trap, which collectively create what we term a "consent gap" in AI systems and their surrounding ecosystem. We argue that current legal frameworks inadequately address these emerging challenges, particularly regarding individual autonomy, identity rights, and social responsibility, especially in cases where AI-generated content creates new forms of personal representation beyond the scope of the original consent. By examining how these consent limitations intersect with broader principles of responsible AI - including fairness, transparency, accountability, and autonomy - we demonstrate the need to evolve ethical and legal approaches to consent.
PathCoT: Chain-of-Thought Prompting for Zero-shot Pathology Visual Reasoning
Zhou, Junjie, Zuo, Yingli, Feng, Shichang, Wan, Peng, Zhu, Qi, Zhang, Daoqiang, Shao, Wei
With the development of generative artificial intelligence and instruction tuning techniques, multimodal large language models (MLLMs) have made impressive progress on general reasoning tasks. Benefiting from the chain-of-thought (CoT) methodology, MLLMs can solve the visual reasoning problem step-by-step. However, existing MLLMs still face significant challenges when applied to pathology visual reasoning tasks: (1) LLMs often underperforms because they lack domain-specific information, which can lead to model hallucinations. (2) The additional reasoning steps in CoT may introduce errors, leading to the divergence of answers. To address these limitations, we propose PathCoT, a novel zero-shot CoT prompting method which integrates the pathology expert-knowledge into the reasoning process of MLLMs and incorporates self-evaluation to mitigate divergence of answers. Specifically, PathCoT guides the MLLM with prior knowledge to perform as pathology experts, and provides comprehensive analysis of the image with their domain-specific knowledge. By incorporating the experts' knowledge, PathCoT can obtain the answers with CoT reasoning. Furthermore, PathCoT incorporates a self-evaluation step that assesses both the results generated directly by MLLMs and those derived through CoT, finally determining the reliable answer. The experimental results on the PathMMU dataset demonstrate the effectiveness of our method on pathology visual understanding and reasoning.
Sequential Diagnosis with Language Models
Nori, Harsha, Daswani, Mayank, Kelly, Christopher, Lundberg, Scott, Ribeiro, Marco Tulio, Wilson, Marc, Liu, Xiaoxuan, Sounderajah, Viknesh, Carlson, Jonathan, Lungren, Matthew P, Gross, Bay, Hames, Peter, Suleyman, Mustafa, King, Dominic, Horvitz, Eric
Artificial intelligence holds great promise for expanding access to expert medical knowledge and reasoning. However, most evaluations of language models rely on static vignettes and multiple-choice questions that fail to reflect the complexity and nuance of evidence-based medicine in real-world settings. In clinical practice, physicians iteratively formulate and revise diagnostic hypotheses, adapting each subsequent question and test to what they've just learned, and weigh the evolving evidence before committing to a final diagnosis. To emulate this iterative process, we introduce the Sequential Diagnosis Benchmark, which transforms 304 diagnostically challenging New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases into stepwise diagnostic encounters. A physician or AI begins with a short case abstract and must iteratively request additional details from a gatekeeper model that reveals findings only when explicitly queried. Performance is assessed not just by diagnostic accuracy but also by the cost of physician visits and tests performed. We also present the MAI Diagnostic Orchestrator (MAI-DxO), a model-agnostic orchestrator that simulates a panel of physicians, proposes likely differential diagnoses and strategically selects high-value, cost-effective tests. When paired with OpenAI's o3 model, MAI-DxO achieves 80% diagnostic accuracy--four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3. When configured for maximum accuracy, MAI-DxO achieves 85.5% accuracy. These performance gains with MAI-DxO generalize across models from the OpenAI, Gemini, Claude, Grok, DeepSeek, and Llama families. We highlight how AI systems, when guided to think iteratively and act judiciously, can advance diagnostic precision and cost-effectiveness in clinical care.