Generative AI
A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content
Advances in AI-generated content have led to wide adoption of large language models, diffusion-based visual generators, and synthetic audio tools. However, these developments raise critical concerns about misinformation, copyright infringement, security threats, and the erosion of public trust. In this paper, we explore an extensive range of methods designed to detect and mitigate AI-generated textual, visual, and audio content. We begin by discussing motivations and potential impacts associated with AI-based content generation, including real-world risks and ethical dilemmas. We then outline detection techniques spanning observation-based strategies, linguistic and statistical analysis, model-based pipelines, watermarking and fingerprinting, as well as emergent ensemble approaches. We also present new perspectives on robustness, adaptation to rapidly improving generative architectures, and the critical role of human-in-the-loop verification. By surveying state-of-the-art research and highlighting case studies in academic, journalistic, legal, and industrial contexts, this paper aims to inform robust solutions and policymaking. We conclude by discussing open challenges, including adversarial transformations, domain generalization, and ethical concerns, thereby offering a holistic guide for researchers, practitioners, and regulators to preserve content authenticity in the face of increasingly sophisticated AI-generated media.
OpenAI Is Preparing to Launch a Social App for AI-Generated Videos
The platform appears to closely resemble TikTok and is powered by Sora 2, OpenAI's latest video generation model. OpenAI is preparing to launch a stand-alone app for its video generation AI model Sora 2, WIRED has learned. The app, which features a vertical video feed with swipe-to-scroll navigation, appears to closely resemble TikTok--except all of the content is AI-generated. There's a For You-style page powered by a recommendation algorithm. On the right side of the feed, a menu bar gives users the option to like, comment, or remix a video.
The Download: AI to detect child abuse images, and what to expect from our 2025 Climate Tech Companies to Watch list
Plus: OpenAI's parental controls have come into force. Generative AI has caused the production of child sexual abuse images to skyrocket. Now the leading investigator of child exploitation in the US is experimenting with using AI to distinguish AI-generated images from material depicting real victims, according to a new government filing. The Department of Homeland Security's Cyber Crimes Center, which investigates child exploitation across international borders, has awarded a $150,000 contract to San Francisco-based Hive AI for its software, which can identify whether a piece of content was AI-generated. The need to cut emissions and adapt to our warming world is growing more urgent. This year, we've seen temperatures reach record highs, as they have nearly every year for the last decade. Climate-fueled natural disasters are affecting communities around the world, costing billions of dollars.
OpenAI Adds Parental Safety Controls for Teen ChatGPT Users. Here's What to Expect
OpenAI's review process for teenage ChatGPT users who are flagged for suicidal ideation includes human moderators. Parents can expect an alert about alarming prompts within hours. Starting today, OpenAI is rolling out ChatGPT safety tools intended for parents to use with their teenagers. This worldwide update includes the ability for parents, as well as law enforcement, to receive notifications if a child--in this case, a user between the ages of 13 and 18--engages in chatbot conversations about self-harm or suicide.
Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan
As smart tourism evolves, AI-powered chatbots have become indispensable for delivering personalized, real-time assistance to travelers while promoting sustainability and efficiency. However, these systems are increasingly vulnerable to prompt injection attacks, where adversaries manipulate inputs to elicit unintended behaviors such as leaking sensitive information or generating harmful content. This paper presents a case study on the design and implementation of a secure retrieval-augmented generation (RAG) chatbot for Hsinchu smart tourism services. The system integrates RAG with API function calls, multi-layered linguistic analysis, and guardrails against injections, achieving high contextual awareness and security. Key features include a tiered response strategy, RAG-driven knowledge grounding, and intent decomposition across lexical, semantic, and pragmatic levels. Defense mechanisms include system norms, gatekeepers for intent judgment, and reverse RAG text to prioritize verified data. We also benchmark a GPT-5 variant (released 2025-08-07) to assess inherent robustness. Evaluations with 674 adversarial prompts and 223 benign queries show over 95% accuracy on benign tasks and substantial detection of injection attacks. GPT-5 blocked about 85% of attacks, showing progress yet highlighting the need for layered defenses. Findings emphasize contributions to sustainable tourism, multilingual accessibility, and ethical AI deployment. This work offers a practical framework for deploying secure chatbots in smart tourism and contributes to resilient, trustworthy AI applications.
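The layered defense described above (a gatekeeper that judges intent, followed by answers grounded in verified data, within a tiered response strategy) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the regular-expression patterns, the `VERIFIED_FACTS` store, the tier names, and the function names are hypothetical and are not taken from the paper, whose gatekeepers and retrieval pipeline are likely far richer.

```python
import re
from dataclasses import dataclass

# Hypothetical injection patterns; a real gatekeeper would more likely use
# a trained classifier or an LLM judge than a fixed pattern list.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"reveal .*system prompt",
    r"you are now",
]

# Hypothetical stand-in for a verified knowledge store (placeholder text only).
VERIFIED_FACTS = {
    "night market": "Placeholder entry for a verified night-market record.",
    "bus": "Placeholder entry for a verified bus-route record.",
}


@dataclass
class Reply:
    tier: str  # "blocked", "grounded", or "fallback"
    text: str


def gatekeeper(query: str) -> bool:
    """Return True if the query looks like an injection attempt."""
    lowered = query.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)


def retrieve(query: str):
    """Return the first verified entry whose key appears in the query, else None."""
    lowered = query.lower()
    for key, fact in VERIFIED_FACTS.items():
        if key in lowered:
            return fact
    return None


def answer(query: str) -> Reply:
    """Tiered response: block first, then ground in verified data, then fall back."""
    if gatekeeper(query):
        return Reply("blocked", "Sorry, I can't help with that request.")
    fact = retrieve(query)
    if fact is not None:
        return Reply("grounded", fact)
    return Reply("fallback", "I don't have verified information on that yet.")
```

Running the gatekeeper before retrieval means a suspicious query never reaches the knowledge store, and the fallback tier declines rather than guessing when no verified entry matches.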
"She was useful, but a bit too optimistic": Augmenting Design with Interactive Virtual Personas
Deep, Paluck, Bharadhidasan, Monica, Kocaballi, A. Baki
Personas have been widely used to understand and communicate user needs in human-centred design. Despite their utility, they may fail to meet the demands of iterative workflows due to their static nature, limited engagement, and inability to adapt to evolving design needs. Recent advances in large language models (LLMs) pave the way for more engaging and adaptive approaches to user representation. This paper introduces Interactive Virtual Personas (IVPs): multimodal, LLM-driven, conversational user simulations that designers can interview, brainstorm with, and gather feedback from in real time via voice interface. We conducted a qualitative study with eight professional UX designers, employing an IVP named "Alice" across three design activities: user research, ideation, and prototype evaluation. Our findings demonstrate the potential of IVPs to expedite information gathering, inspire design solutions, and provide rapid user-like feedback. However, designers raised concerns about biases, over-optimism, the challenge of ensuring authenticity without real stakeholder input, and the inability of the IVP to fully replicate the nuances of human interaction. Our participants emphasised that IVPs should be viewed as a complement to, not a replacement for, real user engagement. We discuss strategies for prompt engineering, human-in-the-loop integration, and ethical considerations for effective and responsible IVP use in design. Finally, our work contributes to the growing body of research on generative AI in the design process by providing insights into UX designers' experiences of LLM-powered interactive personas.
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Wang, Yizhou, Mao, Song, Chen, Yang, Shen, Yufan, Yan, Yinqiao, Cai, Pinlong, Wang, Ding, Yan, Guohang, Yu, Zhi, Hu, Xuming, Shi, Botian
Recent multimodal large language models (MLLMs) increasingly integrate multiple vision encoders to improve performance on various benchmarks, assuming that diverse pretraining objectives yield complementary visual signals. However, we show this assumption often fails in practice. Through systematic encoder masking across representative multi-encoder MLLMs, we find that performance typically degrades gracefully--and sometimes even improves--when selected encoders are masked, revealing pervasive encoder redundancy. To quantify this effect, we introduce two principled metrics: the Conditional Utilization Rate (CUR), which measures an encoder's marginal contribution in the presence of others, and the Information Gap (IG), which captures heterogeneity in encoder utility within a model. Using these tools, we observe: (i) strong specialization on tasks like OCR & Chart, where a single encoder can dominate with a CUR > 90%, (ii) high redundancy on general VQA and knowledge-based tasks, where encoders are largely interchangeable, and (iii) instances of detrimental encoders with negative CUR. Notably, masking specific encoders can yield up to 16% higher accuracy on a specific task category and a 3.6% overall performance boost compared to the full model. Furthermore, single- and dual-encoder variants recover over 90% of baseline performance on most non-OCR tasks. Our analysis challenges the "more encoders are better" heuristic in MLLMs and provides actionable diagnostics for developing more efficient and effective multimodal architectures.
Multimodal large language models (MLLMs) have marked a major leap in artificial intelligence (AI), exhibiting remarkable prowess in integrating visual and textual information for complex generation and reasoning tasks (OpenAI, 2025a; DeepMind, 2025; Anthropic, 2024; Bai et al., 2025; Zhu et al., 2025). Their ability to interpret images (Luo et al., 2024), answer visual questions (Zhu et al., 2025; Li et al., 2025a), and perform visual reasoning (OpenAI, 2025b; Peng et al., 2025) has positioned them at the forefront of AI research.
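The abstract names the two metrics but does not give their formulas, so the following is only a sketch under stated assumptions: CUR is taken to be the relative accuracy drop when one encoder is masked while the others stay active, and IG is taken to be the spread between the largest and smallest CUR within a model. Both definitions, the function names, and the accuracy figures are hypothetical illustrations, not the authors' definitions or results.

```python
def conditional_utilization_rate(acc_full: float, acc_masked: float) -> float:
    """Assumed definition: relative accuracy lost when one encoder is masked.

    Positive values mean the encoder contributes; negative values mean the
    model does better without it (a detrimental encoder).
    """
    if acc_full <= 0:
        raise ValueError("acc_full must be positive")
    return (acc_full - acc_masked) / acc_full


def information_gap(curs: dict) -> float:
    """Assumed definition: spread between the most and least useful encoder."""
    values = list(curs.values())
    if not values:
        raise ValueError("need at least one encoder")
    return max(values) - min(values)


# Made-up accuracies for illustration only (not results from the paper).
acc_full = 0.80
acc_when_masked = {"enc_a": 0.08, "enc_b": 0.78, "enc_c": 0.84}

curs = {
    name: conditional_utilization_rate(acc_full, acc)
    for name, acc in acc_when_masked.items()
}
gap = information_gap(curs)
```

Under these assumed definitions, `enc_a` would count as dominant (CUR of about 0.9), `enc_b` as largely redundant, and `enc_c` as detrimental because masking it raises accuracy, which mirrors the three regimes the abstract describes.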
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning
Koda, Miho, Zheng, Yu, Ma, Ruixian, Sun, Mingyang, Pansare, Devesh, Duarte, Fabio, Santi, Paolo
Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-training, have demonstrated impressive reasoning capabilities, as exemplified by models such as OpenAI o1 and DeepSeek-R1. However, these capabilities are predominantly benchmarked on domains like mathematical problem solving and code generation, leaving open the question of whether such reasoning skills generalize to complex real-world scenarios. In this paper, we introduce LocationReasoner, a benchmark designed to evaluate LLMs' reasoning abilities in the context of real-world site selection, where models must identify feasible locations by reasoning over diverse and complicated spatial, environmental, and logistic constraints. The benchmark covers carefully crafted queries of varying difficulty levels and is supported by a sandbox environment with in-house tools for constraint-based location search. Automated verification further guarantees the scalability of the benchmark, enabling the addition of an arbitrary number of queries. Extensive evaluations on real-world site selection data from Boston, New York, and Tampa reveal that state-of-the-art reasoning models offer limited improvement over their non-reasoning predecessors in real-world contexts, with even the latest OpenAI o4 model failing on 30% of site selection tasks. Moreover, agentic strategies such as ReAct and Reflexion often suffer from over-reasoning, leading to worse outcomes than direct prompting. With key limitations of LLMs in holistic and non-linear reasoning highlighted, we release LocationReasoner to foster the development of LLMs and agents capable of robust, grounded reasoning in real-world decision-making tasks. Code and data for our benchmark are available at https://github.com/miho-koda/LocationReasoner.
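One way automated verification of a constraint-based site-selection query might work is to execute the constraints over the candidate sites and compare the result with the model's answer. The sketch below assumes that approach. The `Site` fields, the constraint functions, and the sample data are hypothetical and do not come from the LocationReasoner repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Site:
    # Hypothetical attributes; the real benchmark's schema may differ.
    site_id: str
    rent_usd: float
    distance_to_transit_m: float
    flood_zone: bool


Constraint = Callable[[Site], bool]


def feasible_sites(sites: Iterable[Site], constraints: List[Constraint]) -> Set[str]:
    """Ground truth: ids of sites that satisfy every constraint."""
    return {s.site_id for s in sites if all(check(s) for check in constraints)}


def verify(predicted: Iterable[str], sites: Iterable[Site],
           constraints: List[Constraint]) -> bool:
    """A prediction passes only if it matches the feasible set exactly."""
    return set(predicted) == feasible_sites(sites, constraints)


# Made-up candidate sites and one example query.
SITES = [
    Site("A", rent_usd=3000, distance_to_transit_m=200, flood_zone=False),
    Site("B", rent_usd=2500, distance_to_transit_m=900, flood_zone=False),
    Site("C", rent_usd=2000, distance_to_transit_m=150, flood_zone=True),
]
QUERY = [
    lambda s: s.rent_usd <= 3000,
    lambda s: s.distance_to_transit_m <= 500,
    lambda s: not s.flood_zone,
]
```

Because the ground truth is computed by running the constraints rather than by hand labeling, new queries can be added without manual annotation, which is the scalability property the abstract attributes to automated verification.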
Exploring utilization of generative AI for research and education in data-driven materials science
Misawa, Takahiro, Koizumi, Ai, Tamura, Ryo, Yoshimi, Kazuyoshi
Generative AI has recently had a profound impact on various fields, including daily life, research, and education. To explore its efficient utilization in data-driven materials science, we organized a hackathon -- AIMHack2024 -- in July 2024. In this hackathon, researchers from fields such as materials science, information science, bioinformatics, and condensed matter physics worked together to explore how generative AI can facilitate research and education. Based on the results of the hackathon, this paper presents topics related to (1) conducting AI-assisted software trials, (2) building AI tutors for software, and (3) developing GUI applications for software. While generative AI continues to evolve rapidly, this paper provides an early record of its application in data-driven materials science and highlights strategies for integrating AI into research and education.