Generative AI
Towards Synergistic Teacher-AI Interactions with Generative Artificial Intelligence
Cukurova, Mutlu, Suraworachet, Wannapon, Zhou, Qi, Bulathwela, Sahan
Generative artificial intelligence (GenAI) is increasingly used in education, posing significant challenges for teachers adapting to these changes. GenAI offers unprecedented opportunities for accessibility, scalability and productivity in educational tasks. However, the automation of teaching tasks through GenAI raises concerns about reduced teacher agency, potential cognitive atrophy, and the broader deprofessionalisation of teaching. Drawing on findings from prior literature on AI in Education, refined through a recent systematic literature review, this chapter presents a conceptualisation of five levels of teacher-AI teaming: transactional, situational, operational, praxical and synergistic teaming. The framework aims to capture the nuanced dynamics of teacher-AI interactions, particularly with GenAI, that may lead to the replacement, complementarity, or augmentation of teachers' competences and professional practice. The GenAI technological affordances required to support teaming are discussed alongside empirical studies. Drawing on empirical observations, we outline a future vision that moves beyond individual teacher agency toward collaborative decision-making between teachers and AI, in which both agents engage in negotiation, constructive challenge, and co-reasoning that enhance each other's capabilities and enable outcomes neither could realise independently. We also discuss socio-technical factors beyond teacher-AI teaming that bear on realising the synergy of teachers and AI in education ethically and practically.
Beyond Binary Classification: A Semi-supervised Approach to Generalized AI-generated Image Detection
Nguyen-Le, Hong-Hanh, Tran, Van-Tuan, Nguyen, Dinh-Thuc, Le-Khac, Nhien-An
The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant challenges to digital media authenticity. These generators are typically based on a few core architectural families, primarily Generative Adversarial Networks (GANs) and Diffusion Models (DMs). A critical vulnerability in current forensics is the failure of detectors to achieve cross-generator generalization, especially when crossing architectural boundaries (e.g., from GANs to DMs). We hypothesize that this gap stems from fundamental differences in the artifacts produced by these distinct architectures. In this work, we provide a theoretical analysis explaining how the distinct optimization objectives of the GAN and DM architectures lead to different manifold coverage behaviors. We demonstrate that GANs permit partial coverage, often leading to boundary artifacts, while DMs enforce complete coverage, resulting in over-smoothing patterns. Motivated by this analysis, we propose the Triarchy Detector (TriDetect), a semi-supervised approach that enhances binary classification by discovering latent architectural patterns within the "fake" class. TriDetect employs balanced cluster assignment via the Sinkhorn-Knopp algorithm and a cross-view consistency mechanism, encouraging the model to learn fundamental architectural distinctions. We evaluate our approach on two standard benchmarks and three in-the-wild datasets against 13 baselines to demonstrate its generalization capability to unseen generators.
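To illustrate the balanced cluster assignment named in the abstract above, here is a minimal pure-Python sketch of Sinkhorn-Knopp normalization. It is not the authors' implementation: the function name `sinkhorn_knopp`, the temperature `epsilon`, the iteration count, and the toy score matrix are all assumptions made for illustration. The idea is to alternate column and row rescaling so that every cluster ends up with roughly the same total mass, which keeps all samples from collapsing into one cluster.

```python
import math

def sinkhorn_knopp(scores, epsilon=0.5, n_iters=100):
    """Turn an (n_samples x n_clusters) score matrix into soft assignments
    whose cluster (column) masses are approximately equal.
    Hypothetical helper for illustration, not taken from the paper."""
    n, k = len(scores), len(scores[0])
    # Subtract the global max before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    q = [[math.exp((s - m) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: give each cluster a total mass of 1/k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Row step: give each sample a total mass of 1/n.
        row = [sum(r) for r in q]
        q = [[v / (row[i] * n) for v in q[i]] for i in range(n)]
    # Rescale so each row is a probability distribution over clusters.
    return [[v * n for v in r] for r in q]

# Toy scores (assumed): every sample prefers cluster 0 before balancing.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignments = sinkhorn_knopp(scores)
column_mass = [sum(r[j] for r in assignments) for j in range(2)]
```

In this toy case a plain argmax over `scores` would put all four samples in cluster 0, whereas the balanced assignments split the mass evenly between the two clusters. Very small `epsilon` values with widely spread scores can underflow in this naive version, so a log-domain variant would be safer in practice.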
CGCE: Classifier-Guided Concept Erasure in Generative Models
Nguyen, Viet, Patel, Vishal M.
Recent advancements in large-scale generative models have enabled the creation of high-quality images and videos, but have also raised significant safety concerns regarding the generation of unsafe content. To mitigate this, concept erasure methods have been developed to remove undesirable concepts from pre-trained models. However, existing methods remain vulnerable to adversarial attacks that can regenerate the erased content. Moreover, achieving robust erasure often degrades the model's generative quality for safe, unrelated concepts, creating a difficult trade-off between safety and performance. To address this challenge, we introduce Classifier-Guided Concept Erasure (CGCE), an efficient plug-and-play framework that provides robust concept erasure for diverse generative models without altering their original weights. CGCE uses a lightweight classifier operating on text embeddings to first detect and then refine prompts containing undesired concepts. This approach is highly scalable, allowing for multi-concept erasure by aggregating guidance from several classifiers. By modifying only unsafe embeddings at inference time, our method prevents harmful content generation while preserving the model's original quality on benign prompts. Extensive experiments show that CGCE achieves state-of-the-art robustness against a wide range of red-teaming attacks. Our approach also maintains high generative utility, demonstrating a superior balance between safety and performance. We showcase the versatility of CGCE through its successful application to various modern T2I and T2V models, establishing it as a practical and effective solution for safe generative AI.
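The detect-then-refine idea in the abstract above can be sketched roughly as follows. This is an illustrative guess at the general shape of classifier guidance on text embeddings, not the CGCE method itself: the linear logistic classifier, the gradient-step refinement rule, and the names `unsafe_probability` and `refine_embedding` (along with `threshold`, `step`, and `max_steps`) are all assumptions. Benign embeddings pass through unchanged, while flagged embeddings are nudged against the classifier gradient until they score below the threshold, and no generator weights are touched.

```python
import math

def unsafe_probability(embedding, weights, bias):
    """Logistic classifier: probability that the embedding carries the concept.
    Hypothetical stand-in for a lightweight text-embedding classifier."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def refine_embedding(embedding, weights, bias,
                     threshold=0.5, step=0.1, max_steps=100):
    """Leave benign embeddings untouched; otherwise move the embedding
    against the classifier gradient until it scores below the threshold."""
    refined = list(embedding)
    for _ in range(max_steps):
        if unsafe_probability(refined, weights, bias) < threshold:
            break
        # For a linear logit, the gradient w.r.t. the embedding is `weights`.
        refined = [x - step * w for x, w in zip(refined, weights)]
    return refined

# Toy 2-D example (assumed values, not real text embeddings).
weights, bias = [1.0, -0.5], 0.0
benign = [-1.0, 1.0]   # scores below the threshold, so it is returned as-is
flagged = [2.0, 0.0]   # scores above the threshold, so it is refined
benign_out = refine_embedding(benign, weights, bias)
flagged_out = refine_embedding(flagged, weights, bias)
```

Under these assumptions, erasing several concepts would amount to running one such classifier per concept and combining their update directions, which is one possible reading of "aggregating guidance from several classifiers".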
Interpretable Robot Control via Structured Behavior Trees and Large Language Models
Chekam, Ingrid Maéva, Pastor-Martinez, Ines, Tourani, Ali, Millan-Romera, Jose Andres, Ribeiro, Laura, Soares, Pedro Miguel Bastos, Voos, Holger, Sanchez-Lopez, Jose Luis
With the increasing presence of intelligent robots in everyday life, the demand for reliable and straightforward Human-Robot Interaction (HRI) interfaces is rapidly rising. Traditional robot control paradigms require users to learn particular commands [1] or interact with the robots through rigid user interfaces, especially in unstructured environments [2]. However, recent works target more flexible and adaptive communication strategies, unlocking the full potential of autonomous agents in human-centered environments. Accordingly, advances in generative AI and Large Language Models (LLMs) reveal new opportunities for enabling seamless communication between humans and robots, where natural language is the primary means of communication [3]. Such models are powerful enough to comprehend given instructions and even "reason" about the demanded tasks, intentions, and environmental context [4]. When paired with robotic perception and control systems, LLMs enable users to intuitively instruct the robot to perform complex tasks such as following multiple objects [5], navigating through dynamic scenes [6], or interacting with specific items [7], all using natural dialogue. Furthermore, integrating multimodal capabilities, including vision and speech, enhances HRI by enabling more natural, context-aware communication and improving adaptability across tasks and environments [8].
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
Zhang, Zhihao, Zhang, Yiran, Zhou, Xiyue, Huang, Liting, Razzak, Imran, Nakov, Preslav, Naseem, Usman
Infodemics and health misinformation have a significant negative impact on individuals and society, exacerbating confusion and increasing hesitancy in adopting recommended health measures. Recent advancements in generative AI, capable of producing realistic, human-like text and images, have significantly accelerated the spread and expanded the reach of health misinformation, resulting in an alarming surge in its dissemination. To combat infodemics, most existing work has focused on developing misinformation datasets from social media and fact-checking platforms, but has faced limitations in topical coverage, inclusion of AI-generated content, and accessibility of raw content. To address these issues, we present MM Health, a large-scale multimodal misinformation dataset in the health domain consisting of 34,746 news articles encompassing both textual and visual information. MM Health includes human-generated multimodal information (5,776 articles) and AI-generated multimodal information (28,880 articles) from various SOTA generative AI models. Additionally, we benchmarked our dataset on three tasks (reliability checks, originality checks, and fine-grained AI detection), demonstrating that existing SOTA models struggle to accurately distinguish the reliability and origin of information. Our dataset aims to support the development of misinformation detection across various health scenarios, facilitating the detection of human- and machine-generated content at multimodal levels.
WIRED Roundup: Gemini 3 Release, Nvidia Earnings, Epstein Files Fallout
In this episode of Uncanny Valley, we cover the news of the week and take a closer look at Gemini 3, Google's latest AI model and chatbot. Host Zoë Schiffer is joined by senior writer Max Zeff to discuss five stories you need to know about this week--from the political fallout after the release of the Epstein files, to why two young Mormon men created an app to help men stop "gooning." Then, we dive into Gemini 3's release and how companies like Google and OpenAI are honing in on AI profitability. Write to us at uncannyvalley@wired.com. Today on the show we're bringing you five stories that you need to know about this week, including how companies like Google and OpenAI are honing in on profitability as they develop their AI consumer-facing products. I'm joined today by WIRED's Senior Writer Max Zeff. It's great to be here.
Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces
Shmidman, Shaltiel, Fredman, Asher, Sudakov, Oleg, Bendris, Meriem
Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through complex problems by understanding the goal, turning this goal into a plan, working through intermediate steps, and checking their own work before answering. Frontier large language models with reasoning capabilities, such as DeepSeek-R1 and OpenAI's gpt-oss, follow the same procedure when solving complex problems by generating intermediate reasoning traces before giving the final answer. Today, these models are increasingly used to generate reasoning traces that serve as high-quality supervised data for post-training small and medium-sized language models, teaching them reasoning capabilities without requiring expensive human curation. In this work, we compare the performance of medium-sized LLMs on math problems after post-training on two kinds of reasoning traces. We compare the impact of reasoning traces generated by DeepSeek-R1 and gpt-oss LLMs in terms of accuracy and inference efficiency.
AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration
Wang, Wen-Fan, Lu, Chien-Ting, Ng, Jin Ping, Chiu, Yi-Ting, Lee, Ting-Ying, Wang, Miaosen, Chen, Bing-Yu, Chen, Xiang 'Anthony'
Animation pre-production lays the foundation of an animated film by transforming initial concepts into a coherent blueprint across interdependent stages such as ideation, scripting, design, and storyboarding. While generative AI tools are increasingly adopted in this process, they remain isolated, requiring creators to juggle multiple systems without integrated workflow support. Our formative study with 12 professional creative directors and independent animators revealed key challenges in their current practice: creators must manually coordinate fragmented outputs, manage large volumes of information, and struggle to maintain continuity and creative control between stages. Based on these insights, we present AnimAgents, a human-multi-agent collaborative system that coordinates complex, multi-stage workflows through a core agent and specialized agents, supported by dedicated boards for the four major stages of pre-production. AnimAgents enables stage-aware orchestration, stage-specific output management, and element-level refinement, providing an end-to-end workflow tailored to professional practice. In a within-subjects summative study with 16 professional creators, AnimAgents significantly outperformed a strong single-agent baseline equipped with advanced parallel image generation in coordination, consistency, information management, and overall satisfaction (p < .01). A field deployment with 4 creators further demonstrated AnimAgents' effectiveness in real-world projects.
Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models
Singh, Mukul, Singha, Ananya, Parab, Aishni, Mehrotra, Pronita, Gulwani, Sumit
Associative thinking--the ability to connect seemingly unrelated ideas--is a foundational element of human creativity and problem-solving. This paper explores whether reinforcement learning (RL) guided by associative thinking principles can enhance a model's performance across diverse generative tasks, including story writing, code generation, and chart creation. We introduce a reinforcement learning framework that uses a prompt-based evaluation mechanism, incorporating established divergent thinking metrics from creativity research. A base language model is fine-tuned using this framework to reward outputs demonstrating higher novelty through higher degrees of conceptual connectivity. Interestingly, the experimental results suggest that RL-based associative thinking-trained models not only generate more original and coherent stories but also exhibit improved abstraction and flexibility in tasks such as programming and data visualization. Our findings provide initial evidence that modeling cognitive creativity principles through reinforcement learning can yield more adaptive and generative AI.
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
Kyaw, Alexander Htet, Gupta, Richa, Shah, Dhruv, Sinha, Anoop, Mathewson, Kory, Pender, Stefanie, Chitta, Sachin, Koga, Yotto, Ahmed, Faez, Sass, Lawrence, Davis, Randall
Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multi-modal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM is capable of determining which mesh regions need panel components in addition to structural components, based on the object's geometry and functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.