Generative AI
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Abootorabi, Mohammad Mahdi, Ghahroodi, Omid, Zahraei, Pardis Sadat, Behzadasl, Hossein, Mirrokni, Alireza, Salimipanah, Mobina, Rasouli, Arash, Behzadipour, Bahar, Azarnoush, Sara, Maleki, Benyamin, Sadraiye, Erfan, Feriz, Kiarash Kiani, Nahad, Mahdi Teymouri, Moghadasi, Ali, Abianeh, Abolfazl Eshagh, Nazar, Nizi, Rabiee, Hamid R., Baghshah, Mahdieh Soleymani, Ahmadi, Meisam, Asgari, Ehsaneddin
Generative AI is reshaping art, gaming, and most notably animation. Recent breakthroughs in foundation and diffusion models have reduced the time and cost of producing animated content. Characters are central animation components, involving motion, emotions, gestures, and facial expressions. The pace and breadth of advances in recent months make it difficult to maintain a coherent view of the field, motivating the need for an integrative review. Unlike earlier overviews that treat avatars, gestures, or facial animation in isolation, this survey offers a single, comprehensive perspective on all the main generative AI applications for character animation. We begin by examining the state-of-the-art in facial animation, expression rendering, image synthesis, avatar creation, gesture modeling, motion synthesis, object generation, and texture synthesis. We highlight leading research, practical deployments, commonly used datasets, and emerging trends for each area. To support newcomers, we also provide a comprehensive background section that introduces foundational models and evaluation metrics, equipping readers with the knowledge needed to enter the field. We discuss open challenges and map future research directions, providing a roadmap to advance AI-driven character-animation technologies. This survey is intended as a resource for researchers and developers entering the field of generative AI animation or adjacent fields. Resources are available at: https://github.com/llm-lab-org/Generative-AI-for-Character-Animation-Survey.
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Wan, Zishen, Qian, Jiayi, Du, Yuhang, Jabbour, Jason, Du, Yilun, Zhao, Yang Katie, Raychowdhury, Arijit, Krishna, Tushar, Reddi, Vijay Janapa
Embodied systems, where generative autonomous agents engage with the physical world through integrated perception, cognition, action, and advanced reasoning powered by large language models (LLMs), hold immense potential for addressing complex, long-horizon, multi-objective tasks in real-world environments. However, deploying these systems remains challenging due to prolonged runtime latency, limited scalability, and heightened sensitivity, leading to significant system inefficiencies. In this paper, we aim to understand the workload characteristics of embodied agent systems and explore optimization solutions. We systematically categorize these systems into four paradigms and conduct benchmarking studies to evaluate their task performance and system efficiency across various modules, agent scales, and embodied tasks. Our benchmarking studies uncover critical challenges, such as prolonged planning and communication latency, redundant agent interactions, complex low-level control mechanisms, memory inconsistencies, exploding prompt lengths, sensitivity to self-correction and execution, sharp declines in success rates, and reduced collaboration efficiency as agent numbers increase. Leveraging these profiling insights, we suggest system optimization strategies to improve the performance, efficiency, and scalability of embodied agents across different paradigms. This paper presents the first system-level analysis of embodied AI agents, and explores opportunities for advancing future embodied system design.
Generative to Agentic AI: Survey, Conceptualization, and Challenges
Agentic Artificial Intelligence (AI) builds upon Generative AI (GenAI). It constitutes the next major step in the evolution of AI with much stronger reasoning and interaction capabilities that enable more autonomous behavior to tackle complex tasks. Since the initial release of ChatGPT (3.5), Generative AI has seen widespread adoption, giving users firsthand experience. However, the distinction between Agentic AI and GenAI remains less well understood. To address this gap, our survey is structured in two parts. In the first part, we compare GenAI and Agentic AI using existing literature, discussing their key characteristics, how Agentic AI remedies limitations of GenAI, and the major steps in GenAI's evolution toward Agentic AI. This section is intended for a broad audience, including academics in both social sciences and engineering, as well as industry professionals. It provides the necessary insights to comprehend novel applications that are possible with Agentic AI but not with GenAI. In the second part, we deep dive into novel aspects of Agentic AI, including recent developments and practical concerns such as defining agents. Finally, we discuss several challenges that could serve as a future research agenda, while cautioning against risks that can emerge when exceeding human intelligence.
Proof-of-TBI -- Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction
Gore, Ross, Bandara, Eranga, Shetty, Sachin, Musto, Alberto E., Rana, Pratip, Valencia-Romero, Ambrosio, Rhea, Christopher, Tayebi, Lobat, Richter, Heather, Yarlagadda, Atmaram, Edmonds, Donna, Wallace, Steven, Broshek, Donna
Mild Traumatic Brain Injury (TBI) detection presents significant challenges due to the subtle and often ambiguous presentation of symptoms in medical imaging, making accurate diagnosis a complex task. To address these challenges, we propose Proof-of-TBI, a medical diagnosis support system that integrates multiple fine-tuned vision-language models with the OpenAI-o3 reasoning large language model (LLM). Our approach fine-tunes multiple vision-language models using a labeled dataset of TBI MRI scans, training them to diagnose TBI symptoms effectively. The predictions from these models are aggregated through a consensus-based decision-making process. The system evaluates the predictions from all fine-tuned vision-language models using the OpenAI-o3 reasoning LLM, a model that has demonstrated remarkable reasoning performance, to produce the most accurate final diagnosis. The LLM agents orchestrate interactions between the vision-language models and the reasoning LLM, managing the final decision-making process with transparency, reliability, and automation. This end-to-end decision-making workflow combines the vision-language model consortium with the OpenAI-o3 reasoning LLM, enabled by custom prompt engineering by the LLM agents. The prototype for the proposed platform was developed in collaboration with the U.S. Army Medical Research team in Newport News, Virginia, incorporating five fine-tuned vision-language models. The results demonstrate the transformative potential of combining fine-tuned vision-language model inputs with the OpenAI-o3 reasoning LLM to create a robust, secure, and highly accurate diagnostic system for mild TBI prediction. To the best of our knowledge, this research represents the first application of fine-tuned vision-language models integrated with a reasoning LLM for TBI prediction tasks.
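As a rough illustration of consensus-based aggregation over several model predictions, the sketch below uses a plain majority vote. This is a stand-in, not the paper's method: the abstract says a reasoning LLM evaluates the predictions, and the model names, labels, and the `consensus` helper here are all hypothetical.

```python
from collections import Counter


def consensus(predictions):
    """Return (label, agreement) for the most common predicted label.

    `predictions` maps a model name to its predicted label. Agreement is the
    fraction of models that voted for the winning label.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical outputs from five fine-tuned vision-language models.
example_votes = {
    "vlm_a": "tbi",
    "vlm_b": "tbi",
    "vlm_c": "no_tbi",
    "vlm_d": "tbi",
    "vlm_e": "no_tbi",
}

label, agreement = consensus(example_votes)
print(label, agreement)
```

A low agreement value could be used to flag a case for closer review, which is the kind of decision the abstract assigns to the reasoning LLM.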
Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach
Elhaimeur, Iizalaarab, Chrisochoides, Nikos
Quantum computing education faces significant challenges due to its complexity and the limitations of current tools; this paper introduces a novel Intelligent Teaching Assistant for quantum computing education and details its evolutionary design process. The system combines a knowledge-graph-augmented architecture with two specialized Large Language Model (LLM) agents: a Teaching Agent for dynamic interaction, and a Lesson Planning Agent for lesson plan generation. The system is designed to adapt to individual student needs, with interactions meticulously tracked and stored in a knowledge graph. This graph represents student actions, learning resources, and relationships, aiming to enable reasoning about effective learning pathways. We describe the implementation of the system, highlighting the challenges encountered and the solutions implemented, including introducing a dual-agent architecture where tasks are separated, all coordinated through a central knowledge graph that maintains system awareness, and a user-facing tag system intended to mitigate LLM hallucination and improve user control. Preliminary results illustrate the system's potential to capture rich interaction data, dynamically adapt lesson plans based on student feedback via a tag system in simulation, and facilitate context-aware tutoring through the integrated knowledge graph, though systematic evaluation is required. Quantum computing offers a revolutionary paradigm shift, but a significant workforce gap hinders its progress [1]. Teaching quantum computing is uniquely challenging, demanding an interdisciplinary understanding of physics, computer science, and mathematics, compounded by the counterintuitive nature of quantum principles. Traditional teaching methods and tools often fail; one of the many reasons is students' diverse backgrounds [2].
On the other hand, novel methods and tools based on generative artificial intelligence are still unproven in terms of successful teaching practices and quantifiable results.
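To make the idea of a knowledge graph of student actions, learning resources, and relationships more concrete, here is a minimal sketch of such a store. It assumes a simple labeled-edge representation; the class name, node identifiers, and relation names are hypothetical and are not taken from the paper's implementation.

```python
from collections import defaultdict


class InteractionGraph:
    """Tiny labeled-edge graph: source node -> list of (relation, target node)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, source, relation, target):
        """Record one directed, labeled edge."""
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation=None):
        """Return targets reachable from `source`, optionally filtered by relation."""
        return [
            target
            for rel, target in self.edges.get(source, [])
            if relation is None or rel == relation
        ]


# Hypothetical interaction log for one student.
graph = InteractionGraph()
graph.add("student:1", "viewed", "resource:qubit-basics")
graph.add("student:1", "struggled_with", "resource:entanglement")
graph.add("resource:entanglement", "requires", "resource:qubit-basics")

print(graph.neighbors("student:1", "struggled_with"))
```

A lesson-planning component could query such a graph for topics a student struggled with and then follow the `requires` edges to find prerequisites worth revisiting.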
BELL: Benchmarking the Explainability of Large Language Models
Ahmed, Syed Quiser, Ganesh, Bharathi Vokkaliga, P, Jagadish Babu, Selvaraj, Karthick, Devi, ReddySiva Naga Parvathi, Kappala, Sravya
Large language models (LLMs) have revolutionized natural language processing and generative Artificial Intelligence (AI), as shown by numerous foundational studies [1]. These models' exceptional capabilities have attracted significant attention, enabling a wide range of applications. LLMs are utilized for tasks such as translation [2], content generation, content summarization, and article writing [3], as well as enhancing search functions (e.g., Bing Chat [4]). The impact of LLMs extends to fields like software development, with models like Code Llama [5] aiding engineers. Their applications also span the finance sector [6] and scientific research [7] [8], as well as areas such as arts [9], education [10], oceanography [11], law [12], political science [13], and medicine [14] [15], showcasing their broad and diverse influence. However, the exponential rise in the use of LLMs also brings challenges related to their explainability and interpretability.
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models
Liang, Siyuan, Liu, Jiayang, Zhai, Jiecheng, Fang, Tianmeng, Tu, Rongcheng, Liu, Aishan, Cao, Xiaochun, Tao, Dacheng
The rapid development of generative artificial intelligence has made text-to-video models essential for building future multimodal world simulators. However, these models remain vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content. Such vulnerabilities undermine the reliability and security of simulation-based applications. In this paper, we propose T2VShield, a comprehensive and model-agnostic defense framework designed to protect text-to-video models from jailbreak threats. Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses, including semantic ambiguities in prompts, difficulties in detecting malicious content in dynamic video outputs, and inflexible model-centric mitigation strategies. T2VShield introduces a prompt rewriting mechanism based on reasoning and multimodal retrieval to sanitize malicious inputs, along with a multi-scope detection module that captures local and global inconsistencies across time and modalities. The framework does not require access to internal model parameters and works with both open- and closed-source systems. Extensive experiments on five platforms show that T2VShield can reduce jailbreak success rates by up to 35 percent compared to strong baselines. We further develop a human-centered audiovisual evaluation protocol to assess perceptual safety, emphasizing the importance of visual-level defense in enhancing the trustworthiness of next-generation multimodal simulators.
OpenAI Adds Shopping to ChatGPT
OpenAI announced today that users will soon be able to buy products through ChatGPT. The rollout of shopping buttons for AI-powered search queries will come to everyone, whether they are a signed-in user or not. Shoppers will not be able to check out inside of ChatGPT; instead they will be redirected to the merchant's website to finish the transaction. In a prelaunch demo for WIRED, Adam Fry, the ChatGPT search product lead at OpenAI, demonstrated how the updated user experience could be used to help people using the tool for product research decide which espresso machine or office chair to buy. The product recommendations shown to prospective shoppers are based on what ChatGPT remembers about a user's preferences as well as product reviews pulled from across the web.
LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection
Despite the transformative impact of Artificial Intelligence (AI) across various sectors, cyber security continues to rely on traditional static and dynamic analysis tools, hampered by high false positive rates and superficial code comprehension. While generative AI offers promising automation capabilities for software development, leveraging Large Language Models (LLMs) for vulnerability detection presents unique challenges. This paper explores the potential and limitations of LLMs in identifying vulnerabilities, acknowledging inherent weaknesses such as hallucinations, limited context length, and knowledge cut-offs. Previous attempts employing machine learning models for vulnerability detection have proven ineffective due to limited real-world applicability, feature engineering challenges, lack of contextual understanding, and the complexities of training models to keep pace with the evolving threat landscape. Therefore, we propose a robust AI-driven approach focused on mitigating these limitations and ensuring the quality and reliability of LLM-based vulnerability detection. Through innovative methodologies combining Retrieval-Augmented Generation (RAG) and Mixture-of-Agents (MoA), this research seeks to leverage the strengths of LLMs while addressing their weaknesses, ultimately paving the way for dependable and efficient AI-powered solutions in securing the ever-evolving software landscape.
Paradigm shift on Coding Productivity Using GenAI
Generative AI (GenAI) applications are transforming software engineering by enabling automated code co-creation. However, empirical evidence on GenAI's productivity effects in industrial settings remains limited. This paper investigates the adoption of GenAI coding assistants (e.g., Codeium, Amazon Q) within telecommunications and FinTech domains. Through surveys and interviews with industrial domain-experts, we identify primary productivity-influencing factors, including task complexity, coding skills, domain knowledge, and GenAI integration. Our findings indicate that GenAI tools enhance productivity in routine coding tasks (e.g., refactoring and Javadoc generation) but face challenges in complex, domain-specific activities due to limited context-awareness of codebases and insufficient support for customized design rules. We highlight new paradigms for coding transfer, emphasizing iterative prompt refinement, immersive development environment, and automated code evaluation as essential for effective GenAI usage.