Animation


6 science milestones turning 40 this year

Popular Science

In 1986, we had huge leaps forward, tragic steps back, and life-changing innovations. NASA's STS-51L crew members pose for photographs during a break in countdown training at the White Room, Launch Complex 39, Pad B. Left to right are Teacher-in-Space payload specialist Sharon Christa McAuliffe; payload specialist Gregory Jarvis; and astronauts Judith A. Resnik, mission specialist; Francis R. (Dick) Scobee, mission commander; Ronald E. McNair, mission specialist; Michael J. Smith, pilot; and Ellison S. Onizuka, mission specialist. It was a year that saw roughly six million Americans hold hands in a more-or-less continuous line across the country to raise money to fight hunger and homelessness. A news anchor named Oprah Winfrey debuted her new talk show.


CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Neural Information Processing Systems

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic Tom and Jerry cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand that models solve more challenging, yet well-defined, causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced and explicit causal relationship modeling, and joint modeling of vision and language, as the immediate areas for future efforts. Along with other complementary datasets, our new challenging dataset will pave the way for these developments in the field.
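
For readers unfamiliar with multi-level causal QA datasets, the sketch below shows what one CausalChaos!-style entry might look like: a short answer, a detailed causal-chain explanation, and mined hard negatives. The field names and sample content are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for one CausalChaos!-style Why-QA entry.
# All field names and the sample content are illustrative only.
@dataclass
class CausalWhyQA:
    clip_id: str                 # source cartoon clip
    question: str                # causal "Why" question
    answer: str                  # short answer
    explanation: str             # detailed causal-chain explanation
    distractors: List[str] = field(default_factory=list)  # hard incorrect answers

example = CausalWhyQA(
    clip_id="tnj_ep012_shot045",
    question="Why did Tom fall off the ladder?",
    answer="Jerry pulled the ladder away.",
    explanation=(
        "Jerry saw Tom climbing toward his mouse hole, ran to the base of "
        "the ladder, and yanked it sideways, so Tom lost his support."
    ),
    distractors=[
        "Tom slipped on a banana peel.",        # plausible but causally wrong
        "The ladder broke under Tom's weight.",
    ],
)
```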


The Truth About the Avatar Movies That No One Wants to Accept

Slate

James Cameron is desperate to convince the world that these movies aren't "cartoons."


PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles

Han, Tianshun, Zhou, Benjia, Liu, Ajian, Liang, Yanyan, Zhang, Du, Lei, Zhen, Wan, Jun

arXiv.org Artificial Intelligence

PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Dual-Stream Emotion Extractor (DSEE) that captures both time-domain and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM) that models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed 3D-EmoStyle dataset. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.
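
A minimal sketch of the dual-stream idea behind DSEE: one branch reads the raw waveform (time domain) while another reads an STFT magnitude spectrogram (frequency domain), and the two embeddings are fused. Layer sizes, the fusion rule, and all hyperparameters below are assumptions; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

# Sketch of a dual-stream emotion extractor in the spirit of DSEE:
# a time-domain branch on the raw waveform and a frequency-domain
# branch on a spectrogram, fused into one emotion embedding.
class DualStreamEmotionExtractor(nn.Module):
    def __init__(self, emb_dim: int = 128, n_fft: int = 512):
        super().__init__()
        self.n_fft = n_fft
        self.time_branch = nn.Sequential(      # raw-waveform features
            nn.Conv1d(1, 64, kernel_size=10, stride=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, emb_dim),
        )
        self.freq_branch = nn.Sequential(      # spectrogram features
            nn.Conv1d(n_fft // 2 + 1, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, emb_dim),
        )
        self.fuse = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples)
        t = self.time_branch(wav.unsqueeze(1))
        spec = torch.stft(wav, self.n_fft,
                          window=torch.hann_window(self.n_fft, device=wav.device),
                          return_complex=True).abs()   # (batch, bins, frames)
        f = self.freq_branch(spec)
        return self.fuse(torch.cat([t, f], dim=-1))

emotion = DualStreamEmotionExtractor()(torch.randn(2, 16000))  # (2, 128)
```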


AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration

Wang, Wen-Fan, Lu, Chien-Ting, Ng, Jin Ping, Chiu, Yi-Ting, Lee, Ting-Ying, Wang, Miaosen, Chen, Bing-Yu, Chen, Xiang 'Anthony'

arXiv.org Artificial Intelligence

Animation pre-production lays the foundation of an animated film by transforming initial concepts into a coherent blueprint across interdependent stages such as ideation, scripting, design, and storyboarding. While generative AI tools are increasingly adopted in this process, they remain isolated, requiring creators to juggle multiple systems without integrated workflow support. Our formative study with 12 professional creative directors and independent animators revealed key challenges in their current practice: creators must manually coordinate fragmented outputs, manage large volumes of information, and struggle to maintain continuity and creative control between stages. Based on these insights, we present AnimAgents, a human-multi-agent collaborative system that coordinates complex, multi-stage workflows through a core agent and specialized agents, supported by dedicated boards for the four major stages of pre-production. AnimAgents enables stage-aware orchestration, stage-specific output management, and element-level refinement, providing an end-to-end workflow tailored to professional practice. In a within-subjects summative study with 16 professional creators, AnimAgents significantly outperformed a strong single-agent baseline equipped with advanced parallel image generation in coordination, consistency, information management, and overall satisfaction (p < .01). A field deployment with 4 creators further demonstrated AnimAgents' effectiveness in real-world projects.
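
The core-agent / specialized-agent split can be pictured as a small router that dispatches a creator's request to the right stage agent and records the output on that stage's board. Everything below is an illustrative assumption, not the paper's implementation; only the four stage names follow the text above.

```python
from typing import Callable, Dict, List

# Stand-in stage agents; real ones would wrap LLM / image-generation calls.
def ideation_agent(req: str) -> str:     return f"[ideation] concepts for: {req}"
def scripting_agent(req: str) -> str:    return f"[scripting] draft for: {req}"
def design_agent(req: str) -> str:       return f"[design] designs for: {req}"
def storyboard_agent(req: str) -> str:   return f"[storyboard] panels for: {req}"

STAGE_AGENTS: Dict[str, Callable[[str], str]] = {
    "ideation": ideation_agent,
    "scripting": scripting_agent,
    "design": design_agent,
    "storyboarding": storyboard_agent,
}

def core_agent(stage: str, request: str, boards: Dict[str, List[str]]) -> str:
    """Route a creator request to the matching stage agent and log the
    output on that stage's board so later stages can reference it."""
    result = STAGE_AGENTS[stage](request)
    boards.setdefault(stage, []).append(result)
    return result

boards: Dict[str, List[str]] = {}
core_agent("ideation", "a short film about a lighthouse keeper", boards)
core_agent("scripting", "expand the lighthouse concept into three scenes", boards)
```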


Once Upon an AI: Six Scaffolds for Child-AI Interaction Design, Inspired by Disney

Kurian, Nomisha

arXiv.org Artificial Intelligence

To build AI that children can intuitively understand and benefit from, designers need a design grammar that serves their developmental needs. This paper bridges artificial intelligence design for children - an emerging field still defining its best practices - and animation, a well-established field with decades of experience in engaging children through accessible storytelling. Pairing Piagetian developmental theory with design-pattern extraction from 52 works of animation, the paper presents a six-scaffold framework that integrates design insights transferable to child-centred AI design: (1) signals for visual animacy and clarity, (2) sound for musical and auditory scaffolding, (3) synchrony in audiovisual cues, (4) sidekick-style personas, (5) storyplay that supports symbolic play and imaginative exploration, and (6) structure in the form of predictable narratives. These strategies, long refined in animation, function as multimodal scaffolds for attention, understanding, and attunement, supporting learning and comfort. This structured design grammar is transferable to AI design. By reframing cinematic storytelling and child development theory as design logic for AI, the paper offers heuristics for AI that aligns with the cognitive stages and emotional needs of young users. The work contributes to design theory by showing how sensory, affective, and narrative techniques can inform developmentally attuned AI design. Future directions include empirical testing, cultural adaptation, and participatory co-design.


Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation

Mao, Yuxiang, Zhang, Zhijie, Zhang, Zhiheng, Liu, Jiawei, Zeng, Chen, Xia, Shihong

arXiv.org Artificial Intelligence

Expressions are fundamental to conveying human emotions. With the rapid advancement of AI-generated content (AIGC), realistic and expressive 3D facial animation has become increasingly crucial. Despite recent progress in speech-driven lip-sync for talking-face animation, generating emotionally expressive talking faces remains underexplored. A major obstacle is the scarcity of real emotional 3D talking-face datasets due to the high cost of data capture. To address this, we model facial animation driven by both speech and emotion as a linear additive problem. Leveraging a 3D talking-face dataset with neutral expressions (VOCAset) and a dataset of 3D expression sequences (Florence4D), we jointly learn a set of blendshapes driven by speech and emotion. We introduce a sparsity constraint loss to encourage disentanglement between the two types of blendshapes while allowing the model to capture inherent secondary cross-domain deformations present in the training data. The learned blendshapes can be further mapped to the expression and jaw pose parameters of the FLAME model, enabling the animation of 3D Gaussian avatars. Qualitative and quantitative experiments demonstrate that our method naturally generates talking faces with specified expressions while maintaining accurate lip synchronization. Perceptual studies further show that our approach achieves superior emotional expressivity compared to existing methods, without compromising lip-sync quality.
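
The linear additive formulation reduces to a simple sum: a neutral template, plus speech-driven blendshapes weighted by speech coefficients, plus expression-driven blendshapes weighted by emotion coefficients. The sketch below illustrates this, with an L1-style penalty standing in for the paper's sparsity constraint; the vertex count, basis sizes, and exact loss form are assumptions.

```python
import numpy as np

# Linear additive blendshape model: frame = template
#   + speech basis weighted by speech coefficients
#   + expression basis weighted by emotion coefficients.
V, S, E = 5023, 32, 16                        # vertex count (FLAME-like), basis sizes
template  = np.zeros((V, 3))                  # neutral face
B_speech  = np.random.randn(S, V, 3) * 0.01   # speech-driven blendshapes
B_emotion = np.random.randn(E, V, 3) * 0.01   # expression-driven blendshapes

def animate(w_speech: np.ndarray, w_emotion: np.ndarray) -> np.ndarray:
    """Compose one frame as a linear sum of the two blendshape sets."""
    return (template
            + np.tensordot(w_speech, B_speech, axes=1)
            + np.tensordot(w_emotion, B_emotion, axes=1))

def sparsity_loss(w_speech, w_emotion, lam: float = 1e-3) -> float:
    """Illustrative L1 penalty: encourage each frame to activate few
    blendshapes, nudging the two bases toward disentanglement."""
    return lam * (np.abs(w_speech).sum() + np.abs(w_emotion).sum())

frame = animate(np.random.rand(S), np.random.rand(E))   # (V, 3) vertices
```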


Environment-aware Motion Matching

Ponton, Jose Luis, Andrews, Sheldon, Andujar, Carlos, Pelechano, Nuria

arXiv.org Artificial Intelligence

Interactive applications demand believable characters that respond naturally to dynamic environments. Traditional character animation techniques often struggle to handle arbitrary situations, leading to a growing trend of dynamically selecting motion-captured animations based on predefined features. While Motion Matching has proven effective for locomotion by aligning to target trajectories, animating environment interactions and crowd behaviors remains challenging due to the need to consider surrounding elements. Existing approaches often involve manual setup or lack the naturalism of motion capture. Furthermore, in crowd animation, body animation is frequently treated as a separate process from trajectory planning, leading to inconsistencies between body pose and root motion. To address these limitations, we present Environment-aware Motion Matching, a novel real-time system for full-body character animation that dynamically adapts to obstacles and other agents, emphasizing the bidirectional relationship between pose and trajectory. In a preprocessing step, we extract shape, pose, and trajectory features from a motion capture database. At runtime, we perform an efficient search that matches user input and current pose while penalizing collisions with a dynamic environment. Our method allows characters to naturally adjust their pose and trajectory to navigate crowded scenes.
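
Motion matching boils down to a nearest-neighbor search over precomputed features; the environment-aware variant described above additionally penalizes candidates whose trajectories collide with surrounding geometry. A minimal sketch, assuming a flat feature vector and circular 2D obstacles (both illustrative choices, not the paper's feature set):

```python
import numpy as np

def match(query_feat: np.ndarray,        # (D,) features from user input + pose
          db_feats: np.ndarray,          # (N, D) precomputed database features
          db_traj: np.ndarray,           # (N, T, 2) candidate root trajectories
          obstacles: np.ndarray,         # (M, 3) circles as (x, y, radius)
          w_collision: float = 10.0) -> int:
    """Return the index of the best-matching database clip, penalizing
    candidates whose trajectory samples enter any obstacle circle."""
    cost = ((db_feats - query_feat) ** 2).sum(axis=1)          # feature distance
    d = np.linalg.norm(db_traj[:, :, None, :]                  # (N, T, M)
                       - obstacles[None, None, :, :2], axis=-1)
    hits = (d < obstacles[None, None, :, 2]).sum(axis=(1, 2))  # collision count
    return int(np.argmin(cost + w_collision * hits))

best = match(np.zeros(24), np.random.randn(100, 24),
             np.random.randn(100, 10, 2) * 3.0,
             np.array([[1.0, 1.0, 0.5]]))
```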


Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation

Seo, Junyoung, Mira, Rodrigo, Haliassos, Alexandros, Bounareli, Stella, Chen, Honglie, Tran, Linh, Kim, Seungryong, Landgraf, Zoe, Shen, Jie

arXiv.org Artificial Intelligence

Audio-driven human animation models often suffer from identity drift during temporal autoregressive generation, where characters gradually lose their identity over time. One solution is to generate keyframes as intermediate temporal anchors that prevent degradation, but this requires an additional keyframe generation stage and can restrict natural motion dynamics. To address this, we propose Lookahead Anchoring, which leverages keyframes from future timesteps ahead of the current generation window, rather than within it. This transforms keyframes from fixed boundaries into directional beacons: the model continuously pursues these future anchors while responding to immediate audio cues, maintaining consistent identity through persistent guidance. This also enables self-keyframing, where the reference image serves as the lookahead target, eliminating the need for keyframe generation entirely. We find that the temporal lookahead distance naturally controls the balance between expressivity and consistency: larger distances allow for greater motion freedom, while smaller ones strengthen identity adherence. When applied to three recent human animation models, Lookahead Anchoring achieves superior lip synchronization, identity preservation, and visual quality, demonstrating improved temporal conditioning across several different architectures.

Audio-driven human animation aims to generate realistic human videos synchronized with input audio, with widespread applications in film production, virtual assistants, and digital content creation. The advent of Diffusion Transformers (DiTs) (Peebles & Xie, 2022) has significantly advanced this field, enabling natural human video generation not only for portrait videos but also in diverse environments with complex backgrounds (Xu et al., 2024; Chen et al., 2025a). However, current DiT-based models can only handle short clips at a time, typically around 5 seconds, due to the quadratic complexity of diffusion transformer architectures.
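
The lookahead mechanism is easy to picture as a windowed generation loop in which the anchor keyframe is sampled from beyond the current window's end. In the sketch below, `generate_window` is a hypothetical stand-in for one step of any autoregressive animation model, and its interface is an assumption; passing `keyframes=None` mimics the self-keyframing variant, where the reference image is the lookahead target.

```python
# `generate_window(prev_frames, anchor)` is a hypothetical stand-in for one
# autoregressive generation step of an audio-driven animation model.
def generate(frames_total: int, window: int, lookahead: int,
             reference_image, generate_window, keyframes=None):
    """Windowed autoregressive generation with future anchors.
    keyframes=None mimics self-keyframing: the reference image itself
    is the lookahead target for every window."""
    video, prev = [], None
    for start in range(0, frames_total, window):
        anchor_t = start + window + lookahead      # a timestep *beyond* the window
        anchor = (keyframes[min(anchor_t, frames_total - 1)]
                  if keyframes is not None else reference_image)
        prev = generate_window(prev, anchor)       # pursue the future anchor
        video.extend(prev)
    return video

# Toy usage: each "frame" is just the anchor label, repeated per window.
dummy_step = lambda prev, anchor: [anchor] * 8
clip = generate(frames_total=32, window=8, lookahead=4,
                reference_image="ref.png", generate_window=dummy_step)
```

A larger `lookahead` keeps the anchor farther from the frames being generated, loosening its pull (more motion freedom); a smaller one tightens identity adherence, matching the trade-off described above.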


Multi-identity Human Image Animation with Structural Video Diffusion

Wang, Zhenzhi, Li, Yixuan, Zeng, Yanhong, Guo, Yuwei, Lin, Dahua, Xue, Tianfan, Dai, Bo

arXiv.org Artificial Intelligence

Generating human videos from a single image while ensuring high visual quality and precise control is a challenging task, especially in complex scenarios involving multiple individuals and interactions with objects. Existing methods, while effective for single-human cases, often fail to handle the intricacies of multi-identity interactions because they struggle to associate the correct pairs of human appearance and pose condition and to model the distribution of 3D-aware dynamics. To address these limitations, we present Structural Video Diffusion, a novel framework designed for generating realistic multi-human videos. Our approach introduces two core innovations: identity-specific embeddings to maintain consistent appearances across individuals, and a structural learning mechanism that incorporates depth and surface-normal cues to model human-object interactions. Additionally, we expand an existing human video dataset with 25K new videos featuring diverse multi-human and object interaction scenarios, providing a robust foundation for training. Experimental results demonstrate that Structural Video Diffusion achieves superior performance in generating lifelike, coherent videos for multiple subjects with dynamic and rich interactions, advancing the state of human-centric video generation. Code is available here.
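
The two innovations can be pictured as a conditioning builder: each person's pose map is tied to an identity-specific embedding so appearance and pose stay correctly paired, while depth and surface normals are stacked as shared structural channels. Tensor shapes and the concatenation-based fusion below are illustrative assumptions, not the paper's actual design.

```python
import torch

def build_condition(pose_maps: torch.Tensor,   # (P, 1, H, W), one map per person
                    id_embeds: torch.Tensor,   # (P, C) identity-specific vectors
                    depth: torch.Tensor,       # (1, H, W) depth cue
                    normals: torch.Tensor):    # (3, H, W) surface-normal cue
    """Pair each person's pose map with their identity embedding, and stack
    depth + normals as shared structural channels (illustrative fusion)."""
    P, _, H, W = pose_maps.shape
    # Broadcast each identity embedding over its own pose map, keeping
    # appearance associated with the correct person.
    id_planes = id_embeds[:, :, None, None].expand(-1, -1, H, W)
    per_person = torch.cat([pose_maps, id_planes], dim=1)   # (P, 1 + C, H, W)
    structure = torch.cat([depth, normals], dim=0)          # (4, H, W)
    return per_person, structure

cond, struct = build_condition(torch.rand(2, 1, 64, 64), torch.rand(2, 8),
                               torch.rand(1, 64, 64), torch.rand(3, 64, 64))
```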