
Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars

Bühler, Marcel C., Yuan, Ye, Li, Xueting, Huang, Yangyi, Nagano, Koki, Iqbal, Umar

arXiv.org Artificial Intelligence

We introduce Dream, Lift, Animate (DLA), a novel framework that reconstructs animatable 3D human avatars from a single image. This is achieved by leveraging multi-view generation, 3D Gaussian lifting, and pose-aware UV-space mapping of 3D Gaussians. Given an image, we first dream plausible multi-views using a video diffusion model, capturing rich geometric and appearance details. These views are then lifted into unstructured 3D Gaussians. To enable animation, we propose a transformer-based encoder that models global spatial relationships and projects these Gaussians into a structured latent representation aligned with the UV space of a parametric body model. This latent code is decoded into UV-space Gaussians that can be animated via body-driven deformation and rendered conditioned on pose and viewpoint. By anchoring Gaussians to the UV manifold, our method ensures consistency during animation while preserving fine visual details. DLA enables real-time rendering and intuitive editing without requiring post-processing. Our method outperforms state-of-the-art approaches on the ActorsHQ and 4D-Dress datasets in both perceptual quality and photometric accuracy. By combining the generative strengths of video diffusion models with a pose-aware UV-space Gaussian mapping, DLA bridges the gap between unstructured 3D representations and high-fidelity, animation-ready avatars.
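The key idea of anchoring unstructured Gaussians to a body-model UV manifold can be illustrated with a toy sketch. This is purely illustrative and assumes a nearest-vertex anchoring scheme; the paper's actual pipeline uses a video diffusion model, a transformer encoder, and a parametric body model, none of which appear here.

```python
import numpy as np

# Toy sketch of "anchor Gaussians to the body, then animate".
# All shapes and the nearest-vertex rule are illustrative assumptions.
rng = np.random.default_rng(0)

# Unstructured 3D Gaussians lifted from generated views (positions only).
gaussian_xyz = rng.normal(size=(500, 3))

# Hypothetical body template: vertices with known UV coordinates.
body_verts = rng.normal(size=(100, 3))
body_uv = rng.random(size=(100, 2))

# Anchor each Gaussian to its nearest body vertex, giving it a UV coordinate.
dists = np.linalg.norm(gaussian_xyz[:, None, :] - body_verts[None, :, :], axis=-1)
nearest = dists.argmin(axis=1)          # (500,) vertex index per Gaussian
gaussian_uv = body_uv[nearest]          # Gaussians now live on the UV manifold

# Body-driven deformation: when a new pose moves the vertices, each anchored
# Gaussian follows its vertex while keeping its local offset fixed.
offsets = gaussian_xyz - body_verts[nearest]
posed_verts = body_verts + np.array([0.0, 0.1, 0.0])   # toy rigid pose change
posed_gaussians = posed_verts[nearest] + offsets
```

Because every Gaussian rides on a fixed UV anchor, the deformation is consistent across poses, which is the property the abstract attributes to the UV-space mapping.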


A Appendix A.1 Dataset: An example of a single state from a (S_t, A, S) tuple

Neural Information Processing Systems

Game: ztuu Location: Cultural Complex This imposing ante-room, the center of what was apparently the cultural center of the GUE, is adorned in the ghastly style of the GUE's "Grotesque Period." With leering gargoyles, cartoonish friezes depicting long-forgotten scenes of GUE history, and primitive statuary of pointy-headed personages unknown (perhaps very, very distant progenitors of the Flatheads), the place would have been best left undiscovered. North of here, a large hallway passes under the roughly hewn inscription "Convention Center." To the east, under a fifty-story triumphal arch, a passageway the size of a large city boulevard opens into the Royal Theater. A relatively small and unobtrusive sign (perhaps ten feet high) stands nearby.


A First Context-Free Grammar Applied to Nawatl Corpora Augmentation

Guzmán-Landa, Juan-José, Torres-Moreno, Juan-Manuel, Figueroa-Saavedra, Miguel, Quintana-Torres, Ligia, Avendaño-Garrido, Martha-Lorena, Ranger, Graham

arXiv.org Artificial Intelligence

In this article we introduce a context-free grammar (CFG) for the Nawatl language. Nawatl (or Nahuatl) is an Amerindian language of the π-language type, i.e. a language with few digital resources, for which the corpora available for machine learning are virtually non-existent. The objective here is to generate a significant number of grammatically correct artificial sentences in order to enlarge the corpora available for language-model training. We aim to show that a grammar enables us to significantly expand a corpus in Nawatl which we call π-YALLI. The corpus thus enriched enables us to train algorithms such as FastText and to evaluate them on sentence-level semantic tasks. Preliminary results show that by using the grammar, comparative improvements are achieved over some LLMs. However, to achieve more significant improvements, grammars that model the Nawatl language even more effectively are required.
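The corpus-augmentation step can be sketched as random derivation from a CFG. The grammar fragment below is a tiny invented stand-in, not the paper's actual Nawatl grammar, and the lexical items are only illustrative; it shows the mechanism of expanding nonterminals until only terminals remain.

```python
import random

# Illustrative toy fragment (NOT the paper's Nawatl CFG): each
# nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["N"], ["DET", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "DET": [["in"]],                     # illustrative article
    "N":   [["kalli"], ["siwatl"]],      # illustrative nouns
    "V":   [["choka"], ["kochi"]],       # illustrative verbs
}

def generate(symbol, rng):
    """Expand a symbol: terminals (not in GRAMMAR) are returned as-is,
    nonterminals expand via a randomly chosen production."""
    if symbol not in GRAMMAR:
        return [symbol]
    out = []
    for sym in rng.choice(GRAMMAR[symbol]):
        out.extend(generate(sym, rng))
    return out

def augment_corpus(n, seed=0):
    """Generate n artificial sentences, all grammatical by construction."""
    rng = random.Random(seed)
    return [" ".join(generate("S", rng)) for _ in range(n)]

sentences = augment_corpus(5)
```

Every generated sentence is grammatical with respect to the grammar by construction, which is what makes CFG sampling attractive for augmenting a low-resource training corpus.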


The new Meta.ai website can draw amazing AI art instantly

PCWorld

Meta has launched Meta.ai, an AI-specific site that has a cool hook that its competitors don't offer: It can generate images in real time, and even animate them on demand. There is a catch, however: Meta would really like to continue improving Meta.ai, and to do so it's only offering image generation if you sign into your Facebook account. Meta joins other LLMs and AI chatbots like Google Gemini, Microsoft's various flavors of Copilot, Anthropic's Claude AI (used within Discord), and other sites offering AI solutions. Meta.ai feels like more of the same, though with some limitations: It can't accept uploaded documents, but it can summarize websites or web pages. Of course, it has creative purposes, too: It can also be used to write or rewrite text, as many other services can as well.


When Language Models Fall in Love: Animacy Processing in Transformer Language Models

Hanna, Michael, Belinkov, Yonatan, Pezzelle, Sandro

arXiv.org Artificial Intelligence

Animacy - whether an entity is alive and sentient - is fundamental to cognitive processing, impacting areas such as memory, vision, and language. However, animacy is not always expressed directly in language: in English it often manifests indirectly, in the form of selectional constraints on verbs and adjectives. This poses a potential issue for transformer language models (LMs): they often train only on text, and thus lack access to extralinguistic information from which humans learn about animacy. We ask: how does this impact LMs' animacy processing - do they still behave as humans do? We answer this question using open-source LMs. Like previous studies, we find that LMs behave much like humans when presented with entities whose animacy is typical. However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans. Even when the context indicating atypical animacy is very short, LMs pick up on subtle clues and change their behavior. We conclude that despite the limited signal through which LMs can learn about animacy, they are indeed sensitive to the relevant lexical semantic nuances available in English.


StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

Zhang, Chi, Chen, Yiwen, Fu, Yijun, Zhou, Zhenglin, Yu, Gang, Wang, Billzb, Fu, Bin, Chen, Tao, Lin, Guosheng, Shen, Chunhua

arXiv.org Artificial Intelligence

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models. Nevertheless, the limited availability of diverse 3D resources presents significant challenges to learning. In this paper, we present a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training. Our method leverages the comprehensive priors of appearance and geometry offered by image-text diffusion models to generate multi-view images of avatars in various styles. During data generation, we employ poses extracted from existing 3D models to guide the generation of multi-view images. To address the misalignment between poses and images in data, we investigate view-specific prompts and develop a coarse-to-fine discriminator for GAN training. We also delve into attribute-related prompts to increase the diversity of the generated avatars. Additionally, we develop a latent diffusion model within the style space of StyleGAN to enable the generation of avatars based on image inputs. Our approach demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars.


Startup Farmers Learn the Art of Animal Agriculture in "Chicken Stories"

The New Yorker

At a startup farm outside of Oakland, a young man reads on his phone in a dim bedroom. He's not scrolling through social-media feeds or playing games; he's trying to learn about caring for his livestock. A Siri-like voice-over says, "I found one article on how to take care of baby chicks." On the floor, a large blue bucket sits under the warm glow of a heat lamp, with about a dozen fluffy chicks inside. "Failure to maintain a warm environment will quickly prove to be fatal," the digital voice explains.


Creating an animated 3D character in Gradient with AvatarCLIP (Part 1)

#artificialintelligence

One of the greatest promises of deep learning has been the advent of generated media. This is largely because generated media is currently one of the most easily monetized solutions offered by these frameworks. Generated media, regardless of format, be it video, audio, text, or others, has the potential to be translated into content for a plethora of different purposes. By harnessing this creative power, we can automate a huge portion of the creative process on associated tasks, and the technology has now reached the point that this content can sometimes be indistinguishable from content made by real human actors. This is particularly true for NLP and computer vision related tasks.


MyHeritage and D-ID partner to bring photos to life with both animations and voice – TechCrunch

#artificialintelligence

Last year, genealogy service MyHeritage went viral after introducing a new "deepfake" feature that allowed users to animate the faces of loved ones in still photos. TikTok users posted videos reacting to the technology, called "Deep Nostalgia," as they brought back relatives they never got to meet or those whose loss they still grieved. To date, more than 100 million photos have been animated with the feature. Now comes the next iteration. Today, MyHeritage, along with technology partner D-ID, is expanding upon "Deep Nostalgia" with the launch of "LiveStory," a feature that doesn't just bring the people in photos to life with movement, but actually has them speak.