Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis (Supplementary Materials)
We now investigate the relation between the attention entropy and the token number. The revised code is shown in Algorithm 1. Both of them are top-ranked parameter files for downloading. Experiments are conducted on a server with Intel(R) Xeon(R) Gold 6226R CPUs @ 2.90GHz. We conduct a text-based pairwise preference test; the screenshot is depicted in Figure 1.
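Algorithm 1 is not reproduced in this excerpt. As a purely illustrative sketch (not the paper's code) of the relation being investigated, the mean entropy of softmax attention rows grows with the token count even for random queries and keys, which is what motivates adapting attention when image size (and hence token number) changes:

```python
import numpy as np

def attention_entropy(num_tokens: int, dim: int = 64, seed: int = 0) -> float:
    """Mean Shannon entropy of scaled-dot-product attention rows
    for random queries and keys (toy stand-in for a trained model)."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((num_tokens, dim))
    k = rng.standard_normal((num_tokens, dim))
    logits = q @ k.T / np.sqrt(dim)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return float(entropy.mean())

for n in (64, 256, 1024):
    print(n, round(attention_entropy(n), 3))
```

Running this shows entropy increasing roughly logarithmically in the token number, consistent with the relation the supplementary material examines.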
LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer
In this work, we propose a novel 3D Transformer framework called LART for 3D motion transfer. With carefully designed architectures, LART implicitly learns the correspondence via flexible geometry perception. Thus, unlike existing methods, LART requires no key-point annotations or pre-defined correspondence between the motion source and target meshes, and it can also handle large, fully detailed, unseen 3D targets. Besides, we introduce a novel latent metric regularization on the Transformer for better motion generation. Our rationale lies in the observation that the decoded motions can be approximately expressed as linear geometric distortions at the frame level. Metric preservation of the motions can therefore be translated into the formation of linear paths in the underlying latent space, serving as a rigorous constraint on the synthetic motions during construction of that space. The proposed LART shows high learning efficiency, requiring only a few samples from the AMASS dataset to generate motions with plausible visual effects. The experimental results verify the potential of our generative model in applications of motion transfer, content generation, temporal interpolation, and motion denoising.
Associating Objects and Their Effects in Video through Coordination Games
We explore a feed-forward approach for decomposing a video into layers, where each layer contains an object of interest along with its associated shadows, reflections, and other visual effects. This problem is challenging since associated effects vary widely with the 3D geometry and lighting conditions in the scene, and ground-truth labels for visual effects are difficult (and in some cases impractical) to collect. We take a self-supervised approach and train a neural network to produce a foreground image and alpha matte from a rough object segmentation mask under a reconstruction and sparsity loss. Under the reconstruction loss alone, the layer decomposition problem is underdetermined: many combinations of layers may reconstruct the input video. Inspired by the game theory concept of focal points---or \emph{Schelling points}---we pose the problem as a coordination game, where each player (network) predicts the effects for a single object without knowledge of the other players' choices. The players learn to converge on the ``natural'' layer decomposition in order to maximize the likelihood of their choices aligning with the other players'. We train the network to play this game with itself, and show how to design the rules of this game so that the focal point lies at the correct layer decomposition. We demonstrate feed-forward results on a challenging synthetic dataset, then show that pretraining on this dataset significantly reduces optimization time for real videos.
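The reconstruction-plus-sparsity objective described above can be sketched in a toy form. The shapes, compositing order, and sparsity weight below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def decomposition_loss(layers_rgb, alphas, frame, sparsity_weight=0.1):
    """Toy layer-decomposition objective for one video frame.
    layers_rgb: (L, H, W, 3) predicted layer colors
    alphas:     (L, H, W, 1) predicted layer mattes
    frame:      (H, W, 3)    the input frame to reconstruct."""
    composite = np.zeros_like(frame)
    for rgb, alpha in zip(layers_rgb, alphas):  # back-to-front "over" compositing
        composite = alpha * rgb + (1.0 - alpha) * composite
    reconstruction = np.mean((composite - frame) ** 2)
    sparsity = np.mean(np.abs(alphas))  # discourage layers from claiming everything
    return reconstruction + sparsity_weight * sparsity
```

The sparsity term is what breaks the tie among the many layer combinations that reconstruct the input equally well; the coordination-game training then selects among the remaining candidates.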
CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement
Boseong Jeon, Junghyuk Lee, Jimin Park, Kwanyoung Kim, Jingi Jung, Sangwon Lee, Hyunbo Shim
Recent works on object removal and insertion have enhanced their performance by handling object effects such as shadows and reflections, using diffusion models trained on counterfactual datasets. However, the performance impact of applying classifier-free guidance to handle object effects across removal and insertion tasks within a unified model remains largely unexplored. To address this gap and improve efficiency in composite editing, we propose CrimEdit, which jointly trains the task embeddings for removal and insertion within a single model and leverages them in a classifier-free guidance scheme---enhancing the removal of both objects and their effects, and enabling controllable synthesis of object effects during insertion. CrimEdit also extends these two task prompts to be applied to spatially distinct regions, enabling object movement (repositioning) within a single denoising step. Extensive experiments show that, by employing both guidance techniques, CrimEdit achieves superior object removal, controllable effect insertion, and efficient object movement---without requiring additional training or separate removal and insertion stages.
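The classifier-free guidance scheme the abstract leans on follows a standard form: the model is queried once without conditioning and once with a task embedding, and the final prediction extrapolates toward the conditioned branch. A minimal sketch, with toy arrays standing in for the model's noise predictions (the names and guidance weight are assumptions, not CrimEdit's actual values):

```python
import numpy as np

def cfg(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: move the unconditional noise prediction
    toward the task-conditioned one by guidance weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.standard_normal(4)       # prediction with conditioning dropped
eps_remove = rng.standard_normal(4)  # prediction under a "removal" task embedding

guided = cfg(eps_u, eps_remove, w=2.5)  # w > 1 amplifies the task's effect
```

With `w = 0` the update ignores the task entirely and with `w = 1` it reproduces the conditioned prediction; values above 1 strengthen the effect, which is the knob that makes effect removal and synthesis controllable.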
Hollywood turns to AI tools to rewire movie magic
Fox News anchor and executive editor Bret Baier has the latest on fears over the 'darker side' of artificial intelligence on 'Special Report.' Generative artificial intelligence can create lifelike imaging and audio, which is likely why an increasing number of film studios are incorporating A.I. into special effects. It comes just two years after Hollywood's largest union went on strike, in part over the impact A.I. would bring. "Popular culture movies like The Terminator have created a very dark dystopian version of what this could look like," White House A.I. and Crypto Czar David Sacks said. "The version of the future of A.I. that I think is probably most accurate, if you want a pop-culture reference, is Star Trek: Enterprise. Think about the ship computer in that. It can perform tasks for you. But it doesn't have a will of its own, it doesn't have a mind of its own. It's there to help the crew, and it needs to be supervised by humans."
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.06)
- North America > United States > District of Columbia > Washington (0.05)
- North America > Mexico (0.05)
- Europe > Ukraine (0.05)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Media > News (0.74)
Netflix uses generative AI in one of its shows for first time
Netflix has used artificial intelligence in one of its TV shows for the first time, in a move the streaming company's boss said would make films and programmes cheaper and of better quality. Ted Sarandos, a co-chief executive of Netflix, said the Argentinian science fiction series El Eternauta (The Eternaut) was the first it had made that involved using generative AI footage. "We remain convinced that AI represents an incredible opportunity to help creators make films and series better, not just cheaper," he told analysts on Thursday after Netflix reported its second-quarter results. He said the series, which follows survivors of a rapid and devastating toxic snowfall, involved Netflix and visual effects (VFX) artists using AI to show a building collapsing in Buenos Aires. "Using AI-powered tools, they were able to achieve an amazing result with remarkable speed and, in fact, that VFX sequence was completed 10 times faster than it could have been completed with traditional VFX tools and workflows," he said.
- Media > Television (1.00)
- Media > Film (1.00)
- Information Technology > Services (1.00)
Exploring In-Image Machine Translation with Real-World Background
Tian, Yanzhi, Liu, Zeming, Liu, Zhengyang, Guo, Yuhang
In-Image Machine Translation (IIMT) aims to translate texts within images from one language to another. Previous research on IIMT was primarily conducted on simplified scenarios such as images of one-line text with black font on white backgrounds, which is far from reality and impractical for applications in the real world. To make IIMT research practically valuable, it is essential to consider a complex scenario where the text backgrounds are derived from real-world images. To facilitate research on complex-scenario IIMT, we design an IIMT dataset that includes subtitle text with real-world backgrounds. However, previous IIMT models perform inadequately in complex scenarios. To address the issue, we propose the DebackX model, which separates the background and text-image from the source image, performs translation on the text-image directly, and fuses the translated text-image with the background to generate the target image. Experimental results show that our model achieves improvements in both translation quality and visual effects.