editing process
Object-AVEdit: An Object-level Audio-Visual Editing Model
Fu, Youquan, Si, Ruiyang, Wang, Hongfa, Zhou, Dongzhan, Sun, Jiacheng, Luo, Ping, Hu, Di, Zhang, Hongyuan, Li, Xuelong
There is high demand for audio-visual editing in video post-production and filmmaking. While numerous models have explored audio and video editing, they struggle with object-level audio-visual operations. Specifically, object-level audio-visual editing requires the ability to add, replace, and remove objects across both audio and visual modalities while preserving the structural information of the source instances during the editing process. In this paper, we present Object-AVEdit, which achieves object-level audio-visual editing based on the inversion-regeneration paradigm. To achieve object-level controllability during editing, we develop an audio generation model with strong word-to-sounding-object alignment, bridging the gap in object controllability between audio and current video generation models. Meanwhile, to better preserve structural information and improve the object-level editing effect, we propose a holistically optimized inversion-regeneration editing algorithm that ensures both information retention during inversion and a better regeneration effect. Extensive experiments demonstrate that our editing model achieves advanced results in both audio and video object-level editing tasks, with fine audio-visual semantic alignment. In addition, our audio generation model also achieves advanced performance. More results are available on our project page: https://gewu-lab.github.io/Object_AVEdit-website/.
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
Iakovleva, Ekaterina, Pizzati, Fabio, Torr, Philip, Lathuilière, Stéphane
Text-based editing diffusion models exhibit limited performance when the user's input instruction is ambiguous. To solve this problem, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, i.e., well-defined interventions to apply to the input image to satisfy the user's request. We benefit from the LLM-derived instructions alongside the original one, thanks to a novel denoising guidance strategy specifically designed for the task. Our experiments with three baselines and on two datasets demonstrate the benefits of SANE in all setups. Moreover, our pipeline improves the interpretability of editing models and boosts the output diversity. We also demonstrate that our approach can be applied to any edit, whether ambiguous or not. Our code is public at https://github.com/fabvio/SANE.
DragText: Rethinking Text Embedding in Point-based Image Editing
Choi, Gayoon, Jeong, Taejin, Hong, Sujung, Joo, Jaehoon, Hwang, Seong Jae
Point-based image editing enables accurate and flexible control through content dragging. However, the role of text embedding in the editing process has not been thoroughly investigated. A significant aspect that remains unexplored is the interaction between text and image embeddings. In this study, we show that during the progressive editing of an input image in a diffusion model, the text embedding remains constant. As the image embedding increasingly diverges from its initial state, the discrepancy between the image and text embeddings presents a significant challenge. Moreover, we found that the text prompt significantly influences the dragging process, particularly in maintaining content integrity and achieving the desired manipulation. To utilize these insights, we propose DragText, which optimizes text embedding in conjunction with the dragging process to pair with the modified image embedding. Simultaneously, we regularize the text optimization process to preserve the integrity of the original text prompt. Our approach can be seamlessly integrated with existing diffusion-based drag methods with only a few lines of code.
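The idea of optimizing a text embedding while regularizing it toward the original prompt can be illustrated with a toy example. This is only a sketch under strong assumptions, not DragText's actual objective: the squared-distance "matching" loss, the names `optimize_text_embedding`, `target`, `lam`, `lr`, and `steps`, and the plain gradient descent loop are all hypothetical stand-ins chosen for illustration.

```python
def optimize_text_embedding(e0, target, lam=0.5, lr=0.1, steps=200):
    """Toy regularized embedding update (hypothetical, not the paper's loss).

    Minimizes ||e - target||^2 + lam * ||e - e0||^2 by gradient descent.
    `target` stands in for whatever the modified image embedding asks for;
    the lam term keeps e close to the original text embedding e0.
    """
    e = list(e0)
    for _ in range(steps):
        # Gradient of the matching term plus the regularization term.
        grad = [2 * (ei - ti) + 2 * lam * (ei - oi)
                for ei, ti, oi in zip(e, target, e0)]
        e = [ei - lr * gi for ei, gi in zip(e, grad)]
    return e


original = [1.0, 0.0]   # assumed original text embedding
drifted = [0.0, 1.0]    # assumed direction implied by the edited image
optimized = optimize_text_embedding(original, drifted, lam=0.5)

# For this quadratic toy loss the minimizer is (target + lam * e0) / (1 + lam).
expected = [(t + 0.5 * o) / 1.5 for t, o in zip(drifted, original)]
```

A larger `lam` keeps the result closer to the original embedding, while a smaller one lets it follow the target more freely; that is the trade-off the regularizer controls in this toy setting.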
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
Gu, Jing, Fang, Yuwei, Skorokhodov, Ivan, Wonka, Peter, Du, Xinya, Tulyakov, Sergey, Wang, Xin Eric
Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemporal VIdeo Adaptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, the foundation of VIA is a novel test-time editing adaptation method, which adapts a pre-trained image editing model to improve consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control. Furthermore, to maintain global consistency over the video sequence, we introduce spatiotemporal adaptation that adapts consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects. Extensive experiments demonstrate that, compared to baseline methods, our VIA approach produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that VIA can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences.
MotifRetro: Exploring the Combinability-Consistency Trade-offs in retrosynthesis via Dynamic Motif Editing
Gao, Zhangyang, Chen, Xingran, Tan, Cheng, Li, Stan Z.
Is there a unified framework for graph-based retrosynthesis prediction? Through analysis of full-, semi-, and non-template retrosynthesis methods, we discovered that they strive to strike an optimal balance between combinability and consistency: Should atoms be combined as motifs to simplify the molecular editing process, or should motifs be broken down into atoms to reduce the vocabulary and improve predictive consistency? Recent works have studied several specific cases, but none of them explores different combinability-consistency trade-offs. Therefore, we propose MotifRetro, a dynamic motif editing framework for retrosynthesis prediction that can explore the entire trade-off space and unify graph-based models. MotifRetro comprises two components: RetroBPE, which controls the combinability-consistency trade-off, and a motif editing model, where we introduce a novel LG-EGAT module to dynamically add motifs to the molecule. We conduct extensive experiments on USPTO-50K to explore how the trade-off affects model performance and finally achieve state-of-the-art performance.
The Future of Writing in the Age of Artificial Intelligence
Artificial Intelligence has long promised to disrupt almost any industry that is knowledge based and relies on data and information. One platform is now starting to deliver. OpenAI was co-founded in 2015 by a group that included Elon Musk and Sam Altman, and is expected to be valued at $29 billion at its next round of funding. OpenAI's latest tool, ChatGPT, was released on November 30, 2022. In just one week, one million users registered with the platform.
Disney creates new AI tool that can turn actors' age up and down
Disney has started playing with the knob of time: the company has developed a new AI tool that is capable of winding back the clock for actors. The new artificial intelligence tool is called the Face Re-aging Network (FRAN), and is capable of automatically changing the age of actors, which will undoubtedly speed up a visual effects editing process that currently takes anywhere from several days to months, depending on the length of the content being altered. Manual de-aging typically involves an individual going through every single frame of the film and painting the appropriate effect onto the actor's skin. Another way to speed up the editing process is to completely replace the actor with a digital puppet. Now, Disney plans on putting the majority of that heavy lifting onto the shoulders of an AI, specifically FRAN, which the company says already complements the traditional re-aging techniques widely used in film production.
DiffusER: Discrete Diffusion via Edit-based Reconstruction
Reid, Machel, Hellendoorn, Vincent J., Neubig, Graham
In text generation, models that generate text from scratch one token at a time are currently the dominant paradigm. Despite being performant, these models lack the ability to revise existing text, which limits their usability in many practical scenarios. We look to address this with DiffusER (Diffusion via Edit-based Reconstruction), a new edit-based generative model for text based on denoising diffusion models -- a class of models that use a Markov chain of denoising steps to incrementally generate data. DiffusER is not only a strong generative model in general, rivalling autoregressive models on several tasks spanning machine translation, summarization, and style transfer; it can also perform other varieties of generation that standard autoregressive models are not well-suited for. For instance, we demonstrate that DiffusER makes it possible for a user to condition generation on a prototype, or an incomplete sequence, and continue revising based on previous edit steps.
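To make the notion of an edit-based representation concrete, the sketch below expresses the difference between a draft and a revised sentence as token-level KEEP, DELETE, and INSERT operations. It is a minimal illustration, not DiffusER's model or training procedure: the operation names and the helpers `edit_script` and `apply_edits` are assumptions introduced here, and the alignment comes from Python's standard `difflib` rather than from a learned model.

```python
from difflib import SequenceMatcher


def edit_script(source, target):
    """Return (operation, token) pairs that turn `source` into `target`."""
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("KEEP", tok) for tok in source[i1:i2])
        else:
            # "replace", "delete", and "insert" all reduce to deletions
            # from the source followed by insertions from the target.
            ops.extend(("DELETE", tok) for tok in source[i1:i2])
            ops.extend(("INSERT", tok) for tok in target[j1:j2])
    return ops


def apply_edits(ops):
    """Rebuild the target sequence from an edit script."""
    return [tok for op, tok in ops if op in ("KEEP", "INSERT")]


draft = "the cat sat on mat".split()
final = "the cat sat on the mat".split()
ops = edit_script(draft, final)
rebuilt = apply_edits(ops)
```

Dropping the INSERT operations recovers the draft, and dropping the DELETE operations recovers the revision, so one edit script describes both directions of a single revision step.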
Top AI-Powered Photo Editing Tools In 2022
There are currently several AI-based image editing apps available that can not only edit images to meet specific requirements, such as removing backgrounds or enhancing colors, but also do so quickly. As a result, post-processing time is reduced to a bare minimum. Such AI-based image editing software uses algorithms based on machine learning and neural networks to completely change the look of images, rather than simply overlaying them as regular filters do. Sketch quickly became the go-to UI design app among professionals worldwide after its release in 2010. Although many competitors have chipped away at its market share since then, its position as the industry standard has remained relatively stable to this day.
Luminar AI Uses Human-Inspired Artificial Intelligence for A Faster Editing Experience
With traditional photo editors, creating the perfect photo is a time-consuming process that involves moving dozens of sliders. Many seek to use presets to speed this up, but there are severe limitations. Presets tend to only work on images that are virtually identical to the original. To change this tedious and frustrating process, innovative companies race to embrace Artificial Intelligence. But some creatives have been skeptical about its effectiveness and limitations.