Motion-Conditioned Image Animation for Video Editing
Wilson Yan, Andrew Brown, Pieter Abbeel, Rohit Girdhar, Samaneh Azadi
arXiv.org Artificial Intelligence
Image and video generation models have recently made tremendous progress: existing models can synthesize highly complex images [26, 27, 28, 30, 6] or videos [37, 31, 2, 15, 12] from textual descriptions. Beyond generating purely novel content, these models have been shown to be powerful tools for advanced image and video editing in downstream content creation. Given a source video, a caption of that video, and an editing text prompt, a video editing method should produce a new video that aligns with the editing prompt while remaining faithful to all non-edited characteristics of the source video. Video edits can be broadly split into two categories: spatial and temporal. Spatial edits are generally image-based edits extended to video, such as restyling a video in the style of Van Gogh, inserting an object into the scene, or changing the background. Because video adds a temporal dimension, temporal edits can also change the underlying motion of an object, such as making a panda play in a pile of ribbons, or replacing apricots in a video with apples and making them fall off a tree (see Figure 1).
Nov-30-2023