AITopics

2510.0005

Country: Asia > China (0.46)

Genre: Research Report (0.51)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Iakovleva, Ekaterina, Pizzati, Fabio, Torr, Philip, Lathuilière, Stéphane

Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing

arXiv.org Artificial Intelligence, Jul-29-2024

Text-based editing diffusion models exhibit limited performance when the user's input instruction is ambiguous. To solve this problem, we propose $\textit{Specify ANd Edit}$ (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, i.e., well-defined interventions to apply to the input image to satisfy the user's request. We benefit from the LLM-derived instructions alongside the original one, thanks to a novel denoising guidance strategy specifically designed for the task. Our experiments with three baselines and on two datasets demonstrate the benefits of SANE in all setups. Moreover, our pipeline improves the interpretability of editing models and boosts the output diversity. We also demonstrate that our approach can be applied to any edit, whether ambiguous or not. Our code is public at https://github.com/fabvio/SANE.
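The abstract does not spell out SANE's guidance strategy, so the sketch below is not the authors' method. It only illustrates the generic idea of combining several instruction-conditioned noise predictions in one denoising step, using the well-known compositional form of classifier-free guidance. The function name `compose_guidance` and all values are hypothetical.

```python
def compose_guidance(eps_uncond, eps_conds, weights):
    """Combine noise predictions from several instructions (generic sketch).

    eps_uncond : unconditional noise prediction (list of floats)
    eps_conds  : one noise prediction per instruction (list of lists)
    weights    : guidance scale per instruction (hypothetical values)

    Returns eps_uncond + sum_i weights[i] * (eps_conds[i] - eps_uncond).
    This is plain compositional classifier-free guidance, NOT SANE itself.
    """
    if len(eps_conds) != len(weights):
        raise ValueError("need one weight per conditional prediction")
    out = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        for i, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            out[i] += w * (c - u)
    return out


# Toy usage: the original instruction plus one LLM-derived specific instruction.
combined = compose_guidance([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [2.0, 0.5])
```

In a real pipeline each prediction would come from the diffusion model evaluated under a different instruction; here they are hand-written lists.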

ambiguous instruction, instruction, specific instruction, (15 more...)

2407.20232

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.46)
Media > Photography (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Jul-25-2024

DragText: Rethinking Text Embedding in Point-based Image Editing

Choi, Gayoon, Jeong, Taejin, Hong, Sujung, Joo, Jaehoon, Hwang, Seong Jae

Point-based image editing enables accurate and flexible control through content dragging. However, the role of text embedding in the editing process has not been thoroughly investigated. A significant aspect that remains unexplored is the interaction between text and image embeddings. In this study, we show that during the progressive editing of an input image in a diffusion model, the text embedding remains constant. As the image embedding increasingly diverges from its initial state, the discrepancy between the image and text embeddings presents a significant challenge. Moreover, we found that the text prompt significantly influences the dragging process, particularly in maintaining content integrity and achieving the desired manipulation. To utilize these insights, we propose DragText, which optimizes text embedding in conjunction with the dragging process to pair with the modified image embedding. Simultaneously, we regularize the text optimization process to preserve the integrity of the original text prompt. Our approach can be seamlessly integrated with existing diffusion-based drag methods with only a few lines of code.
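The idea of optimizing a text embedding while regularizing it toward the original prompt can be sketched with a toy objective. This is an illustration under stated assumptions, not the DragText implementation: the quadratic pull toward `target` stands in for the drag objective, and the names `optimize_embedding`, `target`, `lam`, and `lr` are hypothetical.

```python
def optimize_embedding(e0, target, lam=0.5, lr=0.1, steps=200):
    """Gradient descent on ||e - target||^2 + lam * ||e - e0||^2.

    e0     : original text embedding (list of floats)
    target : stand-in for what the edited image would prefer (assumption)
    lam    : weight of the regularizer that keeps e close to e0
    lr     : step size; for this toy objective it must stay below 1 / (1 + lam)
    """
    e = list(e0)
    for _ in range(steps):
        # Gradient of the drag stand-in plus gradient of the regularizer.
        grad = [2 * (ei - ti) + 2 * lam * (ei - oi)
                for ei, ti, oi in zip(e, target, e0)]
        e = [ei - lr * gi for ei, gi in zip(e, grad)]
    return e


# Toy usage: the result settles between the original embedding and the target.
e_opt = optimize_embedding([0.0, 0.0], [3.0, -3.0], lam=0.5)
```

For this objective the minimizer is `(target + lam * e0) / (1 + lam)`, so a larger `lam` keeps the optimized embedding closer to the original prompt.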

artificial intelligence, machine learning, natural language, (17 more...)

2407.17843

Genre: Research Report > New Finding (1.00)

Industry: Media > Photography (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence, Jun-18-2024

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Gu, Jing, Fang, Yuwei, Skorokhodov, Ivan, Wonka, Peter, Du, Xinya, Tulyakov, Sergey, Wang, Xin Eric

Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemporal VIdeo Adaptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, the foundation of VIA is a novel test-time editing adaptation method, which adapts a pre-trained image editing model to improve consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control. Furthermore, to maintain global consistency over the video sequence, we introduce spatiotemporal adaptation that adapts consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects. Extensive experiments demonstrate that, compared to baseline methods, our VIA approach produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that VIA can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences.

consistency, editing, video editing, (11 more...)

2406.12831

Country:

North America > United States > Texas (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry: Media (0.52)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, May-20-2023

MotifRetro: Exploring the Combinability-Consistency Trade-offs in retrosynthesis via Dynamic Motif Editing

Gao, Zhangyang, Chen, Xingran, Tan, Cheng, Li, Stan Z.

Is there a unified framework for graph-based retrosynthesis prediction? Through analysis of full-, semi-, and non-template retrosynthesis methods, we discovered that they strive to strike an optimal balance between combinability and consistency: \textit{Should atoms be combined as motifs to simplify the molecular editing process, or should motifs be broken down into atoms to reduce the vocabulary and improve predictive consistency?} Recent works have studied several specific cases, but none of them explores different combinability-consistency trade-offs. Therefore, we propose MotifRetro, a dynamic motif editing framework for retrosynthesis prediction that can explore the entire trade-off space and unify graph-based models. MotifRetro comprises two components: RetroBPE, which controls the combinability-consistency trade-off, and a motif editing model, where we introduce a novel LG-EGAT module to dynamically add motifs to the molecule. We conduct extensive experiments on USPTO-50K to explore how the trade-off affects the model performance and finally achieve state-of-the-art performance.

artificial intelligence, data mining, machine learning, (17 more...)

2305.15153

Country: Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.64)

Industry: Materials (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Data Science > Data Mining (0.68)

#artificialintelligence, Jan-13-2023, 19:05:09 GMT

The Future of Writing in the Age of Artificial Intelligence

Artificial Intelligence has long promised to disrupt almost any industry that is knowledge-based and relies on data and information. One platform is now starting to deliver. OpenAI was founded in 2015 by a group that included Elon Musk and Sam Altman and is expected to be valued at $29 billion at its next round of funding. OpenAI's latest tool, ChatGPT, was released on November 30, 2022. In just one week, one million users registered with the platform.

artificial intelligence, machine learning, natural language, (14 more...)

Industry: Media > Publishing (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)

#artificialintelligence, Dec-5-2022, 13:03:35 GMT

Disney creates new AI tool that can turn up and down actors age

Disney has stepped into the realm of playing with the knob of time: the company has developed a new AI tool capable of winding back the clock for actors. The tool, called the Face Re-aging Network (FRAN), can automatically change the age of actors, which will undoubtedly speed up a visual effects editing process that currently takes anywhere from several days to months, depending on the length of the content being altered. Manual de-aging typically involves an individual going through every single frame of the film and painting the appropriate effect onto the actor's skin. Another way is completely replacing the actor with a digital puppet to speed up the editing process. Now, Disney plans on putting the majority of that heavy lifting onto the shoulders of an AI, specifically FRAN, which the company says complements the traditional re-aging techniques already widely used in film production.

actor age, disney create new ai tool, editing process, (3 more...)

Industry:

Media > Film (0.59)
Leisure & Entertainment (0.59)

Technology: Information Technology > Artificial Intelligence (1.00)

Reid, Machel, Hellendoorn, Vincent J., Neubig, Graham

DiffusER: Discrete Diffusion via Edit-based Reconstruction

arXiv.org Artificial Intelligence, Oct-30-2022

In text generation, models that generate text from scratch one token at a time are currently the dominant paradigm. Despite being performant, these models lack the ability to revise existing text, which limits their usability in many practical scenarios. We look to address this, with DiffusER (Diffusion via Edit-based Reconstruction), a new edit-based generative model for text based on denoising diffusion models -- a class of models that use a Markov chain of denoising steps to incrementally generate data. DiffusER is not only a strong generative model in general, rivalling autoregressive models on several tasks spanning machine translation, summarization, and style transfer; it can also perform other varieties of generation that standard autoregressive models are not well-suited for. For instance, we demonstrate that DiffusER makes it possible for a user to condition generation on a prototype, or an incomplete sequence, and continue revising based on previous edit steps.
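Revising a sequence through successive edit steps can be illustrated with a small sketch. The abstract does not specify the edit vocabulary, so the operation set below (keep, delete, replace, insert) and the names `apply_edits` and `revise` are assumptions for illustration. In an actual edit-based model the edits would be predicted by a learned network; here they are written by hand.

```python
def apply_edits(tokens, edits):
    """Apply one edit per input token, left to right.

    Each edit is a tuple: ("keep",), ("delete",), ("replace", new_token),
    or ("insert", new_token), which keeps the token and adds new_token after it.
    This edit set is an assumed, illustrative vocabulary.
    """
    if len(tokens) != len(edits):
        raise ValueError("need exactly one edit per token")
    out = []
    for tok, edit in zip(tokens, edits):
        op = edit[0]
        if op == "keep":
            out.append(tok)
        elif op == "delete":
            continue
        elif op == "replace":
            out.append(edit[1])
        elif op == "insert":
            out.extend([tok, edit[1]])
        else:
            raise ValueError(f"unknown edit op: {op}")
    return out


def revise(tokens, edit_steps):
    """Run several edit steps in sequence, each refining the previous output."""
    for edits in edit_steps:
        tokens = apply_edits(tokens, edits)
    return tokens


# Toy usage: start from a prototype sentence and revise it in two steps.
draft = ["the", "cat", "sat"]
steps = [
    [("keep",), ("replace", "dog"), ("insert", "down")],
    [("keep",), ("keep",), ("delete",), ("keep",)],
]
final = revise(draft, steps)
```

Because each step takes an existing sequence as input, the same loop can start from a user-supplied prototype or an incomplete draft rather than from scratch.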

artificial intelligence, machine learning, natural language, (18 more...)

2210.16886

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.05)
North America > United States > New Jersey > Middlesex County > Sayreville (0.05)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)

#artificialintelligence, Nov-21-2021, 03:35:05 GMT

Top AI-Powered Photo Editing Tools In 2022

There are currently several AI-based image editing apps available that can not only edit images to meet specific requirements, such as removing backgrounds or enhancing colors, but also do so quickly. As a result, post-processing time is reduced to a bare minimum. This artificial intelligence-based image editing software uses an algorithm based on machine learning and neural networks to completely change the look of images, rather than simply overlaying them as regular filters do. Sketch quickly became the go-to UI design app among professionals worldwide after its release in 2010. Although many competitors have chipped away at its market share since then, its position as the industry standard has remained relatively stable to this day.

background, prototype, top ai-powered photo editing tool, (11 more...)

Industry: Media > Photography (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

#artificialintelligence, Oct-1-2020, 22:00:29 GMT

Luminar AI Uses Human-Inspired Artificial Intelligence for A Faster Editing Experience

With traditional photo editors, creating the perfect photo is a time-consuming process that involves moving dozens of sliders. Many seek to use presets to speed this up, but there are severe limitations. Presets tend to only work on images that are virtually identical to the original. To change this tedious and frustrating process, innovative companies race to embrace Artificial Intelligence. But some creatives have been skeptical about its effectiveness and limitations.

ai use human-inspired artificial intelligence, artificial intelligence, template, (13 more...)

Technology: Information Technology > Artificial Intelligence (1.00)