AITopics | clip interrogator

Collaborating Authors

clip interrogator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models

Zou, Xiaotian

arXiv.org Artificial IntelligenceOct-2-2025

Multimodal Large Language Models (MLLMs) have transformed text-to-image workflows, allowing designers to create novel visual concepts with unprecedented speed. This progress has given rise to a thriving prompt trading market, where curated prompts that induce trademark styles are bought and sold. Although commercially attractive, prompt trading also introduces a largely unexamined security risk: the prompts themselves can be stolen. In this paper, we expose this vulnerability and present RLStealer, a reinforcement learning based prompt inversion framework that recovers its template from only a small set of example images. RLStealer treats template stealing as a sequential decision making problem and employs multiple similarity based feedback signals as reward functions to effectively explore the prompt space. Comprehensive experiments on publicly available benchmarks demonstrate that RLStealer gets state-of-the-art performance while reducing the total attack cost to under 13% of that required by existing baselines. Our further analysis confirms that RLStealer can effectively generalize across different image styles to efficiently steal unseen prompt templates. Our study highlights an urgent security threat inherent in prompt trading and lays the groundwork for developing protective standards in the emerging MLLMs marketplace.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2510.00046

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models

Kim, Donghoon, Bae, Minji, Shim, Kyuhong, Shim, Byonghyo

arXiv.org Artificial IntelligenceJul-22-2025

Text-to-image generative models like DALL-E and Stable Diffusion have revolutionized visual content creation across various applications, including advertising, personalized media, and design prototyping. However, crafting effective textual prompts to guide these models remains challenging, often requiring extensive trial and error. Existing prompt inversion approaches, such as soft and hard prompt techniques, are not so effective due to the limited interpretability and incoherent prompt generation. To address these issues, we propose Visually Guided Decoding (VGD), a gradient-free approach that leverages large language models (LLMs) and CLIP-based guidance to generate coherent and semantically aligned prompts. In essence, VGD utilizes the robust text generation capabilities of LLMs to produce human-readable prompts. Further, by employing CLIP scores to ensure alignment with user-specified visual concepts, VGD enhances the interpretability, generalization, and flexibility of prompt generation without the need for additional training. Our experiments demonstrate that VGD outperforms existing prompt inversion techniques in generating understandable and contextually relevant prompts, facilitating more intuitive and controllable interactions with text-to-image models. Figure 1: Visually Guided Decoding ( VGD) works with any LLM without extra training, making it easy to integrate into a chat-based interface that offers interpretable and controllable text-to-image generation. In recent years, image generative models such as DALL-E and Stable Diffusion have shown remarkable success in generating high-fidelity images (Ramesh et al., 2022; Rombach et al., 2022; Podell et al., 2024). These models are widely used in a variety of applications, including visual content generation ( e.g., advertisement, movie, game), personalized content generation ( e.g., caricature, photo editing), and prototyping ( e.g., architecture and product design).

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.08622

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)

Add feedback

Hidden traces of humanity: what AI images reveal about our world

The GuardianOct-1-2024, 04:00:22 GMT

When faced with a bit of downtime, many of my friends will turn to the same party game. It's based on the surrealist game Exquisite Corpse, and involves translating brief written descriptions into rapidly made drawings and back again. One group calls it Telephone Pictionary; another refers to it as Writey-Drawey. The internet tells me it is also called Eat Poop You Cat, a sequence of words surely inspired by one of the game's results. As recently as three years ago, it was rare to encounter text-to-image or image-to-text mistranslations in daily life, which made the outrageous outcomes of the game feel especially novel. But we have since entered a new era of image-making. With the aid of AI image generators like Dall-E 3, Stable Diffusion and Midjourney, and the generative features integrated into Adobe's Creative Cloud programs, you can now transform a sentence or phrase into a highly detailed image in mere seconds. Images, likewise, can be nearly instantly translated into descriptive text.

artist, illustration, neural network, (14 more...)

The Guardian

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Iraq (0.04)
Africa > Nigeria (0.04)

Industry:

Media > Photography (0.94)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.97)
Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Add feedback

Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

Brade, Stephen, Wang, Bryan, Sousa, Mauricio, Oore, Sageev, Grossman, Tovi

arXiv.org Artificial IntelligenceApr-18-2023

Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user's intention. To address the challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts and enhance desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.09337

Country:

North America > Canada > Ontario > Toronto (0.47)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Prompt Stealing Attacks Against Text-to-Image Generation Models

Shen, Xinyue, Qu, Yiting, Backes, Michael, Zhang, Yang

arXiv.org Artificial IntelligenceFeb-20-2023

Text-to-Image generation models have revolutionized the artwork design process and enabled anyone to create high-quality images by entering text descriptions called prompts. Creating a high-quality prompt that consists of a subject and several modifiers can be time-consuming and costly. In consequence, a trend of trading high-quality prompts on specialized marketplaces has emerged. In this paper, we propose a novel attack, namely prompt stealing attack, which aims to steal prompts from generated images by text-to-image generation models. Successful prompt stealing attacks direct violate the intellectual property and privacy of prompt engineers and also jeopardize the business model of prompt trading marketplaces. We first perform a large-scale analysis on a dataset collected by ourselves and show that a successful prompt stealing attack should consider a prompt's subject as well as its modifiers. We then propose the first learning-based prompt stealing attack, PromptStealer, and demonstrate its superiority over two baseline methods quantitatively and qualitatively. We also make some initial attempts to defend PromptStealer. In general, our study uncovers a new attack surface in the ecosystem created by the popular text-to-image generation models. We hope our results can help to mitigate the threat. To facilitate research in this field, we will share our dataset and code with the community.

artificial intelligence, machine learning, modifier, (18 more...)

arXiv.org Artificial Intelligence

2302.09923

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

This New AI Tool Feeds on Hurting the Egos of its Users

#artificialintelligenceOct-28-2022, 11:50:28 GMT

There is a trending new AI tool on the block, but instead of creating images, this AI tool analyzes them and spits out crude roasts of anyone they depict. Days are gone netizens leveraged DALL·E 2 and racist image-spewing DALL·E mini to generate silly art for Twitter shit posting. The new AI tool, known as the CLIP Interrogator and created by a generative artist who goes by the handle Pharma psychotic, is technically an artificial intelligence powered tool to discover "what a good prompt might be to generate new AI art like an existing one." In reality, CLIP Interrogator tends to spit out descriptions of people that can be mundane, puzzling, staggeringly insensitive, and, at times, admittedly a bit hilarious. The new AI tool told one user she looked "tired and drunk," for instance, and accused another user of having a "deformed face."

ai tool, clip interrogator, new ai tool feed, (7 more...)

#artificialintelligence

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.98)

Add feedback