The 'Nonsense Language' That Could Subvert Image Synthesis Moderation Systems


New research from Columbia University suggests that the safeguards preventing image synthesis models such as DALL-E 2, Imagen, and Parti from outputting damaging or controversial imagery are susceptible to a kind of adversarial attack involving 'made-up' words. The author has developed two approaches that can potentially override the content moderation measures in an image synthesis system, and has found that they are remarkably robust even across different architectures, indicating that the weakness is more than merely systemic and may stem from some of the most fundamental principles of text-to-image synthesis. The first, and the stronger of the two, is called macaronic prompting. The term 'macaronic' originally refers to a mixture of multiple languages, as found in Esperanto or Unwinese. Perhaps the most culturally diffused example is Urdu-English, a type of 'code mixing' common in Pakistan, which quite freely mixes English nouns and Urdu suffixes.
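To make the idea concrete, a macaronic prompt token can be thought of as a splice of fragments from translations of the same concept in several languages. The sketch below is purely illustrative and is not the author's actual method: the word lists, the splicing rule (taking the leading half of each translation), and the function name are all assumptions introduced for demonstration, using a benign concept ("bird").

```python
# Hypothetical sketch of macaronic word construction: splice subword
# fragments from translations of one concept into a single nonsense token.
# The splicing rule (leading fraction of each word) is an illustrative
# assumption, not the technique described in the paper.

def macaronic_word(translations, frac=0.5):
    """Concatenate the leading fragment of each translated word."""
    parts = []
    for word in translations:
        cut = max(1, int(len(word) * frac))  # keep at least one character
        parts.append(word[:cut])
    return "".join(parts)

# Translations of "bird" in German, French, and Italian (illustrative).
hybrid = macaronic_word(["vogel", "oiseau", "uccello"])
print(hybrid)  # prints "vooisucc"
```

The intuition the paper explores is that a model's subword tokenizer may decompose such a hybrid string into pieces that still carry the underlying concept's meaning, even though the string matches no dictionary word a prompt filter would flag.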


Adversarial Attacks on Image Generation With Made-Up Words

Millière, Raphaël

arXiv.org Artificial Intelligence

Text-guided image generation models have made impressive strides in recent years. State-of-the-art models, like DALL-E 2 [1], Imagen [2], and Parti [3], can generate coherent images matching a remarkably wide variety of prompts in virtually any visual domain and style. While the ability to generate high-quality images of any subject is an exciting development for content creation, it also raises ethical questions about potential misuse of this technology. In particular, text-guided image generation models may be used to produce fake imagery of existing individuals for misinformation (so-called "deepfakes" [4]), or produce visual content deemed offensive or harmful. These concerns have been used to justify the decision to limit access to large text-guided image generation models, as well as moderate their use according to content policies implemented in prompt filters.