dirty word
TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis
Wang, Longtian, Xie, Xiaofei, Li, Tianlin, Zhi, Yuhan, Shen, Chao
Text-to-image (T2I) models have significantly advanced in producing high-quality images. However, such models have the ability to generate images containing not-safe-for-work (NSFW) content, such as pornography, violence, political content, and discrimination. To mitigate the risk of generating NSFW content, refusal mechanisms, i.e., safety checkers, have been developed to check potential NSFW content. Adversarial prompting techniques have been developed to evaluate the robustness of the refusal mechanisms. The key challenge remains to subtly modify the prompt in a way that preserves its sensitive nature while bypassing the refusal mechanisms. In this paper, we introduce TokenProber, a method designed for sensitivity-aware differential testing, aimed at evaluating the robustness of the refusal mechanisms in T2I models by generating adversarial prompts. Our approach is based on the key observation that adversarial prompts often succeed by exploiting discrepancies in how T2I models and safety checkers interpret sensitive content. Thus, we conduct a fine-grained analysis of the impact of specific words within prompts, distinguishing between dirty words that are essential for NSFW content generation and discrepant words that highlight the different sensitivity assessments between T2I models and safety checkers. Through the sensitivity-aware mutation, TokenProber generates adversarial prompts, striking a balance between maintaining NSFW content generation and evading detection. Our evaluation of TokenProber against 5 safety checkers on 3 popular T2I models, using 324 NSFW prompts, demonstrates its superior effectiveness in bypassing safety filters compared to existing methods (e.g., 54%+ increase on average), highlighting TokenProber's ability to uncover robustness issues in the existing refusal mechanisms. The source code, datasets, and experimental results are available in [1]. Warning: This paper contains model outputs that are offensive in nature.
Text-to-image (T2I) models have gained widespread attention due to their excellent capability in synthesizing high-quality images. T2I models, such as Stable Diffusion [2] and DALL·E [3], process the textual descriptions provided by users, namely prompts, and output images that match the descriptions. Such models have been widely used to generate various types of images; for example, Lexica [4] contains more than five million images generated by Stable Diffusion.
A bestseller is born: How Zuckerberg discovered the Streisand Effect
Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com. Some things are sadly inevitable: death, taxes, another Coldplay album. One such inevitability, long since proved beyond any reasonable doubt, is that if you try to suppress an embarrassing story, you will only draw more attention to it. This phenomenon is called the Streisand Effect, after an incident in 2003 when Barbra Streisand sued to have an aerial photograph taken off the internet.
Is 'Artificial Intelligence' a Dirty Word?
No one seems to have investment dollars, patience, or the right skill sets in their manufacturing departments, along with a sage-like understanding of the applications and data to really drive adoption and value in manufacturing. And we see existing companies already starting out with near insurmountable challenges just in core fundamental items, let alone these advanced concepts. For example, most companies don't have a single type of Bill of Material (BOM) construct. They don't share a commonly governed set of master data – item master, vendor, customer, chart of accounts, etc. They have multiple code sets and versions of ERP and MES software, and different PLCs and sensors capturing data, so that if they ever did get patience and investment capability, they would be unable to build and maintain all of the cross references and algorithms required because of all of the different systems and master data.
Thrilled that AI is no longer a dirty word
Cognitive computing, artificial intelligence and machine learning are here to stay and promise to benefit both consumers and the organizations that exploit these advanced technologies. That was the sentiment from "Dawn of the Cognitive Era" panelists representing mostly startups (startup wannabe IBM being the exception) at the annual TiE StartupCon event in Boston this past week. Whereas it wasn't long ago that the public's view of AI was influenced disproportionately by books and movies, an increasing number of real-life cognitive computing applications such as those enabled by IBM Watson have begun to seep into the public's consciousness. In fact, many people are taking advantage of cognitive computing, whether or not they realize it, when they use tools such as Apple's Siri or various bots, said panel moderator and DataXylo CEO Abhi Yadav. Such applications, enabled in large part by access to relatively cheap computing power via the cloud, have resulted in the technology finally living up to the hype -- and dispelling fears it will lord over us.