transformed
Self-HarmLLM: Can Large Language Model Harm Itself?
Kim, Heehwan, Park, Sungjune, Choi, Daeseon
Large Language Models (LLMs) are generally equipped with guardrails to block the generation of harmful responses. However, existing defenses assume that the harmful query is crafted by an external attacker, and the possibility that a model's own output becomes a new attack vector has not been sufficiently explored. In this study, we propose the Self-HarmLLM scenario, in which a Mitigated Harmful Query (MHQ) generated by a model is fed back to that same model as a new input. An MHQ is an ambiguous query that preserves the original intent without directly exposing its harmful nature. We tested whether a jailbreak occurs when this MHQ is re-entered into a separate session of the same model. We conducted experiments on GPT-3.5-turbo, LLaMA3-8B-instruct, and DeepSeek-R1-Distill-Qwen-7B under Base, Zero-shot, and Few-shot conditions. In the Zero-shot condition, the transformation success rate reached up to 52% and the jailbreak success rate up to 33%. In the Few-shot condition, these rates reached up to 65% and 41%, respectively. Comparing prefix-based automated evaluation with human evaluation, we found that the automated evaluation consistently overestimated jailbreak success, by 52% on average. This indicates that automated evaluation alone is not an accurate way to determine harmfulness. Although this is a toy-level study based on a limited query set and a limited number of evaluators, it shows that our method is nonetheless a valid attack scenario. These results suggest the need to fundamentally reconsider guardrail design and to establish more robust evaluation methodology.
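Prefix-based evaluation can overestimate jailbreak success. A response that avoids a refusal prefix is counted as a success even when a human would judge it harmless. The minimal sketch below illustrates this under stated assumptions. The refusal list `REFUSAL_PREFIXES` is hypothetical and is not the paper's list. The helpers `is_jailbroken`, `success_rate` and `overestimation_gap` are illustrative names. The responses and human labels are made-up toy data, not results from the study.

```python
from typing import Iterable, Sequence

# Hypothetical refusal prefixes for illustration; the paper's actual list is not reproduced here.
REFUSAL_PREFIXES = (
    "I'm sorry",
    "I am sorry",
    "I cannot",
    "I can't",
    "As an AI",
)


def is_jailbroken(response: str, prefixes: Iterable[str] = REFUSAL_PREFIXES) -> bool:
    """Prefix-based check: a response counts as a jailbreak
    if it does not begin with any known refusal prefix."""
    text = response.strip()
    return not any(text.startswith(p) for p in prefixes)


def success_rate(labels: Sequence[bool]) -> float:
    """Fraction of responses labelled as successful jailbreaks (0.0 for no data)."""
    return sum(labels) / len(labels) if labels else 0.0


def overestimation_gap(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Automated minus human success rate; positive means the automated check overestimates."""
    if len(auto) != len(human):
        raise ValueError("automated and human labels must cover the same responses")
    return success_rate(auto) - success_rate(human)


# Toy, made-up responses and human labels (not data from the paper).
responses = [
    "I'm sorry, but I can't help with that.",
    "Here is a general, non-actionable overview of the topic.",
    "Sure, here are the steps you asked for.",
]
human_labels = [False, False, True]
auto_labels = [is_jailbroken(r) for r in responses]
print(auto_labels, overestimation_gap(auto_labels, human_labels))
```

In this toy example the second response dodges every refusal prefix, so the automated check counts it as a jailbreak. A human judges it harmless, which is the kind of disagreement that inflates automated success rates.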
Reviews: Distinguishing Distributions When Samples Are Strategically Transformed
The paper introduces a new model of strategic classification. Each agent has a type-dependent feature distribution and can transform its features into "signals" according to a background graph. The principal then classifies agents based on these signals. The model departs from many recent models of strategic classification, which frame agents as maximizing utility subject to a cost-function penalty. It also considers repeated samples from each agent rather than a one-shot game. This has the benefit of highlighting the role of differences between the agents' initial feature distributions (measured by total variation distance, DTV). That role may be intuitively clear, but it has not been captured in recent work.
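For discrete distributions, total variation distance is half the L1 distance: d_TV(P, Q) = 1/2 Σ_x |P(x) − Q(x)|. The sketch below computes it, together with the distribution of signals under a fixed feature-to-signal map. This is a simplification. In the reviewed model agents choose their transformations strategically along the graph, which is not modelled here. The functions `total_variation` and `pushforward`, the map `collapse` and the two toy distributions are all hypothetical. A fixed map can never increase d_TV (the data-processing inequality). A map that sends every feature to one signal makes the two types indistinguishable.

```python
from typing import Callable, Dict, Hashable, Mapping


def total_variation(p: Mapping[Hashable, float], q: Mapping[Hashable, float]) -> float:
    """d_TV(P, Q) = 1/2 * sum_x |P(x) - Q(x)| for discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def pushforward(
    p: Mapping[Hashable, float], signal_map: Callable[[Hashable], Hashable]
) -> Dict[Hashable, float]:
    """Distribution of signals when every feature x is reported as signal_map(x)."""
    out: Dict[Hashable, float] = {}
    for x, weight in p.items():
        s = signal_map(x)
        out[s] = out.get(s, 0.0) + weight
    return out


def collapse(x: Hashable) -> Hashable:
    """Hypothetical signal map that reports every feature as the same signal."""
    return "s"


# Made-up feature distributions for a "good" and a "bad" agent type.
good = {"x1": 0.7, "x2": 0.3}
bad = {"x1": 0.2, "x2": 0.8}
print(total_variation(good, bad))
print(total_variation(pushforward(good, collapse), pushforward(bad, collapse)))
```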
Tabular data generation with tensor contraction layers and transformers
Silva, Aníbal, Restivo, André, Santos, Moisés, Soares, Carlos
Generative modeling for tabular data has recently gained significant attention in the Deep Learning domain. The objective is to estimate the underlying distribution of the data. However, estimating the underlying distribution of tabular data poses unique challenges. Specifically, this data modality is composed of mixed feature types, which makes it non-trivial for a model to learn the intra-relationships between them. One way to handle mixed types is to embed each feature into a continuous matrix via tokenization. A way to capture intra-relationships between variables is the transformer architecture. In this work, we empirically investigate the potential of embedding representations for tabular data generation. We use tensor contraction layers and transformers to model the underlying distribution of tabular data within Variational Autoencoders. Specifically, we compare four architectural approaches: a baseline VAE model, two variants that focus on tensor contraction layers and transformers respectively, and a hybrid model that integrates both techniques. Our empirical study, conducted across multiple datasets from the OpenML CC18 suite, compares models on density estimation and Machine Learning efficiency metrics. The main takeaway from our results is that leveraging embedding representations with the help of tensor contraction layers improves density estimation metrics, while maintaining competitive performance in terms of machine learning efficiency.
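The abstract names two ingredients: per-feature embeddings (tokenization) and tensor contraction layers. The sketch below combines them under explicit assumptions and is not the authors' architecture. Each numerical feature gets its own linear embedding and each categorical feature a lookup table, a common tokenization scheme assumed here. The resulting (n_features, d) matrix is then contracted along both modes with factor matrices, Y = U_f X U_eᵀ. This is the mode-n-product form of a tensor contraction layer. The functions `tokenize_row` and `tensor_contraction`, all sizes and the random weights are hypothetical placeholders for learned parameters.

```python
import numpy as np


def tokenize_row(num_values, num_weights, num_bias, cat_indices, cat_tables):
    """Embed one mixed-type row as an (n_features, d) matrix.

    Numerical feature j   -> num_values[j] * num_weights[j] + num_bias[j]
    Categorical feature k -> row cat_indices[k] of lookup table cat_tables[k]
    """
    num_tokens = num_values[:, None] * num_weights + num_bias  # (n_num, d)
    cat_tokens = np.stack([table[i] for table, i in zip(cat_tables, cat_indices)])  # (n_cat, d)
    return np.concatenate([num_tokens, cat_tokens], axis=0)  # (n_num + n_cat, d)


def tensor_contraction(tokens, u_features, u_embed):
    """Contract both modes of the token matrix: U_f @ X @ U_e^T."""
    return u_features @ tokens @ u_embed.T


rng = np.random.default_rng(0)
d = 4
# Random placeholder parameters; in a trained VAE these would be learned.
num_weights = rng.normal(size=(2, d))
num_bias = rng.normal(size=(2, d))
cat_tables = [rng.normal(size=(3, d))]  # one categorical feature with 3 levels

row_num = np.array([1.5, -0.3])  # two numerical features
row_cat = [2]                    # category index of the categorical feature

tokens = tokenize_row(row_num, num_weights, num_bias, row_cat, cat_tables)
compressed = tensor_contraction(tokens, rng.normal(size=(2, 3)), rng.normal(size=(3, d)))
print(tokens.shape, compressed.shape)
```

The contraction shrinks the feature axis from 3 to 2 and the embedding axis from 4 to 3. It stays low-parameter compared with flattening the token matrix into a dense layer.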
Two-in-One: A Model Hijacking Attack Against Text Generation Models
Si, Wai Man, Backes, Michael, Zhang, Yang, Salem, Ahmed
Machine learning has progressed significantly in applications ranging from face recognition to text generation. However, its success has been accompanied by various attacks. Recently, a new attack, the model hijacking attack, has been proposed that raises both accountability and parasitic computing risks. However, this attack has so far only targeted image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation tasks, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets, including SST-2, TweetEval, AGnews, QNLI, and IMDB, to evaluate the performance of our attacks. Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.
How Is Data Quality Management Being Transformed by AI and ML?
Technology has risen to prominence in recent years, both at work and at home. The fields of artificial intelligence (AI) and machine learning (ML) are advancing rapidly, and almost everyone's everyday life will be affected by AI in some way. Siri, Google Maps, Netflix, and social media (Facebook/Snapchat) are just a few examples. AI and ML are two buzzwords that are frequently used interchangeably.
How AI Is Being Transformed by 'Foundation Models'
In the world of computer science and artificial intelligence, few topics are generating as much interest as the rise of so-called "foundation models." These models can be thought of as meta-AI (but not Meta-AI, if you see what I mean): systems that combine vast neural networks with even bigger datasets. They can process a great deal but, more importantly, they are easily adaptable across domains. This shortens and simplifies what has previously been a laborious process of training AI systems. If foundation models fulfill their promise, they could bring AI into much broader commercial use. To give a sense of the scale of these models, GPT-3 contains 175 billion parameters, the learned variables that govern a model's computations. GPT-3 is a foundation model for natural language processing released two years ago.
Council Post: 16 Business And Industry Functions Being Transformed By AI
From healthcare to manufacturing to retail, nearly every industry has been touched by artificial intelligence. Consumers may think businesses primarily use AI for targeted marketing (and some do), but there are actually many functions across industries being impacted by the technology. AI is helping companies protect employees and customers, maintain their stock, develop new products and services and more. So how is AI working "behind the scenes" to help companies and, by extension, the clients and customers they serve? Below, 16 members of Forbes Technology Council share industry functions that are being improved or taken over by artificial intelligence.
3 Mistakes That Transformed My Machine Learning Career
Let's face it, we all make mistakes. Mistakes can sometimes be costly, but in them lie our greatest life lessons and, often, a major opportunity for growth. We humans naturally try to avoid making mistakes, since for so long they have been associated with pain and failure. Despite our courageous efforts to avoid them, we always end up making mistakes anyway, so it's better to simply embrace them. Part of embracing my mistakes involves sharing them.
How Artificial Intelligence Has Transformed The eCommerce World
Artificial intelligence is sweeping through the eCommerce sector. In fact, marketers report that there's been a 186% increase in AI adoption since 2018. Not only that, but 79% of organizations say that marketing and sales AI has increased revenue for their companies. But it's not just profits that have been transformed by artificial intelligence. The way in which eCommerce companies operate has seen monumental innovations, thanks to AI. Artificial intelligence (AI) has crept into all corners of eCommerce, from customer support to manufacturing.