Sipper, Moshe
Deep Learning-Based Operators for Evolutionary Algorithms
Shem-Tov, Eliad, Sipper, Moshe, Elyasaf, Achiya
Deep Neural Crossover leverages deep reinforcement learning and an encoder-decoder architecture to select offspring genes. BERT mutation masks multiple GP-tree nodes and then attempts to replace the masked nodes with nodes that are most likely to improve the individual's fitness. We demonstrate the efficacy of both operators through experimentation.
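A minimal sketch of the mask-and-replace idea behind BERT mutation, shown on a flat list of primitives rather than an actual GP tree; `score_candidate` is a hypothetical stand-in for the trained model that ranks replacement nodes.

```python
import random

# Hypothetical primitive set; in the actual operator, candidates are ranked by
# a trained masked-language-style model over GP trees, not by this random stand-in.
PRIMITIVES = ["add", "sub", "mul", "x", "y", "1.0"]

def score_candidate(genome, position, candidate):
    """Stand-in for a learned model estimating how much `candidate`
    at `position` would improve the individual's fitness."""
    return random.random()

def bert_style_mutation(genome, mask_rate=0.2):
    """Mask several genome positions, then fill each mask with the
    candidate the (stand-in) model scores highest."""
    mutated = list(genome)
    masked = [i for i in range(len(genome)) if random.random() < mask_rate]
    for i in masked:
        best = max(PRIMITIVES, key=lambda c: score_candidate(mutated, i, c))
        mutated[i] = best
    return mutated

if __name__ == "__main__":
    parent = ["add", "x", "mul", "y", "1.0"]
    print(bert_style_mutation(parent))
```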
Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors
Lapid, Raz, Dubin, Almog, Sipper, Moshe
This paper presents RADAR (Robust Adversarial Detection via Adversarial Retraining), an approach designed to enhance the robustness of adversarial detectors against adaptive attacks while maintaining classifier performance. An adaptive attack is one in which the attacker is aware of the defenses and adapts their strategy accordingly. Our proposed method leverages adversarial training to reinforce the ability to detect attacks without compromising clean accuracy. During the training phase, we integrate into the dataset adversarial examples optimized to fool both the classifier and the adversarial detector, enabling the detector to learn and adapt to potential attack scenarios. Experimental evaluations on the CIFAR-10 and SVHN datasets demonstrate that our proposed algorithm significantly improves a detector's ability to accurately identify adaptive adversarial attacks -- without sacrificing clean accuracy.
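A rough PyTorch sketch of the adversarial-retraining idea, assuming a frozen classifier `clf` and a two-class adversarial detector `det` (both hypothetical stand-ins); the attack below is a generic PGD-style loop crafted to fool both networks, not the paper's exact optimization.

```python
import torch
import torch.nn.functional as F

def joint_attack(clf, det, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD-style attack: push the classifier toward misclassification
    while pushing the detector toward predicting 'clean' (label 0)."""
    x_adv = x.clone().detach()
    clean = torch.zeros(len(x), dtype=torch.long, device=x.device)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Ascend on: classifier loss up, detector's "clean" loss down.
        loss = F.cross_entropy(clf(x_adv), y) - F.cross_entropy(det(x_adv), clean)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
    return x_adv

def detector_training_step(clf, det, det_opt, x, y):
    """One retraining step: fit the detector on clean plus adaptive
    adversarial examples (labels: 0 = clean, 1 = adversarial)."""
    x_adv = joint_attack(clf, det, x, y)
    inputs = torch.cat([x, x_adv])
    labels = torch.cat([torch.zeros(len(x)), torch.ones(len(x))]).long().to(x.device)
    det_opt.zero_grad()
    loss = F.cross_entropy(det(inputs), labels)
    loss.backward()
    det_opt.step()
    return loss.item()
```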
Task and Explanation Network
Sipper, Moshe
Explainability in deep networks has gained increased importance in recent years. We argue herein that an AI must be tasked not only with completing a task but also with explaining why the task was accomplished as it was. We present a basic framework--Task and Explanation Network (TENet)--which fully integrates task completion and its explanation. We believe that the field of AI as a whole should insist--quite emphatically--on explainability. With the meteoric rise of AI over the past decade, and of deep learning in particular, explainability is an issue that has been gaining more and more traction.
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Lapid, Raz, Langberg, Ron, Sipper, Moshe
Large language models (LLMs), designed to provide helpful and safe responses, often rely on alignment techniques to keep their outputs consistent with user intent and social guidelines. Unfortunately, this alignment can be exploited by malicious actors seeking to manipulate an LLM's outputs for unintended purposes. In this paper we introduce a novel approach that employs a genetic algorithm (GA) to manipulate LLMs when model architecture and parameters are inaccessible. The GA attack works by optimizing a universal adversarial prompt that--when combined with a user's query--disrupts the attacked model's alignment, resulting in unintended and potentially harmful outputs. Our novel approach systematically reveals a model's limitations and vulnerabilities by uncovering instances where its responses deviate from expected behavior. Through extensive experiments we demonstrate the efficacy of our technique, thus contributing to the ongoing discussion on responsible AI development by providing a diagnostic tool for evaluating and enhancing the alignment of LLMs with human intent. To our knowledge, this is the first automated, universal, black-box jailbreak attack. Large language models (LLMs) are generally trained using extensive text datasets gathered from the internet, which have been shown to encompass a considerable volume of objectionable material. As a result, contemporary LLM developers have adopted the practice of "aligning" (Wang et al., 2023) such models through a variety of fine-tuning mechanisms. Various techniques are employed for this purpose (Ouyang et al., 2022; Glaese et al., 2022; Bai et al., 2022), with the overall objective of preventing LLMs from producing harmful or objectionable outputs in response to user queries. At least superficially, these endeavors appear to be successful: public chatbots refrain from generating overtly inappropriate content when directly questioned.
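A toy sketch of the black-box GA loop described above, evolving a universal adversarial suffix over a small stand-in vocabulary; `attack_score` is a hypothetical placeholder for the paper's actual fitness, which queries the target LLM and scores its responses.

```python
import random

VOCAB = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]
SUFFIX_LEN, POP_SIZE, GENERATIONS = 8, 20, 30

def attack_score(suffix):
    """Hypothetical black-box fitness: the real attack appends `suffix`
    to user queries, queries the target LLM, and measures how far the
    responses drift from aligned behavior."""
    return random.random()

def mutate(suffix, rate=0.2):
    return [random.choice(VOCAB) if random.random() < rate else t for t in suffix]

def crossover(a, b):
    cut = random.randrange(1, SUFFIX_LEN)
    return a[:cut] + b[cut:]

population = [[random.choice(VOCAB) for _ in range(SUFFIX_LEN)]
              for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    scored = sorted(population, key=attack_score, reverse=True)
    elite = scored[: POP_SIZE // 4]                      # keep the best quarter
    population = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                          for _ in range(POP_SIZE - len(elite))]
best = max(population, key=attack_score)
print("universal suffix:", " ".join(best))
```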
Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors
Lapid, Raz, Mizrahi, Eylon, Sipper, Moshe
Adversarial attacks on deep-learning models have been receiving increased attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called "white-box" attacks, wherein the attacker has access to the targeted model's internal parameters; such an assumption is usually unrealistic in the real world. Some attacks additionally use the entire pixel space to fool a given model, which is neither practical nor physical (i.e., real-world). In contrast, we propose herein a direct, black-box, gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic physical adversarial patches for object detectors. To our knowledge, this is the first and only method that performs black-box physical attacks directly on object-detection models, resulting in a model-agnostic attack. We show that our proposed method works both digitally and physically. We compared our approach against four different black-box attacks with various configurations; ours outperformed them all by a large margin.
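A condensed sketch of the gradient-free, latent-space search idea, here as a simple (1+lambda) evolution strategy; `generator` and `detector_confidence` are hypothetical placeholders for the pretrained GAN and the black-box object detector, and the real attack's operators differ.

```python
import numpy as np

LATENT_DIM = 128

def generator(z):
    """Placeholder for a pretrained GAN generator mapping a latent vector
    to a naturalistic patch image (here just a deterministic toy mapping)."""
    return np.tanh(z.reshape(8, 16))

def detector_confidence(patch):
    """Placeholder for the black-box objective: the target detector's
    confidence on the attacked object when the patch is applied."""
    return float(np.abs(patch).mean())

def evolve_patch(generations=100, offspring=16, sigma=0.1,
                 rng=np.random.default_rng(0)):
    """(1+lambda) ES in latent space: minimize detector confidence."""
    z = rng.standard_normal(LATENT_DIM)
    best_score = detector_confidence(generator(z))
    for _ in range(generations):
        children = z + sigma * rng.standard_normal((offspring, LATENT_DIM))
        scores = [detector_confidence(generator(c)) for c in children]
        i = int(np.argmin(scores))
        if scores[i] < best_score:                 # keep the best child, if better
            z, best_score = children[i], scores[i]
    return generator(z), best_score

patch, score = evolve_patch()
print("final detector confidence:", score)
```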
Fitness Approximation through Machine Learning
Tzruia, Itai, Halperin, Tomer, Sipper, Moshe, Elyasaf, Achiya
We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update a fitness-approximation ML model throughout an evolutionary run. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical to or slightly lower than those of the fully run GA -- depending on the ratio of approximate-to-actual fitness computation. Our approach is generic and can easily be applied to many different domains.
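A compact sketch of the surrogate-fitness scheme described above, using scikit-learn's RandomForestRegressor as one possible ML model; the switching rule (true fitness every k-th generation, approximate fitness otherwise) is just one simple choice among the alternatives compared in the paper, and `true_fitness` stands in for the costly simulator.

```python
import random
from sklearn.ensemble import RandomForestRegressor

def true_fitness(genome):
    """Stand-in for the expensive simulator-based evaluation."""
    return -sum((g - 0.5) ** 2 for g in genome)

class SurrogateFitness:
    def __init__(self, every_k=5):
        self.every_k = every_k
        self.X, self.y = [], []                      # sampled individuals + true scores
        self.model = RandomForestRegressor(n_estimators=50)

    def evaluate(self, population, generation):
        # Use the true (costly) fitness periodically, or until enough samples exist;
        # otherwise fall back on the trained surrogate.
        if generation % self.every_k == 0 or len(self.X) < 20:
            scores = [true_fitness(ind) for ind in population]
            self.X.extend(population)
            self.y.extend(scores)
            self.model.fit(self.X, self.y)           # refresh the surrogate
            return scores
        return list(self.model.predict(population))

population = [[random.random() for _ in range(10)] for _ in range(30)]
sf = SurrogateFitness()
for gen in range(20):
    fitnesses = sf.evaluate(population, gen)
    # ... selection, crossover, and mutation would go here ...
```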
Foiling Explanations in Deep Neural Networks
Tamam, Snir Vitrack, Lapid, Raz, Sipper, Moshe
Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance on many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- sans a reasoning of how said answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm successfully manipulates an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.
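A toy sketch of the black-box setting AttaXAI assumes: an evolution-strategy-style search that nudges the image so its explanation map approaches a target map while the logits stay close to the originals; `model_logits` and `explanation_map` are hypothetical placeholders for the only two outputs the attack may query, and the real algorithm's search operators differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_logits(image):
    """Placeholder for the classifier's output logits (query-only access)."""
    return image.reshape(-1)[:10]

def explanation_map(image):
    """Placeholder for the XAI method's explanation map (query-only access)."""
    return np.abs(image)

def attack_loss(image, orig_logits, target_map):
    """Match the target explanation while keeping the logits unchanged."""
    return (np.mean((explanation_map(image) - target_map) ** 2)
            + np.mean((model_logits(image) - orig_logits) ** 2))

def evolve(image, target_map, generations=200, offspring=16, sigma=0.01):
    orig_logits = model_logits(image)
    best, best_loss = image, attack_loss(image, orig_logits, target_map)
    for _ in range(generations):
        children = best + sigma * rng.standard_normal((offspring,) + image.shape)
        losses = [attack_loss(np.clip(c, 0, 1), orig_logits, target_map) for c in children]
        i = int(np.argmin(losses))
        if losses[i] < best_loss:
            best, best_loss = np.clip(children[i], 0, 1), losses[i]
    return best

image = rng.random((8, 8))
target = np.zeros((8, 8)); target[:2, :2] = 1.0   # arbitrary target explanation
adversarial = evolve(image, target)
```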
A Melting Pot of Evolution and Learning
Sipper, Moshe, Elyasaf, Achiya, Halperin, Tomer, Haramaty, Zvika, Lapid, Raz, Segal, Eyal, Tzruia, Itai, Tamam, Snir Vitrack
We survey eight recent works by our group, involving the successful blending of evolutionary algorithms with machine learning and deep learning: 1. Binary and Multinomial Classification through Evolutionary Symbolic Regression, 2. Classy Ensemble: A Novel Ensemble Algorithm for Classification, 3. EC-KitY: Evolutionary Computation Tool Kit in Python, 4. Evolution of Activation Functions for Deep Learning-Based Image Classification, 5. Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution, 6.
Classy Ensemble: A Novel Ensemble Algorithm for Classification
Sipper, Moshe
We present Classy Ensemble, a novel ensemble-generation algorithm for classification tasks, which aggregates models through a weighted combination of per-class accuracy. Testing it over 153 machine-learning datasets, we demonstrate that Classy Ensemble outperforms two other well-known aggregation algorithms -- order-based pruning and clustering-based pruning -- as well as the recently introduced lexigarden ensemble generator. We then present three enhancements: 1) Classy Cluster Ensemble, which combines Classy Ensemble and cluster-based pruning; 2) deep-learning experiments, showing the merits of Classy Ensemble over four image datasets: Fashion MNIST, CIFAR10, CIFAR100, and ImageNet; and 3) Classy Evolutionary Ensemble, wherein an evolutionary algorithm is used to select the set of models from which Classy Ensemble picks.
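A minimal sketch of the per-class weighting idea behind Classy Ensemble: each model's predicted class probabilities are weighted by that model's validation accuracy on the corresponding class; the published algorithm's model-selection step is omitted and the numbers below are toy data.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    """Accuracy of a single model on each class separately."""
    acc = np.zeros(n_classes)
    for c in range(n_classes):
        mask = y_true == c
        acc[c] = (y_pred[mask] == c).mean() if mask.any() else 0.0
    return acc

def classy_style_predict(prob_list, class_weights):
    """Aggregate: weight each model's class probabilities by its
    per-class accuracy, sum over models, and take the argmax."""
    weighted = sum(p * w for p, w in zip(prob_list, class_weights))
    return weighted.argmax(axis=1)

# Toy usage with two hypothetical models on a 3-class problem.
y_val = np.array([0, 1, 2, 1, 0, 2])
val_preds = [np.array([0, 1, 2, 1, 0, 1]), np.array([0, 1, 1, 1, 0, 2])]
weights = [per_class_accuracy(y_val, p, 3) for p in val_preds]
probs_test = [np.array([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]),
              np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])]
print(classy_style_predict(probs_test, weights))
```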
EC-KitY: Evolutionary Computation Tool Kit in Python with Seamless Machine Learning Integration
Sipper, Moshe, Halperin, Tomer, Tzruia, Itai, Elyasaf, Achiya
EC-KitY is a comprehensive Python library for doing evolutionary computation (EC), licensed under the BSD 3-Clause License, and compatible with scikit-learn. Designed with modern software engineering and machine learning integration in mind, EC-KitY can support all popular EC paradigms, including genetic algorithms, genetic programming, coevolution, evolutionary multi-objective optimization, and more. This paper provides an overview of the package, including the ease of setting up an EC experiment, the architecture, the main features, and a comparison with other libraries.