perturbation strategy
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Alajrami, Ahmed, Tan, Xingwei, Aletras, Nikolaos
Instruction-tuning plays a vital role in enhancing the task-solving abilities of large language models (LLMs), improving their usability in generating helpful responses on various tasks. However, previous work has demonstrated that they are sensitive to minor variations in instruction phrasing. In this paper, we explore whether introducing perturbations in instruction-tuning data can enhance LLMs' resistance against noisy instructions. We focus on how instruction-tuning with perturbations, such as removing stop words or shuffling words, affects LLMs' performance on the original and perturbed versions of widely-used benchmarks (MMLU, BBH, GSM8K). We further assess learning dynamics and potential shifts in model behavior. Surprisingly, our results suggest that instruction-tuning on perturbed instructions can, in some cases, improve downstream performance. These findings highlight the importance of including perturbed instructions in instruction-tuning, which can make LLMs more resilient to noisy user inputs.
Class-Dependent Perturbation Effects in Evaluating Time Series Attributions
Baer, Gregor, Grau, Isel, Zhang, Chao, Van Gorp, Pieter
As machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions. Within XAI, feature attribution methods aim to identify which input features contributed the most to a model's prediction, with their evaluation typically relying on perturbation-based metrics. Through empirical analysis across multiple datasets, model architectures, and perturbation strategies, we identify important class-dependent effects in these metrics: they show varying effectiveness across classes, achieving strong results for some while remaining less sensitive to others. In particular, we find that the most effective perturbation strategies often demonstrate the most pronounced class differences. Our analysis suggests that these effects arise from the learned biases of classifiers, indicating that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects in evaluating feature attributions. Although our analysis focuses on time series classification, these class-dependent effects likely extend to other structured data domains where perturbation-based evaluation is common.
Iterated Local Search with Linkage Learning
Tinós, Renato, Przewozniczek, Michal W., Whitley, Darrell, Chicano, Francisco
In pseudo-Boolean optimization, a variable interaction graph represents variables as vertices, and interactions between pairs of variables as edges. In black-box optimization, the variable interaction graph may be at least partially discovered by using empirical linkage learning techniques. These methods never report false variable interactions, but they are computationally expensive. The recently proposed local search with linkage learning discovers the partial variable interaction graph as a side-effect of iterated local search. However, information about the strength of the interactions is not learned by the algorithm. We propose local search with linkage learning 2, which builds a weighted variable interaction graph that stores information about the strength of the interaction between variables. The weighted variable interaction graph can provide new insights about the optimization problem and behavior of optimizers. Experiments with NK landscapes, knapsack problem, and feature selection show that local search with linkage learning 2 is able to efficiently build weighted variable interaction graphs. In particular, experiments with feature selection show that the weighted variable interaction graphs can be used for visualizing the feature interactions in machine learning. Additionally, new transformation operators that exploit the interactions between variables can be designed. We illustrate this ability by proposing a new perturbation operator for iterated local search.
Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
Image-level weak-to-strong consistency serves as the predominant paradigm in semi-supervised learning (SSL) due to its simplicity and impressive performance. Nonetheless, this approach confines all perturbations to the image level and suffers from the excessive presence of naive samples, thus necessitating further improvement. In this paper, we introduce feature-level perturbation with varying intensities and forms to expand the augmentation space, establishing the image-feature weak-to-strong consistency paradigm. Furthermore, our paradigm develops a triple-branch structure, which facilitates interactions between both types of perturbations within one branch to boost their synergy. Additionally, we present a confidence-based identification strategy to distinguish between naive and challenging samples, thus introducing additional challenges exclusively for naive samples. Notably, our paradigm can seamlessly integrate with existing SSL methods. We apply the proposed paradigm to several representative algorithms and conduct experiments on multiple benchmarks, including both balanced and imbalanced distributions for labeled samples. The results demonstrate a significant enhancement in the performance of existing SSL algorithms.
Assessing Robustness of Machine Learning Models using Covariate Perturbations
R, Arun Prakash, Bhattacharyya, Anwesha, Vaughan, Joel, Nair, Vijayan N.
As machine learning models become increasingly prevalent in critical decision-making models and systems in fields like finance, healthcare, etc., ensuring their robustness against adversarial attacks and changes in the input data is paramount, especially in cases where models potentially overfit. This paper proposes a comprehensive framework for assessing the robustness of machine learning models through covariate perturbation techniques. We explore various perturbation strategies to assess robustness and examine their impact on model predictions, including separate strategies for numeric and non-numeric variables, summaries of perturbations to assess and compare model robustness across different scenarios, and local robustness diagnosis to identify any regions in the data where a model is particularly unstable. Through empirical studies on real world dataset, we demonstrate the effectiveness of our approach in comparing robustness across models, identifying the instabilities in the model, and enhancing model robustness.
Escaping Local Optima in Global Placement
Xue, Ke, Lin, Xi, Shi, Yunqi, Kai, Shixiong, Xu, Siyuan, Qian, Chao
Placement is crucial in the physical design, as it greatly affects power, performance, and area metrics. Recent advancements in analytical methods, such as DREAMPlace, have demonstrated impressive performance in global placement. However, DREAMPlace has some limitations, e.g., may not guarantee legalizable placements under the same settings, leading to fragile and unpredictable results. This paper highlights the main issue as being stuck in local optima, and proposes a hybrid optimization framework to efficiently escape the local optima, by perturbing the placement result iteratively. The proposed framework achieves significant improvements compared to state-of-the-art methods on two popular benchmarks.
Exploring the Adversarial Capabilities of Large Language Models
Struppek, Lukas, Le, Minh Hieu, Hintersdorf, Dominik, Kersting, Kristian
The proliferation of large language models (LLMs) has sparked widespread and general interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research delved into the security and privacy issues of LLMs, the extent to which these models can exhibit adversarial behavior remains largely unexplored. Addressing this gap, we investigate whether common publicly available LLMs have inherent capabilities to perturb text samples to fool safety measures, so-called adversarial examples resp.~attacks. More specifically, we investigate whether LLMs are inherently able to craft adversarial examples out of benign samples to fool existing safe rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems. Our findings carry significant implications for (semi-)autonomous systems relying on LLMs, highlighting potential challenges in their interaction with existing systems and safety measures.
Generating counterfactual explanations of tumor spatial proteomes to discover effective strategies for enhancing immune infiltration
Wang, Zitong Jerry, Xu, Alexander M., Bhargava, Aman, Thomson, Matt W.
While therapies for altering the immune composition, including immunotherapies, have shown exciting results for treating hematological cancers, they are less effective for immunologically-cold, solid tumors. Spatial omics technologies capture the spatial organization of the TME with unprecedented molecular detail, revealing the relationship between immune cell localization and molecular signals. Here, we formulate T-cell infiltration prediction as a self-supervised machine learning problem and develop a counterfactual optimization strategy that leverages large scale spatial omics profiles of patient tumors to design tumor perturbations predicted to boost T-cell infiltration. A convolutional neural network predicts T-cell distribution based on signaling molecules in the TME provided by imaging mass cytometry. Gradient-based counterfactual generation, then, computes perturbations predicted to boost T-cell abundance. We apply our framework to melanoma, colorectal cancer (CRC) liver metastases, and breast tumor data, discovering combinatorial perturbations predicted to support T-cell infiltration across tens to hundreds of patients. This work presents a paradigm for counterfactual-based prediction and design of cancer therapeutics using spatial omics data.
A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI
Schlegel, Udo, Keim, Daniel A.
Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems
Das, Souvik, Saha, Sougata, Srihari, Rohini K.
Knowledge Graph(KG) grounded conversations often use large pre-trained models and usually suffer from fact hallucination. Frequently entities with no references in knowledge sources and conversation history are introduced into responses, thus hindering the flow of the conversation -- existing work attempt to overcome this issue by tweaking the training procedure or using a multi-step refining method. However, minimal effort is put into constructing an entity-level hallucination detection system, which would provide fine-grained signals that control fallacious content while generating responses. As a first step to address this issue, we dive deep to identify various modes of hallucination in KG-grounded chatbots through human feedback analysis. Secondly, we propose a series of perturbation strategies to create a synthetic dataset named FADE (FActual Dialogue Hallucination DEtection Dataset). Finally, we conduct comprehensive data analyses and create multiple baseline models for hallucination detection to compare against human-verified data and already established benchmarks.