rob
EXP-CAM: Explanation Generation and Circuit Discovery Using Classifier Activation Matching
Suhail, Pirzada, Anand, Aditya, Sethi, Amit
Machine learning models, by virtue of training, learn a large repertoire of decision rules for any given input, and any one of these may suffice to justify a prediction. However, in high-dimensional input spaces, such rules are difficult to identify and interpret. In this paper, we introduce EXP-CAM: an explanation generation and circuit discovery approach using Classifier Activation Matching. EXP-CAM can generate minimal and faithful explanations for the decisions of pre-trained image classifiers that not only preserve the model's decision but are also concise and human-readable. We aim to identify minimal explanations that not only preserve the model's decision but are also concise and human-readable. To achieve this, we train a lightweight auto-encoder to produce binary masks that learns to highlight the decision-wise critical regions of an image while discarding irrelevant background. The training objective integrates activation alignment across multiple layers, consistency at the output label, priors that encourage sparsity, and compactness, along with a robustness constraint that enforces faithfulness. The minimal explanations so generated also lead us to mechanistically interpreting the model internals. In this regard we also introduce a circuit readout procedure wherein using the explanation's forward pass and gradients, we identify active channels and construct a channel-level graph, scoring inter-layer edges by ingress weight magnitude times source activation and feature-to-class links by classifier weight magnitude times feature activation. Together, these contributions provide a practical bridge between minimal input-level explanations and a mechanistic understanding of the internal computations driving model decisions.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Czechia > Prague (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Data Science (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks
Waheed, Asim, Duddu, Vasisht, Zhang, Rui, Szyller, Sebastian
Machine learning (ML) models are susceptible to various risks to security, privacy, and fairness. Most defenses are designed to protect against each risk individually (intended interactions) but can inadvertently affect susceptibility to other unrelated risks (unintended interactions). We introduce Amulet, the first Python library for evaluating both intended and unintended interactions among ML defenses and risks. Amulet is comprehensive by including representative attacks, defenses, and metrics; extensible to new modules due to its modular design; consistent with a user-friendly API template for inputs and outputs; and applicable for evaluating novel interactions. By satisfying all four properties, Amulet offers a unified foundation for studying how defenses interact, enabling the first systematic evaluation of unintended interactions across multiple risks.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- (3 more...)
Robust Federated Inference
Dhasade, Akash, Farhadkhani, Sadegh, Guerraoui, Rachid, Gupta, Nirupam, Jacovella, Maxime, Kermarrec, Anne-Marie, Pinot, Rafael
Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and proprietary while a central server queries them and aggregates predictions. Yet, the robustness of federated inference has been largely neglected, leaving them vulnerable to even simple attacks. To address this critical gap, we formalize the problem of robust federated inference and provide the first robustness analysis of this class of methods. Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large. Moving beyond linear averaging, we show that problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem. We then introduce an advanced technique using the DeepSet aggregation model, proposing a novel composition of adversarial training and test-time robust aggregation to robustify non-linear aggregators. Our composition yields significant improvements, surpassing existing robust aggregation methods by 4.7 - 22.2% in accuracy points across diverse benchmarks.
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (6 more...)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Czechia > Prague (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Data Science (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Sheng, Lijun, Liang, Jian, Wang, Zilei, He, Ran
Vision-language models (VLMs), such as CLIP, have gained significant popularity as foundation models, with numerous fine-tuning methods developed to enhance performance on downstream tasks. However, due to their inherent vulnerability and the common practice of selecting from a limited set of open-source models, VLMs suffer from a higher risk of adversarial attacks than traditional vision models. Existing defense techniques typically rely on adversarial fine-tuning during training, which requires labeled data and lacks of flexibility for downstream tasks. T o address these limitations, we propose robust test-time prompt tuning (R-TPT), which mitigates the impact of adversarial attacks during the inference stage. W e first reformulate the classic marginal entropy objective by eliminating the term that introduces conflicts under adversarial conditions, retaining only the pointwise entropy minimization. Furthermore, we introduce a plug-and-play reliability-based weighted ensembling strategy, which aggregates useful information from reliable augmented views to strengthen the defense. R-TPT enhances defense against adversarial attacks without requiring labeled training data while offering high flexibility for inference tasks. Extensive experiments on widely used benchmarks with various attacks demonstrate the effectiveness of R-TPT.
- Information Technology > Security & Privacy (0.77)
- Government > Military (0.77)
Off-Switching Not Guaranteed
We have seen rapid progress in the field of Artificial Intelligence (AI). If this progress continues, perhaps one day we will create powerful artificial agents. If we do so, how do we ensure that such AI agents do not go out of control? One approach is to make sure that we can switch off AI agents when they act against our interests. Put another way, we want to make sure that AI agents will defer to us.
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Monterey County > Pacific Grove (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
Explaining Categorical Feature Interactions Using Graph Covariance and LLMs
Shen, Cencheng, Edge, Darren, Larson, Jonathan, Priebe, Carey E.
Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is the global synthetic dataset from the Counter Trafficking Data Collaborative (CTDC) -- a global hub of human trafficking data containing over 200,000 anonymized records spanning from 2002 to 2022, with numerous categorical features for each record. In this paper, we propose a fast and scalable method for analyzing and extracting significant categorical feature interactions, and querying large language models (LLMs) to generate data-driven insights that explain these interactions. Our approach begins with a binarization step for categorical features using one-hot encoding, followed by the computation of graph covariance at each time. This graph covariance quantifies temporal changes in dependence structures within categorical data and is established as a consistent dependence measure under the Bernoulli distribution. We use this measure to identify significant feature pairs, such as those with the most frequent trends over time or those exhibiting sudden spikes in dependence at specific moments. These extracted feature pairs, along with their timestamps, are subsequently passed to an LLM tasked with generating potential explanations of the underlying events driving these dependence changes. The effectiveness of our method is demonstrated through extensive simulations, and its application to the CTDC dataset reveals meaningful feature pairs and potential data stories underlying the observed feature interactions.
RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension
Dias, Abel Corrêa, Moreira, Viviane Pereira, Comba, João Luiz Dihl
Objective: Scientific publications play a crucial role in uncovering insights, testing novel drugs, and shaping healthcare policies. Accessing the quality of publications requires evaluating their Risk of Bias (RoB), a process typically conducted by human reviewers. In this study, we introduce a new dataset for machine reading comprehension and RoB assessment and present RoBIn (Risk of Bias Inference), an innovative model crafted to automate such evaluation. The model employs a dual-task approach, extracting evidence from a given context and assessing the RoB based on the gathered evidence. Methods: We use data from the Cochrane Database of Systematic Reviews (CDSR) as ground truth to label open-access clinical trial publications from PubMed. This process enabled us to develop training and test datasets specifically for machine reading comprehension and RoB inference. Additionally, we created extractive (RoBInExt) and generative (RoBInGen) Transformer-based approaches to extract relevant evidence and classify the RoB effectively. Results: RoBIn is evaluated across various settings and benchmarked against state-of-the-art methods for RoB inference, including large language models in multiple scenarios. In most cases, the best-performing RoBIn variant surpasses traditional machine learning and LLM-based approaches, achieving an ROC AUC of 0.83. Conclusion: Based on the evidence extracted from clinical trial reports, RoBIn performs a binary classification to decide whether the trial is at a low RoB or a high/unclear RoB. We found that both RoBInGen and RoBInExt are robust and have the best results in many settings.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)