Kariyappa, Sanjay
Interpretable LLM-based Table Question Answering
Giang, null, Nguyen, null, Brugere, Ivan, Sharma, Shubham, Kariyappa, Sanjay, Nguyen, Anh Totti, Lecue, Freddy
Interpretability for Table Question Answering (Table QA) is critical, particularly in high-stakes industries like finance or healthcare. Although recent approaches using Large Language Models (LLMs) have significantly improved Table QA performance, their explanations for how the answers are generated are ambiguous. To fill this gap, we introduce Plan-of-SQLs ( or POS), an interpretable, effective, and efficient approach to Table QA that answers an input query solely with SQL executions. Through qualitative and quantitative evaluations with human and LLM judges, we show that POS is most preferred among explanation methods, helps human users understand model decision boundaries, and facilitates model success and error identification. Furthermore, when evaluated in standard benchmarks (TabFact, WikiTQ, and FetaQA), POS achieves competitive or superior accuracy compared to existing methods, while maintaining greater efficiency by requiring significantly fewer LLM calls and database queries.
Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
Kariyappa, Sanjay, Lรฉcuรฉ, Freddy, Mishra, Saumitra, Pond, Christopher, Magazzeni, Daniele, Veloso, Manuela
This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating them at different points in the input sequence. Due to the causal attention mechanism, these intermediate predictions only depend on the tokens seen before the inference point, allowing us to obtain the model's prediction on a masked input sub-sequence, with negligible computational overheads. We develop two methods to provide sub-sequence level attributions using this insight. First, we propose Single Pass-Progressive Inference (SP-PI), which computes attributions by taking the difference between consecutive intermediate predictions. Second, we exploit a connection with Kernel SHAP to develop Multi Pass-Progressive Inference (MP-PI). MP-PI uses intermediate predictions from multiple masked versions of the input to compute higher quality attributions. Our studies on a diverse set of models trained on text classification tasks show that SP-PI and MP-PI provide significantly better attributions compared to prior work.
Privacy-Preserving Algorithmic Recourse
Pentyala, Sikha, Sharma, Shubham, Kariyappa, Sanjay, Lecue, Freddy, Magazzeni, Daniele
When individuals are subject to adverse outcomes from machine learning models, providing a recourse path to help achieve a positive outcome is desirable. Recent work has shown that counterfactual explanations - which can be used as a means of single-step recourse - are vulnerable to privacy issues, putting an individuals' privacy at risk. Providing a sequential multi-step path for recourse can amplify this risk. Furthermore, simply adding noise to recourse paths found from existing methods can impact the realism and actionability of the path for an end-user. In this work, we address privacy issues when generating realistic recourse paths based on instance-based counterfactual explanations, and provide PrivRecourse: an end-to-end privacy preserving pipeline that can provide realistic recourse paths. PrivRecourse uses differentially private (DP) clustering to represent non-overlapping subsets of the private dataset. These DP cluster centers are then used to generate recourse paths by forming a graph with cluster centers as the nodes, so that we can generate realistic - feasible and actionable - recourse paths. We empirically evaluate our approach on finance datasets and compare it to simply adding noise to data instances, and to using DP synthetic data, to generate the graph. We observe that PrivRecourse can provide paths that are private and realistic.
SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features
Kariyappa, Sanjay, Tsepenekas, Leonidas, Lรฉcuรฉ, Freddy, Magazzeni, Daniele
The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
Information Flow Control in Machine Learning through Modular Model Architecture
Tiwari, Trishita, Gururangan, Suchin, Guo, Chuan, Hua, Weizhe, Kariyappa, Sanjay, Gupta, Udit, Xiong, Wenjie, Maeng, Kiwan, Lee, Hsien-Hsin S., Suh, G. Edward
In today's machine learning (ML) models, any part of the training data can affect its output. This lack of control for information flow from training data to model output is a major obstacle in training models on sensitive data when access control only allows individual users to access a subset of data. To enable secure machine learning for access controlled data, we propose the notion of information flow control for machine learning, and develop a secure Transformer-based language model based on the Mixture-of-Experts (MoE) architecture. The secure MoE architecture controls information flow by limiting the influence of training data from each security domain to a single expert module, and only enabling a subset of experts at inference time based on an access control policy. The evaluation using a large corpus of text data shows that the proposed MoE architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (up to 37%) by enabling training on access-controlled data.
Bounding the Invertibility of Privacy-preserving Instance Encoding using Fisher Information
Maeng, Kiwan, Guo, Chuan, Kariyappa, Sanjay, Suh, G. Edward
Privacy-preserving instance encoding aims to encode raw data as feature vectors without revealing their privacy-sensitive information. When designed properly, these encodings can be used for downstream ML applications such as training and inference with limited privacy risk. However, the vast majority of existing instance encoding schemes are based on heuristics and their privacy-preserving properties are only validated empirically against a limited set of attacks. In this paper, we propose a theoretically-principled measure for the privacy of instance encoding based on Fisher information. We show that our privacy measure is intuitive, easily applicable, and can be used to bound the invertibility of encodings both theoretically and empirically.
Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning
Kariyappa, Sanjay, Qureshi, Moinuddin K
Split learning is a popular technique used to perform vertical federated learning, where the goal is to jointly train a model on the private input and label data held by two parties. To preserve privacy of the input and label data, this technique uses a split model and only requires the exchange of intermediate representations (IR) of the inputs and gradients of the IR between the two parties during the learning process. In this paper, we propose Gradient Inversion Attack (GIA), a label leakage attack that allows an adversarial input owner to learn the label owner's private labels by exploiting the gradient information obtained during split learning. GIA frames the label leakage attack as a supervised learning problem by developing a novel loss function using certain key properties of the dataset and models. Our attack can uncover the private label data on several multi-class image classification problems and a binary conversion prediction task with near-perfect accuracy (97.01% - 99.96%), demonstrating that split learning provides negligible privacy benefits to the label owner. Furthermore, we evaluate the use of gradient noise to defend against GIA. While this technique is effective for simpler datasets, it significantly degrades utility for datasets with higher input dimensionality. Our findings underscore the need for better privacy-preserving training techniques for vertically split data.
Enabling Inference Privacy with Adaptive Noise Injection
Kariyappa, Sanjay, Dia, Ousmane, Qureshi, Moinuddin K
User-facing software services are becoming increasingly reliant on remote servers to host Deep Neural Network (DNN) models, which perform inference tasks for the clients. Such services require the client to send input data to the service provider, who processes it using a DNN and returns the output predictions to the client. Due to the rich nature of the inputs such as images and speech, the input often contains more information than what is necessary to perform the primary inference task. Consequently, in addition to the primary inference task, a malicious service provider could infer secondary (sensitive) attributes from the input, compromising the client's privacy. The goal of our work is to improve inference privacy by injecting noise to the input to hide the irrelevant features that are not conducive to the primary classification task. T o this end, we propose Adaptive Noise Injection (ANI), which uses a light-weight DNN on the client-side to inject noise to each input, before transmitting it to the service provider to perform inference. Our key insight is that by customizing the noise to each input, we can achieve state-of-the-art trade-off between utility and privacy (up to 48. 5% degradation in sensitive-task accuracy with 1% degradation in primary accuracy), significantly outperforming existing noise injection schemes. Our method does not require prior knowledge of the sensitive attributes and incurs minimal computational overheads.
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Kariyappa, Sanjay, Prakash, Atul, Qureshi, Moinuddin
Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the query required for the attack by 2x-24x.
Improving Adversarial Robustness of Ensembles with Diversity Training
Kariyappa, Sanjay, Qureshi, Moinuddin K.
Deep Neural Networks are vulnerable to adversarial attacks even in settings where the attacker has no direct access to the model being attacked. Such attacks usually rely on the principle of transferability, whereby an attack crafted on a surrogate model tends to transfer to the target model. We show that an ensemble of models with misaligned loss gradients can provide an effective defense against transfer-based attacks. Our key insight is that an adversarial example is less likely to fool multiple models in the ensemble if their loss functions do not increase in a correlated fashion. To this end, we propose Diversity Training, a novel method to train an ensemble of models with uncorrelated loss functions. We show that our method significantly improves the adversarial robustness of ensembles and can also be combined with existing methods to create a stronger defense.