Alternative Dispute Resolution


DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning

Neural Information Processing Systems

Causal mediation analysis can unpack the black box of causality and is therefore a powerful tool for disentangling causal pathways in biomedical and social sciences, and also for evaluating machine learning fairness. To reduce the bias in estimating natural direct and indirect effects in mediation analysis, we propose a new method called DeepMed that uses deep neural networks (DNNs) to cross-fit the infinite-dimensional nuisance functions in the efficient influence functions. We obtain novel theoretical results showing that DeepMed (1) can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and (2) can adapt to certain low-dimensional structures of the nuisance functions, significantly advancing the existing literature on DNN-based semiparametric causal inference. Extensive synthetic experiments are conducted to support our findings and also expose the gap between theory and practice. As a proof of concept, we apply DeepMed to analyze two real datasets on machine learning fairness and reach conclusions consistent with previous findings.
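
A minimal sketch of the cross-fitting structure described above, assuming a binary treatment and binary mediator and using scikit-learn MLPs as stand-ins for the DNN nuisance estimators. The function name cross_fit_nde_nie and the plug-in mediation formula are illustrative simplifications: DeepMed evaluates the full efficient influence function (which adds inverse-probability-weighted correction terms), not the plug-in shown here.

```python
# Sketch: cross-fitted DNN nuisance estimates plugged into the mediation
# (g-)formula for binary treatment A and binary mediator M.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier, MLPRegressor

def cross_fit_nde_nie(X, A, M, Y, n_folds=5, seed=0):
    """Plug-in estimates of E[Y(a, M(a'))] with cross-fitted nuisances."""
    psi = {key: [] for key in [(1, 1), (1, 0), (0, 0)]}
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Nuisance 1: outcome regression mu(a, m, x) = E[Y | A=a, M=m, X=x]
        mu = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
        mu.fit(np.column_stack([A[train], M[train], X[train]]), Y[train])
        # Nuisance 2: mediator model p(M=1 | a, x)
        pm = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000)
        pm.fit(np.column_stack([A[train], X[train]]), M[train])
        Xt = X[test]
        for a, a_prime in psi:
            # sum over m of mu(a, m, x) * p(m | a', x), evaluated on the held-out fold
            p1 = pm.predict_proba(np.column_stack([np.full(len(Xt), a_prime), Xt]))[:, 1]
            y1 = mu.predict(np.column_stack([np.full(len(Xt), a), np.ones(len(Xt)), Xt]))
            y0 = mu.predict(np.column_stack([np.full(len(Xt), a), np.zeros(len(Xt)), Xt]))
            psi[(a, a_prime)].append(np.mean(y1 * p1 + y0 * (1 - p1)))
    est = {k: float(np.mean(v)) for k, v in psi.items()}
    nde = est[(1, 0)] - est[(0, 0)]   # natural direct effect
    nie = est[(1, 1)] - est[(1, 0)]   # natural indirect effect
    return nde, nie
```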


Investigating Gender Bias in Language Models Using Causal Mediation Analysis. Jesse Vig, Sebastian Gehrmann, Sharon Qian

Neural Information Processing Systems

Many interpretation methods for neural models in natural language processing investigate how information is encoded inside hidden representations. However, these methods can only measure whether the information exists, not whether it is actually used by the model. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. The approach enables us to analyze the mechanisms that facilitate the flow of information from input to output through various model components, known as mediators. As a case study, we apply this methodology to analyzing gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are concentrated in specific components of the model that may exhibit highly specialized behavior.
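
A toy, self-contained illustration of the neuron-level intervention underlying this kind of mediation analysis: a unit's indirect effect is measured by patching its activation under the original input with its activation under the gender-swapped input. The two-layer numpy "model", the bias measure, and the gender-swap stand-in are all hypothetical; the paper performs the analogous patching on neurons and attention heads of pre-trained GPT-2.

```python
# Toy mediation intervention on one hidden unit of a tiny numpy "model".
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))  # toy 2-layer model

def forward(x, patch=None):
    """Hidden layer h, optionally overriding one unit (index, value), then logits."""
    h = np.tanh(x @ W1)
    if patch is not None:
        idx, val = patch
        h = h.copy()
        h[idx] = val
    return h @ W2  # logits over two candidate continuations ("he", "she")

def bias(logits):
    return logits[0] - logits[1]   # preference for the stereotypical continuation

x_orig = rng.normal(size=4)                  # stands in for "The nurse said that ..."
x_swap = x_orig.copy(); x_swap[0] *= -1.0    # stands in for the gender-swapped prompt

total_effect = bias(forward(x_swap)) - bias(forward(x_orig))
h_swap = np.tanh(x_swap @ W1)                # mediator values under the counterfactual input
for unit in range(8):
    patched = forward(x_orig, patch=(unit, h_swap[unit]))
    indirect = bias(patched) - bias(forward(x_orig))
    print(f"unit {unit}: indirect effect {indirect:+.3f} (total {total_effect:+.3f})")
```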


DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering

arXiv.org Artificial Intelligence

Evaluating the free-form responses generated by Large Language Models (LLMs) remains a challenge due to their diverse and open-ended nature. Traditional supervised signal-based automatic metrics fail to capture semantic equivalence or handle the variability of open-ended responses, while human evaluation, though reliable, is resource-intensive. Leveraging LLMs as evaluators offers a promising alternative due to their strong language understanding and instruction-following capabilities. Taking advantage of these capabilities, we propose the Dynamic Arbitration Framework for Evaluation (DAFE), which employs two primary LLMs as judges and engages a third arbitrator only in cases of disagreement. This selective arbitration prioritizes evaluation reliability while reducing unnecessary computational demands compared to conventional majority voting. DAFE utilizes task-specific reference answers with dynamic arbitration to enhance judgment accuracy, resulting in significant improvements in evaluation metrics such as Macro F1 and Cohen's Kappa. Through experiments, including a comprehensive human evaluation, we demonstrate DAFE's ability to provide consistent, scalable, and resource-efficient assessments, establishing it as a robust framework for evaluating free-form model outputs.
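
A minimal sketch of the selective-arbitration control flow, assuming each judge call is wrapped as a boolean-returning function of (question, reference, candidate). The Judge type and dafe_verdict name are illustrative, and the stand-in judges below only mimic a disagreement between real LLM judges.

```python
# Two primary judges; a third arbitrator is queried only on disagreement.
from typing import Callable

Judge = Callable[[str, str, str], bool]  # (question, reference, candidate) -> correct?

def dafe_verdict(question: str, reference: str, candidate: str,
                 judge_a: Judge, judge_b: Judge, arbitrator: Judge) -> bool:
    v_a = judge_a(question, reference, candidate)
    v_b = judge_b(question, reference, candidate)
    if v_a == v_b:                      # agreement: accept the shared verdict
        return v_a
    return arbitrator(question, reference, candidate)  # disagreement: arbitrate

# Example with trivial stand-in judges:
exact = lambda q, ref, cand: ref.strip().lower() == cand.strip().lower()
lenient = lambda q, ref, cand: ref.strip().lower() in cand.strip().lower()
print(dafe_verdict("Capital of France?", "Paris", "It is Paris.",
                   exact, lenient, arbitrator=lenient))   # judges disagree -> arbitrated: True
```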


Verde: Verification via Refereed Delegation for Machine Learning Programs

arXiv.org Artificial Intelligence

Machine learning programs, such as those performing inference, fine-tuning, and training of LLMs, are commonly delegated to untrusted compute providers. To provide correctness guarantees for the client, we propose adapting the cryptographic notion of refereed delegation to the machine learning setting. This approach enables a computationally limited client to delegate a program to multiple untrusted compute providers, with a guarantee of obtaining the correct result if at least one of them is honest. Refereed delegation of ML programs poses two technical hurdles: (1) an arbitration protocol to resolve disputes when compute providers disagree on the output, and (2) the ability to bitwise reproduce ML programs across different hardware setups. For (1), we design Verde, a dispute arbitration protocol that efficiently handles the large scale and graph-based computational model of modern ML programs. For (2), we build RepOps (Reproducible Operators), a library that eliminates hardware "non-determinism" by controlling the order of floating-point operations performed on all hardware. Our implementation shows that refereed delegation achieves both strong guarantees for clients and practical overheads for compute providers.
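
A rough sketch of the dispute-arbitration step under strong simplifying assumptions: the program is modeled as a linear sequence of deterministic steps and both providers commit to per-step hashes, so the referee can bisect to the first divergent step and recompute only that step. The run_step, trace, and arbitrate names are illustrative; Verde itself operates over the graph-based computation of real ML programs and relies on RepOps for bitwise reproducibility across hardware.

```python
# Referee bisects two providers' step commitments and re-executes one step.
import hashlib

def run_step(state, step):
    """Deterministic stand-in for one step of the delegated program."""
    return hashlib.sha256(state + step.to_bytes(4, "big")).digest()

def trace(n_steps, cheat_at=None):
    state, out = b"init", []
    for i in range(n_steps):
        state = run_step(state, i)
        if cheat_at == i:
            state = b"forged" + state   # a dishonest provider deviates here
        out.append(state)
    return out

def arbitrate(trace_a, trace_b):
    lo, hi = 0, len(trace_a) - 1
    while lo < hi:                       # bisect to the first divergent step
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid + 1
        else:
            hi = mid
    prev = trace_a[lo - 1] if lo > 0 else b"init"   # last agreed-upon state
    truth = run_step(prev, lo)           # referee recomputes only this one step
    return "provider A" if truth == trace_a[lo] else "provider B"

honest, dishonest = trace(1000), trace(1000, cheat_at=613)
print("honest party:", arbitrate(honest, dishonest))   # -> provider A
```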


AMDP: An Adaptive Detection Procedure for False Discovery Rate Control in High-Dimensional Mediation Analysis

Neural Information Processing Systems

High-dimensional mediation analysis is often associated with a multiple testing problem for detecting significant mediators. Assessing the uncertainty of this detection process via the false discovery rate (FDR) has garnered great interest. To control the FDR in multiple testing, two essential steps are involved: ranking and selection. Existing approaches either construct p-values without calibration or disregard the joint information across tests, leading to conservative FDR control or non-optimal ranking rules for multiple hypotheses. In this paper, we develop an adaptive mediation detection procedure (referred to as "AMDP") to identify relevant mediators while asymptotically controlling the FDR in high-dimensional mediation analysis. AMDP produces the optimal rule for ranking hypotheses and proposes a data-driven strategy to determine the threshold for mediator selection. This novel method captures information from the proportions of composite null hypotheses and the distribution of p-values, which turns the high dimensionality into an advantage instead of a limitation. Numerical studies on synthetic and real data sets illustrate the performance of AMDP compared with existing approaches.
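
A simplified illustration of the ranking-and-thresholding recipe, not the AMDP rule itself: here each hypothesis is ranked by the maximum of its two component p-values (exposure-to-mediator and mediator-to-outcome), and the selection threshold is the largest cutoff whose plug-in false discovery proportion estimate stays below the target level. AMDP replaces this ad hoc ranking with the optimal rule derived from the composite-null proportions and the joint p-value distribution.

```python
# Generic ranking + data-driven threshold for mediator selection at FDR level alpha.
import numpy as np

def select_mediators(p1, p2, alpha=0.1):
    """Return indices of selected mediators at nominal FDR level alpha."""
    p_max = np.maximum(p1, p2)            # composite-null test statistic (ad hoc)
    order = np.argsort(p_max)
    m = len(p_max)
    selected = np.array([], dtype=int)
    for k in range(1, m + 1):             # consider selecting the k smallest statistics
        t = p_max[order[k - 1]]
        fdp_hat = m * t / k                # conservative plug-in FDP estimate at cutoff t
        if fdp_hat <= alpha:
            selected = order[:k]
    return selected

# Toy data: 950 null mediators, 50 true mediators with small p-values on both paths.
rng = np.random.default_rng(1)
p1 = np.concatenate([rng.uniform(size=950), rng.beta(0.1, 5, size=50)])
p2 = np.concatenate([rng.uniform(size=950), rng.beta(0.1, 5, size=50)])
print("selected:", len(select_mediators(p1, p2)), "mediators")
```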


Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration Supplementary File

Neural Information Processing Systems

This file is the supplementary material for paper [18]. It is organized as follows: Section 2 presents the pseudo-code of the UOTA approach.


Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

Neural Information Processing Systems

Our work reveals a structured shortcoming of the existing mainstream self-supervised learning methods. Whereas self-supervised learning frameworks usually take the prevailing perfect instance-level invariance hypothesis for granted, we carefully investigate the pitfalls behind it. In particular, we argue that the existing augmentation pipeline for generating multiple positive views naturally introduces out-of-distribution (OOD) samples that undermine the learning of the downstream tasks. Generating diverse positive augmentations of the input does not always pay off in benefiting downstream tasks. To overcome this inherent deficiency, we introduce a lightweight latent variable model, UOTA, targeting the view sampling issue in self-supervised learning. UOTA adaptively searches for the most important sampling region from which to produce views, and provides a viable choice for outlier-robust self-supervised learning approaches.
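
A simplified stand-in for the outlier-arbitration idea, assuming per-instance view embeddings are available: views whose embeddings fall far from the robust centre of the instance's views are down-weighted in the positive-pair loss. The view_weights and weighted_positive_loss functions below are illustrative and do not reproduce UOTA's latent variable model.

```python
# Down-weight out-of-distribution views among an instance's augmentations.
import torch

def view_weights(view_embs: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """view_embs: (n_views, dim) L2-normalised embeddings of one instance's views."""
    centre = view_embs.median(dim=0).values           # robust centre of the views
    dist = (view_embs - centre).norm(dim=1)           # distance of each view to the centre
    return torch.softmax(-dist / temperature, dim=0)  # far-away (outlier) views get small weight

def weighted_positive_loss(view_embs: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
    w = view_weights(view_embs)
    sims = torch.nn.functional.cosine_similarity(view_embs, anchor.unsqueeze(0))
    return -(w * sims).sum()                          # weighted alignment of views to the anchor

embs = torch.nn.functional.normalize(torch.randn(8, 128), dim=1)   # 8 views of one image
anchor = torch.nn.functional.normalize(torch.randn(128), dim=0)
print(weighted_positive_loss(embs, anchor))
```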


Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

Neural Information Processing Systems

Only the reporting clause is examined, while the that-clause containing the statement is ignored: in previous bias-probing studies, the input is the entire sentence with the complete context, whereas in this paper only the prompt part (the reporting clause) is fed to the language model for analysis. Therefore, the proposed intervention setup effectively focuses only on word-level bias probing. In the templates shown in Figure 8 in the Appendix, the verbs "cry" and "drive" could embody implicit bias; however, under the current framework, such potential biases are not investigated. Therefore, the conclusion drawn in this study, namely that gender bias effects are concentrated in specific components of the model, may not generalize well when more complex syntactic and semantic structures and interactions are considered.


Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

Neural Information Processing Systems

The paper studies the problem of bias in neural models, where the proposed solution is based on causal mediation analysis. The focus of the paper is on a pre-trained Transformer language model, GPT-2. The proposed method of using mediation analysis to analyze attention heads and neurons through interventions is novel and interesting, and can be generalized to other types of biases. The paper is well-written, and the experiments are thorough.