Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

Neural Information Processing Systems 

Only the reporting clause is examined, while the that-clause containing the statement is ignored: Previous bias probing studies feed the model the entire sentence with its complete context. In this paper, however, only the prompt (the reporting clause) is fed to the language model for analysis. The proposed intervention setup therefore effectively probes bias only at the word level. In the templates shown in Figure 8 in the Appendix, a verb such as "cry" or "drive" could embody implicit bias, yet the current framework does not investigate such potential biases. Consequently, the study's conclusion that gender bias effects are concentrated in specific components of the model may not generalize well once more complex syntactic and semantic structures and interactions are considered.
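
To make the distinction concrete, the following is a minimal sketch of the two kinds of input, not the authors' code. The function name `build_inputs`, the template wording, and the example statement are all hypothetical and chosen only for illustration; they are not taken from Figure 8.

```python
def build_inputs(occupation, reporting_verb="said", statement="{pronoun} was tired"):
    """Contrast a prompt-only input with full-sentence inputs.

    All template strings here are hypothetical illustrations,
    not the templates used in the paper.
    """
    # Prompt-only setup: just the reporting clause reaches the model,
    # so the occupation word is the only content that varies.
    prompt = f"The {occupation} {reporting_verb} that"

    # Full-sentence setup: the that-clause (the statement) is included,
    # so its verb and context could also contribute to measured bias.
    full = {
        pronoun: f"{prompt} {statement.format(pronoun=pronoun)}."
        for pronoun in ("he", "she")
    }
    return prompt, full


prompt, full = build_inputs("nurse")
print(prompt)
for sentence in full.values():
    print(sentence)
```

Under these assumptions, a prompt-only analysis never sees the statement, so any bias carried by the content of the that-clause would fall outside the measured effects.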