Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures

Jenny, David F., Billeter, Yann, Sachan, Mrinmaya, Schölkopf, Bernhard, Jin, Zhijing

arXiv.org Artificial Intelligence 

The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding their ability to perceive and interpret complex socio-political landscapes. In this study, we undertake an exploration of decision-making processes and inherent biases within LLMs, exemplified by ChatGPT, specifically contextualizing our analysis within political debates. We aim not to critique or validate LLMs' values, but rather to discern how they interpret and adjudicate "good arguments." By applying Activity Dependency Networks (ADNs), we extract the LLMs' implicit criteria for such assessments and illustrate how normative values influence these perceptions. We discuss the consequences of our findings for human-AI alignment and bias mitigation.

Figure 1: (Undesired) Effect of Bias Treatment on Decision Process: The figure depicts how the LLM's perception of value A is considered during the decision process while judging B and C through f(C|A) and f(B|A). When treating the biased association of value A with C (f(C|A)) by naively fine-tuning the model to align with this value of interest, other value associations (f(B|A)) that are not actively considered may be changed indiscriminately, regardless of whether they were already aligned. These associations are currently neither observable nor predictable, yet changes in them [...]
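The side effect described in the figure caption can be illustrated with a toy sketch. This is not the paper's method: it assumes a hypothetical model in which both associations f(C|A) and f(B|A) are computed from a shared parameter vector `w` (analogous to shared representations in an LLM), and shows that gradient updates targeting only f(C|A) also move f(B|A), even though f(B|A) was never part of the objective.

```python
import numpy as np

# Hypothetical setup: two value associations f(C|A) and f(B|A) computed
# from a single shared parameter vector w, mimicking shared representations.
rng = np.random.default_rng(0)
w = rng.normal(size=4)        # shared parameters
x_C = rng.normal(size=4)      # hypothetical features encoding the (A, C) pair
x_B = rng.normal(size=4)      # hypothetical features encoding the (A, B) pair

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, w):
    # Association strength in (0, 1), a stand-in for f(.|A)
    return sigmoid(x @ w)

before_B = f(x_B, w)

# "Naive fine-tuning": squared-error gradient steps pushing only f(C|A)
# toward a target value, with no constraint on f(B|A).
target_C = 0.9
for _ in range(200):
    p = f(x_C, w)
    grad = (p - target_C) * p * (1 - p) * x_C  # d/dw of 0.5 * (p - target)^2
    w -= 0.5 * grad

after_B = f(x_B, w)
# f(B|A) shifts as an unintended side effect, because w is shared.
print(f"f(B|A) before: {before_B:.3f}, after: {after_B:.3f}")
```

Because `x_B` and `x_C` are generically non-orthogonal, any update along `x_C` leaks into f(B|A); this is the "neither observable nor predictable" change the caption warns about, reduced to its simplest form.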