socio-economic status


Impoverished Language Technology: The Lack of (Social) Class in NLP

arXiv.org Artificial Intelligence

Since Labov's (1964) foundational work on the social stratification of language, linguistics has made concerted efforts to understand the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology. While age and gender are well covered, Labov's initial target, socio-economic class, is largely absent. We survey the existing Natural Language Processing (NLP) literature and find that only 20 papers even mention socio-economic status. Moreover, the majority of those papers do not engage with class beyond collecting annotator demographics. Given this research lacuna, we provide a definition of class that can be operationalised by NLP researchers, and argue for including socio-economic class in future language technologies.


Navigating Fairness Measures and Trade-Offs

arXiv.org Artificial Intelligence

One of the main risks of using artificial intelligence in decision making is that the algorithms involved are biased and can therefore lead to unfair outcomes (Pessach and Shmueli, 2020). In particular, artificial intelligence is prone to discriminating indirectly, and often unintentionally, against certain groups. Machine learning systems (a type of AI) are fitted to data and find patterns in that data in order to predict a target variable. In doing so, they often exploit correlations present in the data to select on a problematic property (such as ethnicity) not directly but through information on an unproblematic property (such as zip code): in segregated neighbourhoods, the zip code is a good predictor of ethnicity. Even when these systems have no direct access to variables that would be unfair to select on, they can therefore still produce outputs that lead to unfair treatment of certain groups. Put more precisely, indirect discrimination is the situation where a group A (e.g.
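
The proxy mechanism described in this excerpt can be illustrated with a small simulation. This is only a sketch under invented assumptions, not code from the paper: the two groups, the two zip codes, the `segregation` parameter and the function name `simulate_selection` are all hypothetical. The decision rule never reads the group label, yet selection rates still differ sharply between groups.

```python
import random


def simulate_selection(n=10_000, segregation=0.9, seed=0):
    """Select people by zip code only and return the selection rate per group.

    Every name and number here is a hypothetical choice made for illustration.
    """
    rng = random.Random(seed)
    selected = {"A": 0, "B": 0}
    totals = {"A": 0, "B": 0}
    for _ in range(n):
        group = rng.choice(["A", "B"])
        # With probability `segregation`, group A lives in zip 1 and group B in zip 2.
        home_zip = 1 if group == "A" else 2
        if rng.random() > segregation:
            home_zip = 3 - home_zip  # occasional resident of the other zip code
        totals[group] += 1
        # The decision rule never sees `group`; it only looks at the zip code.
        if home_zip == 1:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}


rates = simulate_selection()
print(rates)  # group A is selected far more often than group B
```

Setting `segregation=0.5` removes the correlation between group and zip code, and the two selection rates become roughly equal, which is one way to see that the disparity comes from the proxy and not from the rule itself.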


Debiasing Recommendation by Learning Identifiable Latent Confounders

arXiv.org Artificial Intelligence

Recommendation systems aim to predict users' feedback on items not exposed to them. Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback. Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure. However, they cannot guarantee the identification of counterfactual feedback, which can lead to biased predictions. In this work, we propose a novel method, the identifiable deconfounder (iDCF), which leverages a set of proxy variables (e.g., observed user features) to resolve this non-identification issue. The proposed iDCF is a general deconfounded recommendation framework that applies proximal causal inference to infer the unmeasured confounders and identify the counterfactual feedback with theoretical guarantees. Extensive experiments on various real-world and synthetic datasets verify the proposed method's effectiveness and robustness.
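
The confounding bias mentioned in this abstract can be seen in a toy simulation. The sketch below is not iDCF and makes no attempt at deconfounding; it only shows the problem, assuming a single binary unmeasured confounder `u` with invented probabilities. The function name `simulate_confounding` and all numbers are hypothetical.

```python
import random


def simulate_confounding(n=50_000, seed=0):
    """Compare the naive like-rate on exposed users with the population like-rate.

    `u` stands in for an unmeasured confounder (e.g. socio-economic status).
    All probabilities are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    observed = []      # feedback the system actually logs (exposed users only)
    all_feedback = []  # feedback every user would give if exposed
    for _ in range(n):
        u = rng.random() < 0.5
        p_exposure = 0.8 if u else 0.2  # u raises the chance of exposure
        p_like = 0.7 if u else 0.3      # u also raises the chance of a like
        liked = rng.random() < p_like
        all_feedback.append(liked)
        if rng.random() < p_exposure:
            observed.append(liked)
    naive = sum(observed) / len(observed)
    population = sum(all_feedback) / len(all_feedback)
    return naive, population


naive, population = simulate_confounding()
print(f"naive estimate from exposed users: {naive:.3f}")
print(f"population like-rate:              {population:.3f}")
```

Under these assumed probabilities the population like-rate is 0.5, while the exposed users are mostly those with `u` true, so the naive estimate converges to about 0.62. The gap is the bias that arises when exposure and feedback share an unobserved cause.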