Semantic Properties of cosine based bias scores for word embeddings

Schröder, Sarah, Schulz, Alexander, Hinder, Fabian, Hammer, Barbara

arXiv.org Artificial Intelligence 

In the domain of Natural Language Processing (NLP), many works have investigated social biases in terms of associations in the embedding space. Early works [1, 2] introduced methods to measure and mitigate social biases based on cosine similarity in word embeddings. With NLP research progressing to large language models and contextualized embeddings, doubts have been raised as to whether these methods are still suitable for fairness evaluation [3], and other works criticize that, for instance, the Word Embedding Association Test (WEAT) [2] fails to detect some kinds of biases [4, 5]. Overall, a multitude of bias measures exists in the literature, and these do not necessarily detect the same biases [6, 4, 5]. More generally, researchers are questioning the usability of model-intrinsic bias measures such as cosine-based methods [7, 8, 9]. A few papers compare the performance of different bias scores [10, 11], and some works evaluate experimental setups for bias measurement [12]. However, to our knowledge, only two works investigate the properties of intrinsic bias scores on a theoretical level [5, 13]. To further close this gap, we evaluate the semantic properties of cosine-based bias scores, focusing on bias quantification as opposed to bias detection. We make the following contributions: (i) We formalize the properties of trustworthiness and comparability as requirements for cosine-based bias scores.
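
For context, WEAT [2] is a representative cosine-based bias score: it compares the mean cosine similarities of two target word sets (e.g. career vs. family terms) to two attribute word sets (e.g. male vs. female terms). Below is a minimal NumPy sketch of its association and effect-size statistics, following the definitions in Caliskan et al. [2]; the function names and the toy data are illustrative and not taken from this paper.

```python
import numpy as np

def cos(u, v):
    # Cosine similarity between two embedding vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_association(w, A, B):
    # s(w, A, B): mean cosine similarity of word vector w to attribute
    # set A minus its mean cosine similarity to attribute set B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # WEAT effect size: difference of the mean associations of the two
    # target sets X and Y, normalized by the standard deviation of the
    # associations over all target words (population std used here).
    s_X = [weat_association(x, A, B) for x in X]
    s_Y = [weat_association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(np.array(s_X + s_Y))

# Toy usage with random 50-d vectors standing in for real embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))   # e.g. career words
Y = rng.normal(size=(8, 50))   # e.g. family words
A = rng.normal(size=(8, 50))   # e.g. male attribute words
B = rng.normal(size=(8, 50))   # e.g. female attribute words
print(weat_effect_size(X, Y, A, B))
```

This effect size is one of the cosine-based quantities whose semantic properties, such as trustworthiness and comparability, are at issue here.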