What is in a name? Mitigating Name Bias in Text Embeddings via Anonymization
Manchanda, Sahil, Shivaswamy, Pannaga
arXiv.org Artificial Intelligence
Text-embedding models often exhibit biases arising from the data on which they are trained. In this paper, we examine a hitherto unexplored bias in text embeddings: bias arising from the presence of $\textit{names}$, such as those of persons, locations, and organizations, in the text. Our study shows how $\textit{name bias}$ in text-embedding models can lead to erroneous conclusions when assessing thematic similarity. Text embeddings can mistakenly indicate similarity between texts based on the names they contain, even when their actual semantic content has nothing in common, or indicate dissimilarity simply because of the names, even when the texts match semantically. We first demonstrate the presence of name bias in different text-embedding models and then propose $\textit{text anonymization}$ during inference, which involves removing references to names while preserving the core theme of the text. The efficacy of the anonymization approach is demonstrated on two downstream NLP tasks, achieving significant performance gains. Our simple and training-optimization-free approach offers a practical and easily implementable solution to mitigate name bias.
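The idea of anonymizing text before comparing it can be illustrated with a toy sketch. This is not the authors' implementation: the name list `NAMES` is a hypothetical stand-in for a real named-entity recognizer, and a bag-of-words cosine similarity stands in for a real text-embedding model. It only shows how shared names can inflate a similarity score between thematically unrelated texts, and how removing them lowers it.

```python
import math
import re
from collections import Counter

# Hypothetical name list; a real pipeline would use an NER model or an LLM instead.
NAMES = {"alice", "paris", "acme"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def anonymize(text, names=NAMES):
    """Return the tokens of `text` with every token in `names` removed."""
    return [t for t in tokenize(text) if t not in names]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors (toy 'embedding')."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Two texts with different themes that share the same names.
text_a = "Alice opened a bakery in Paris"
text_b = "Alice lost a chess match in Paris"

raw = cosine(tokenize(text_a), tokenize(text_b))
anon = cosine(anonymize(text_a), anonymize(text_b))

print(f"similarity with names:    {raw:.3f}")
print(f"similarity without names: {anon:.3f}")
```

In this toy setting the score drops once the names are removed, because the remaining overlap comes only from function words. With a real embedding model the effect would depend on the model and on how names are detected and replaced.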
Feb-5-2025
- Country:
- Africa
- Lesotho (0.04)
- Mali (0.04)
- Burkina Faso (0.04)
- Ethiopia (0.04)
- Niger (0.04)
- Kenya (0.04)
- Cabo Verde (0.04)
- Botswana (0.04)
- The Gambia (0.04)
- Nigeria (0.04)
- Central African Republic (0.04)
- Ghana (0.04)
- Eswatini (0.04)
- Namibia (0.04)
- Middle East
- Eritrea (0.04)
- Mauritania (0.04)
- Rwanda (0.04)
- Mozambique (0.04)
- Gabon (0.04)
- Equatorial Guinea (0.04)
- Madagascar (0.04)
- Malawi (0.04)
- Liberia (0.04)
- Angola (0.04)
- Senegal (0.04)
- Cameroon (0.04)
- Burundi (0.04)
- Guinea-Bissau (0.04)
- Mauritius (0.04)
- Comoros (0.04)
- Benin (0.04)
- Asia
- Pakistan (0.04)
- Brunei (0.04)
- Cambodia (0.04)
- Nepal (0.04)
- Mongolia (0.04)
- Malaysia (0.04)
- North Korea (0.04)
- Kazakhstan (0.04)
- Bangladesh (0.04)
- Bhutan (0.04)
- Azerbaijan (0.04)
- Japan (0.04)
- Indonesia (0.04)
- Laos (0.04)
- Middle East
- Maldives (0.04)
- Philippines (0.04)
- Russia (0.04)
- China (0.04)
- Armenia (0.04)
- Myanmar (0.04)
- Kyrgyzstan (0.04)
- India (0.04)
- Afghanistan (0.04)
- Europe
- Belarus (0.04)
- North Macedonia (0.14)
- Hungary (0.04)
- Portugal (0.04)
- Moldova (0.04)
- Ireland (0.04)
- Czechia (0.04)
- Croatia (0.04)
- Kosovo > District of Pristina > Pristina (0.04)
- Belgium (0.04)
- Romania (0.04)
- Estonia (0.04)
- Lithuania (0.04)
- Latvia (0.04)
- Middle East
- Greece (0.04)
- San Marino (0.04)
- Russia (0.04)
- Italy (0.04)
- France (0.04)
- Serbia (0.04)
- Norway (0.04)
- Monaco (0.04)
- Albania (0.04)
- Finland (0.04)
- Denmark (0.04)
- Iceland (0.04)
- Netherlands (0.04)
- Spain (0.04)
- Germany (0.04)
- Poland (0.04)
- Andorra (0.04)
- Liechtenstein (0.04)
- Bulgaria (0.04)
- Austria (0.04)
- Bosnia and Herzegovina (0.04)
- Montenegro (0.04)
- North America
- Saint Vincent and the Grenadines (0.04)
- Cuba (0.04)
- Haiti (0.04)
- The Bahamas (0.04)
- Saint Lucia (0.04)
- El Salvador (0.04)
- Canada (0.04)
- United States > New Jersey (0.04)
- United States > New York (0.04)
- Jamaica (0.04)
- Mexico (0.04)
- Saint Kitts and Nevis (0.04)
- Barbados (0.04)
- Dominica (0.04)
- Guatemala (0.04)
- Nicaragua (0.04)
- Belize (0.04)
- Honduras (0.04)
- Antigua and Barbuda (0.04)
- Dominican Republic (0.04)
- Costa Rica (0.04)
- Panama (0.04)
- Oceania
- Australia (0.04)
- Kiribati (0.04)
- New Zealand (0.04)
- Fiji (0.04)
- Marshall Islands (0.04)
- Samoa (0.04)
- Papua New Guinea (0.04)
- Nauru (0.04)
- Micronesia (0.04)
- Palau (0.04)
- South America
- Argentina (0.04)
- Paraguay (0.04)
- French Guiana (0.04)
- Brazil (0.14)
- Colombia (0.14)
- Bolivia (0.14)
- Suriname (0.04)
- Ecuador (0.14)
- Guyana (0.14)
- Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Venezuela (0.04)
- Peru (0.14)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Health & Medicine (1.00)
- Technology: