Nichelle and Nancy: The Influence of Demographic Attributes and Tokenization Length on First Name Biases
arXiv.org Artificial Intelligence
Using first name substitution experiments, prior research has demonstrated that social commonsense reasoning models systematically exhibit social biases along the dimensions of race, ethnicity, and gender (An et al., 2023). Demographic attributes of first names, however, are strongly correlated with corpus frequency and tokenization length, which may influence model behavior independently of, or in addition to, demographic factors. In this paper, we conduct a new series of first name substitution experiments that measures the influence of each of these factors while controlling for the others. We find that the demographic attributes of a name (race, ethnicity, and gender) and its tokenization length are both factors that systematically affect the behavior of social commonsense reasoning models.
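The controlled comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `[NAME]` placeholder, the toy greedy tokenizer, the field names (`group`, `token_len`, `score`), and all values are hypothetical and chosen only for illustration. The idea is to substitute names into a fixed template, record each name's tokenization length, and then compare the outcome across one factor only within strata of the other factor.

```python
from collections import defaultdict
from statistics import mean


def substitute(template: str, name: str) -> str:
    """Fill a hypothetical [NAME] placeholder with a first name."""
    return template.replace("[NAME]", name)


def toy_token_length(name: str, vocab: set) -> int:
    """Count pieces under greedy longest-match segmentation.

    This is a toy stand-in for a real subword tokenizer (an assumption);
    characters not covered by the vocabulary count as one piece each.
    """
    text = name.lower()
    i = 0
    pieces = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                break  # longest vocabulary match starting at i
        else:
            j = i + 1  # no match: fall back to a single character
        i = j
        pieces += 1
    return pieces


def stratified_means(records, factor, control, outcome="score"):
    """Mean outcome per level of `factor`, computed within each level of `control`."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r[control], r[factor])].append(r[outcome])
    result = defaultdict(dict)
    for (c, f), values in buckets.items():
        result[c][f] = mean(values)
    return {c: dict(v) for c, v in result.items()}


if __name__ == "__main__":
    # Hypothetical records: group labels, token lengths, and scores are made up.
    records = [
        {"group": "g1", "token_len": 1, "score": 0.5},
        {"group": "g2", "token_len": 1, "score": 0.7},
        {"group": "g1", "token_len": 2, "score": 0.25},
        {"group": "g1", "token_len": 2, "score": 0.75},
    ]
    print(substitute("[NAME] went to the store.", "Nancy"))
    print(toy_token_length("Nancy", {"nan", "cy"}))
    print(stratified_means(records, factor="group", control="token_len"))
```

Swapping the `factor` and `control` arguments gives the complementary view: the effect of tokenization length within each demographic group.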
May-25-2023
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Europe
    - Croatia (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
  - North America
    - Dominican Republic (0.04)
    - United States
      - Alaska (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Maryland > Prince George's County
        - College Park (0.04)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New York > New York County
        - New York City (0.04)
  - South America > Chile
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Education > Educational Setting > Online (0.67)
- Technology: