Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection
Michele Mastromattei, Fabio Massimo Zanzotto
arXiv.org Artificial Intelligence
Sentiment analysis datasets, particularly those annotated on crowdsourcing platforms, may contain biases stemming from the lack of information about the annotators' cultural backgrounds. Machine learning models trained on such data can amplify these biases, affecting how sentiment is perceived and labeled. Although these models capture general sentiment, they often miss the nuances experienced by different groups. This paper examines the impact of linguistic diversity on transformer models designed for irony detection. Using the EPIC corpus [1], we created five subsets tailored to different varieties of English. We trained different transformer models and applied the KEN pruning algorithm [2] to extract the minimal subset of parameters that preserves the original model's performance. We repeated this experimental process across five transformer architectures, revealing a minimum parameter overlap of 60% among the resulting subnetworks. We then performed a comprehensive analysis to identify the subnetworks with the highest and lowest similarity.
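The overlap comparison described above can be illustrated with a small sketch. Below, each pruned subnetwork is represented simply as a set of kept-parameter indices, and similarity is measured with a Jaccard overlap; this representation and metric are assumptions for illustration, not the paper's exact KEN procedure.

```python
def kept_overlap(kept_a: set, kept_b: set) -> float:
    """Jaccard overlap between two sets of kept-parameter indices:
    |intersection| / |union|. Returns 0.0 for two empty sets."""
    if not kept_a and not kept_b:
        return 0.0
    return len(kept_a & kept_b) / len(kept_a | kept_b)

# Toy example: two subnetworks over 100 parameters, each keeping 40.
subnet_a = set(range(0, 40))
subnet_b = set(range(20, 60))
print(f"Jaccard overlap: {kept_overlap(subnet_a, subnet_b):.2f}")  # 20/60 = 0.33
```

In practice the kept sets would come from per-layer pruning masks over millions of parameters, but the pairwise comparison reduces to the same set operation.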
Jun-4-2024