Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection
Michele Mastromattei, Fabio Massimo Zanzotto
arXiv.org Artificial Intelligence
Sentiment analysis datasets, particularly those annotated on crowdsourcing platforms, may contain biases stemming from the lack of information about the annotators' cultural backgrounds. Machine learning models trained on such data can amplify these biases, affecting how sentiment is perceived and labeled. Although these models capture general sentiment, they often miss the nuances experienced by different groups. This paper examines the impact of linguistic diversity on transformer models designed for irony detection. Using the EPIC corpus [1], we created five subsets tailored to different varieties of English. We trained different transformer models and applied the KEN pruning algorithm [2] to extract the minimal subset of parameters that preserves the original model's performance. We repeated this experimental process across five transformer architectures, revealing a minimum parameter overlap of 60% among the resulting subnetworks. We then performed a comprehensive analysis to identify the subnetworks with the highest and lowest similarity.
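The overlap comparison described above can be illustrated with a small sketch. Below, each pruned subnetwork is represented simply as a set of kept-parameter indices, and similarity is measured with a Jaccard overlap; this representation and metric are assumptions for illustration, not the paper's exact KEN procedure.

```python
def kept_overlap(kept_a: set, kept_b: set) -> float:
    """Jaccard overlap between two sets of kept-parameter indices:
    |intersection| / |union|. Returns 0.0 for two empty sets."""
    if not kept_a and not kept_b:
        return 0.0
    return len(kept_a & kept_b) / len(kept_a | kept_b)

# Toy example: two subnetworks over 100 parameters, each keeping 40.
subnet_a = set(range(0, 40))
subnet_b = set(range(20, 60))
print(f"Jaccard overlap: {kept_overlap(subnet_a, subnet_b):.2f}")  # 20/60 = 0.33
```

In practice the kept sets would come from per-layer pruning masks over millions of parameters, but the pairwise comparison reduces to the same set operation.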
Jun-4-2024