A meta-analysis on the performance of machine-learning based language models for sentiment analysis

Rohde, Elena, Klingwort, Jonas, Borgs, Christian

arXiv.org Artificial Intelligence 

Social media is a valuable data source for social science research, particularly in analyzing public sentiment during events with considerable social impact (Wang et al. 2021). However, the large volume of text data makes evaluation challenging. Sentiment analysis, using Natural Language Processing, extracts attitudes and emotions from text to classify content into categories like positive, negative, or neutral (Govindarajan 2022). Sentiment analysis methods fall into lexicon-based and machine-learning approaches, with the latter preferred for social media due to higher accuracy (Hartmann et al. 2019; Verma and Jain 2022). Machine learning strategies vary by algorithm and feature extraction, making overall performance evaluation challenging. This raises questions about algorithm effectiveness and the factors influencing variability. Identifying study characteristics and potential variability sources is crucial for setting realistic performance expectations (Hartmann et al. 2023). This paper contributes to the literature by conducting a systematic literature review, followed by a meta-analysis and meta-regression, to explain the variation in the performance outcomes of machine learning algorithms in the context of social media data sentiment analysis. The results provide evidence of the factors contributing to the varying performance of different machine-learning algorithms in sentiment analysis.