Detecting value-expressive text posts in Russian social media
Milkova, Maria, Rudnev, Maksim, Okolskaya, Lidia
arXiv.org Artificial Intelligence
Basic values are concepts or beliefs that pertain to desirable end-states and transcend specific situations. Studying personal values in social media can illuminate how and why societal values evolve, especially when stimulus-based methods such as surveys are inefficient, for instance in hard-to-reach populations. On the other hand, user-generated content is driven by the massive use of stereotyped, culturally defined speech constructions rather than authentic expressions of personal values. We aimed to find a model that can accurately detect value-expressive posts on the Russian social media platform VKontakte. A training dataset of 5,035 posts was annotated by three experts, 304 crowd-workers, and ChatGPT. Crowd-workers and experts showed only moderate agreement in categorizing posts. ChatGPT was more consistent but struggled with spam detection. We applied an ensemble of human- and AI-assisted annotation involving an active learning approach, subsequently trained several LLMs, and selected a model based on embeddings from a pre-trained, fine-tuned rubert-tiny2, which reached a high quality of value detection with F1 = 0.75 (F1-macro = 0.80). This model provides a crucial step toward the study of values within and between Russian social media users.
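The abstract reports two scores, F1 and F1-macro. The following minimal sketch shows how the two can differ on a binary task. It assumes that the plain F1 refers to the positive (value-expressive) class, which the abstract does not state explicitly. The labels are made-up toy data, not the paper's dataset, and the function names `f1_for_class` and `macro_f1` are illustrative.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (hypothetical): 1 = value-expressive post, 0 = other.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

positive_f1 = f1_for_class(y_true, y_pred, positive=1)
overall_macro_f1 = macro_f1(y_true, y_pred)
print(f"F1 (positive class): {positive_f1:.3f}")
print(f"F1-macro:            {overall_macro_f1:.3f}")
```

On imbalanced data like this toy set, the majority class is usually easier to predict, so the macro average can sit above the positive-class score. That is one possible reading of the gap between the two reported numbers.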
Dec-14-2023
- Country:
  - Asia
    - Japan (0.04)
    - Middle East > Jordan (0.04)
  - Atlantic Ocean (0.04)
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Ireland (0.04)
    - Russia > Volga Federal District
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - California > San Diego County
        - San Diego (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Michigan (0.04)
      - New Mexico > Santa Fe County
        - Santa Fe (0.04)
      - New York (0.04)
      - Texas > Travis County
        - Austin (0.14)
      - Wisconsin > Dane County
        - Madison (0.04)
- Genre:
  - Research Report > New Finding (0.67)
  - Workflow (0.87)
- Industry:
  - Information Technology (0.94)
- Technology: