Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering
Hofmann, Jan; Sindermann, Cornelia; Klinger, Roman
arXiv.org Artificial Intelligence
Author profiling is the task of inferring characteristics of individuals by analyzing the content they share. Supervised machine learning still dominates automatic systems for this task, despite the popularity of prompting large language models for natural language understanding tasks. One reason is that each classification instance consists of a large number of posts, potentially a whole user profile, which may exceed the input length of Transformers. Even if a model can use a large context window, processing all posts makes API-accessed black-box systems costly and slow, in addition to the issues that come with such "needle-in-the-haystack" tasks. To mitigate this limitation, we propose a new method for author profiling that first distinguishes relevant from irrelevant content and then performs the actual profiling only on the relevant data. To circumvent the need for relevance-annotated data, we optimize this relevance filter via reinforcement learning with a reward function that uses the zero-shot capabilities of large language models. We evaluate our method for Big Five personality trait prediction on two Twitter corpora. On publicly available real-world data with a skewed label distribution, our method performs similarly to using all posts in a user profile, but with a substantially shorter context. An evaluation on a version of these data balanced with artificial posts shows that filtering for relevant posts significantly improves prediction accuracy.
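The abstract describes the filter-then-profile idea only at a high level. The following is a minimal toy sketch of that idea, not the authors' implementation: a per-post keep/drop policy is trained with a reward derived from a zero-shot predictor, and profiling is then run on the kept posts only. Several pieces are assumptions chosen for illustration: the RL algorithm is plain REINFORCE with a running-mean baseline (which may differ from the paper's); the "LLM" is a hypothetical keyword-counting stub (`zero_shot_prob`), not a real model call; the length penalty in the reward, the cue words, the one-hot features, and all function names (`make_user`, `train_filter`, `filter_posts`, `predict_trait`) are hypothetical.

```python
import math
import random

# HYPOTHETICAL toy data: cue words stand in for trait-relevant content.
HIGH_CUES = {"poetry", "museum"}     # assumed cues for the high pole of a trait
LOW_CUES = {"routine", "schedule"}   # assumed cues for the low pole
ALL_CUES = HIGH_CUES | LOW_CUES
FILLER = ["bus", "lunch", "weather", "traffic", "coffee"]


def make_user(rng):
    """Toy profile: 2 trait-relevant posts hidden among 8 irrelevant ones."""
    label = rng.randint(0, 1)
    cues = sorted(HIGH_CUES if label == 1 else LOW_CUES)
    posts = [f"thinking about {rng.choice(cues)}" for _ in range(2)]
    posts += [f"{rng.choice(FILLER)} again" for _ in range(8)]
    rng.shuffle(posts)
    return posts, label


def features(post):
    """Hypothetical one-hot features: [contains a cue word, contains none]."""
    has_cue = bool(set(post.split()) & ALL_CUES)
    return (1.0, 0.0) if has_cue else (0.0, 1.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def keep_prob(post, w):
    """Linear-logistic policy: probability of keeping one post."""
    f = features(post)
    return sigmoid(w[0] * f[0] + w[1] * f[1])


def zero_shot_prob(kept_posts, label):
    """STUB for an LLM's zero-shot P(label | kept posts); not a real model call."""
    score = sum(len(set(p.split()) & HIGH_CUES) - len(set(p.split()) & LOW_CUES)
                for p in kept_posts)
    p_high = sigmoid(score)
    return p_high if label == 1 else 1.0 - p_high


def train_filter(steps=2000, lr=0.5, length_penalty=0.5, seed=0):
    """REINFORCE on independent keep/drop decisions, one Bernoulli per post."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    baseline = 0.0  # running mean of past rewards, reduces gradient variance
    for _ in range(steps):
        posts, label = make_user(rng)
        probs = [keep_prob(p, w) for p in posts]
        keep = [rng.random() < q for q in probs]
        kept = [p for p, k in zip(posts, keep) if k]
        # Reward: zero-shot agreement with the gold label minus a context-length cost.
        reward = zero_shot_prob(kept, label) - length_penalty * len(kept) / len(posts)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for p, q, k in zip(posts, probs, keep):
            grad_logit = (1.0 if k else 0.0) - q  # d log Bernoulli(k; q) / d logit
            f = features(p)
            w[0] += lr * advantage * grad_logit * f[0]
            w[1] += lr * advantage * grad_logit * f[1]
    return w


def filter_posts(posts, w, threshold=0.5):
    """Deterministic filter at inference time: keep posts the policy favors."""
    return [p for p in posts if keep_prob(p, w) > threshold]


def predict_trait(kept_posts):
    """Profile using only the filtered posts, with the same zero-shot stub."""
    return 1 if zero_shot_prob(kept_posts, 1) > 0.5 else 0


if __name__ == "__main__":
    w = train_filter()
    posts, gold = make_user(random.Random(1))
    kept = filter_posts(posts, w)
    print(f"kept {len(kept)}/{len(posts)} posts:", kept)
    print("predicted:", predict_trait(kept), "gold:", gold)
```

No relevance labels are used during training: the policy only sees the reward, so it learns to keep cue-bearing posts because they raise the stub's agreement with the gold trait label, while the assumed length penalty pushes it to drop the rest. Treating each post as an independent Bernoulli decision keeps the policy gradient simple; a real system would replace the one-hot features with a text encoder and the stub with an actual zero-shot LLM query.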
Sep-6-2024