LLMs, Virtual Users, and Bias: Predicting Any Survey Question Without Human Data
Enzo Sinacola, Arnault Pachot, Thierry Petit
arXiv.org Artificial Intelligence
Large Language Models (LLMs) offer a promising alternative to traditional survey methods, potentially enhancing efficiency and reducing costs. In this study, we use LLMs to create virtual populations that answer survey questions, enabling us to predict outcomes comparable to human responses. We evaluate several LLMs, including GPT-4o, GPT-3.5, Claude 3.5 Sonnet, and versions of the Llama and Mistral models, comparing their performance to that of a traditional Random Forest algorithm using demographic data from the World Values Survey (WVS). LLMs demonstrate competitive performance overall, with the significant advantage of requiring no additional training data. However, they exhibit biases when predicting responses for certain religious and population groups, underperforming in these areas. Conversely, the Random Forest model outperforms LLMs when trained with sufficient data. We observe that removing censorship mechanisms from LLMs significantly improves predictive accuracy, particularly for underrepresented demographic segments where censored models struggle. These findings highlight the importance of addressing biases and reconsidering censorship approaches in LLMs to enhance their reliability and fairness in public opinion research.
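The virtual-population approach described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the demographic fields, the prompt wording, and the `persona_prompt`/`aggregate_virtual_population` helpers are hypothetical, and stubbed answers stand in for real LLM completions.

```python
from collections import Counter

def persona_prompt(demographics, question, options):
    """Cast the LLM as a survey respondent with a given demographic
    profile (field names and wording are illustrative assumptions)."""
    profile = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    opts = "; ".join(options)
    return (
        f"You are a survey respondent with this profile: {profile}.\n"
        f"Question: {question}\n"
        f"Answer with exactly one of: {opts}."
    )

def aggregate_virtual_population(responses):
    """Turn the answers of many virtual users into a predicted response
    distribution, comparable to human survey marginals."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {opt: n / total for opt, n in counts.items()}

# One virtual user's prompt, built from a WVS-style demographic profile.
prompt = persona_prompt(
    {"age": 34, "country": "Mexico", "religion": "Catholic"},
    "How important is religion in your life?",
    ["Very important", "Rather important",
     "Not very important", "Not at all important"],
)

# Aggregate stubbed answers from three simulated virtual users.
dist = aggregate_virtual_population(
    ["Very important", "Very important", "Rather important"]
)
```

In a real run, each prompt would be sent to an LLM API and the completions fed to the aggregation step; the Random Forest baseline would instead be trained on the same demographic features against observed WVS answers.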
Mar-11-2025