LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research

Yi Yang, Hanyu Duan, Jiaxin Liu, Kar Yan Tam

Sep-19-2024–arXiv.org Artificial Intelligence

The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text's LLM hidden states onto the concept vector. Three replication studies demonstrate the method's effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community.
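The projection step described above can be illustrated with a small numerical sketch. It is not the authors' implementation: the abstract does not say how the concept vector is learned, so this example assumes it is the first principal direction of a set of hidden states, and it substitutes synthetic vectors for real LLM hidden states. All names (`learn_concept_vector`, `project`, `true_direction`, `latent`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_texts, hidden_dim = 200, 16

# Synthetic stand-in for LLM hidden states (assumption): each text has a
# latent concept value that moves its hidden state along one direction,
# plus small isotropic noise.
true_direction = rng.normal(size=hidden_dim)
true_direction /= np.linalg.norm(true_direction)
latent = rng.normal(size=n_texts)
noise = 0.1 * rng.normal(size=(n_texts, hidden_dim))
hidden_states = latent[:, None] * true_direction + noise


def learn_concept_vector(states: np.ndarray) -> np.ndarray:
    """Return the unit-length first principal direction of the states.

    Using PCA here is an assumption made for illustration only.
    """
    centered = states - states.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def project(states: np.ndarray, vector: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Estimate one concept value per text by projecting onto the vector."""
    return (states - center) @ vector


concept_vector = learn_concept_vector(hidden_states)
scores = project(hidden_states, concept_vector, hidden_states.mean(axis=0))

# The sign of a principal direction is arbitrary, so compare in absolute value.
print("abs. correlation with latent value:", abs(np.corrcoef(scores, latent)[0, 1]))
```

In practice the rows of `hidden_states` would come from an LLM's internal layers rather than a random generator, and the sign of the resulting measure would need to be fixed against texts known to score high or low on the concept.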
