Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs
Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber
A common use of NLP is to facilitate the understanding of large document collections, with a shift from traditional topic models to Large Language Models (LLMs). Yet the effectiveness of LLMs for large-corpus understanding in real-world applications remains under-explored. This study measures the knowledge users acquire with unsupervised and supervised LLM-based exploratory approaches, or traditional topic models, on two datasets. While LLM-based methods generate more human-readable topics and show higher average win probabilities than traditional models for data exploration, they produce overly generic topics for domain-specific datasets that do not easily allow users to learn much about the documents. Adding human supervision to the LLM generation process improves data exploration by mitigating hallucination and over-genericity, but requires greater human effort. In contrast, traditional models like Latent Dirichlet Allocation (LDA) remain effective for exploration but are less user-friendly. We show that LLMs struggle to describe the haystack of large corpora without human help, particularly for domain-specific data, and face scaling and hallucination limitations due to context-length constraints. Dataset available at https://huggingface.co/datasets/zli12321/Bills.
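To make the contrast concrete, here is a minimal sketch of the two exploration styles the abstract compares: a traditional LDA baseline and a prompt-based LLM topic labeler. This is an illustration under stated assumptions, not the paper's actual pipeline: the toy documents, the scikit-learn choice, and the `call_llm` helper are all hypothetical.

```python
# Minimal sketch (not the paper's pipeline) contrasting the two exploration
# styles named in the abstract. The LDA side uses scikit-learn; the LLM side
# only builds a prompt -- `call_llm` is a hypothetical stand-in for a
# chat-completion client, not a real API.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for a bill corpus (illustrative only).
docs = [
    "A bill to amend the Internal Revenue Code to extend energy credits.",
    "A bill to reauthorize funding for rural broadband infrastructure.",
    "A bill to improve veterans' access to mental health services.",
    "A bill to establish grants for wildfire prevention and forestry.",
    "A bill to modify Medicare reimbursement for telehealth services.",
    "A bill to strengthen cybersecurity standards for federal agencies.",
]

# Traditional baseline: LDA over a bag-of-words matrix.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# A topic is summarized by its highest-weight vocabulary terms.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_terms = [vocab[i] for i in weights.argsort()[::-1][:5]]
    print(f"LDA topic {k}: {', '.join(top_terms)}")

# LLM-based alternative: ask a model to label a batch of documents directly.
# A large corpus must be chunked to fit the context window, which is one
# source of the scaling limits the abstract describes.
prompt = (
    "Propose a short, specific topic label for each document below. "
    "Avoid generic labels such as 'government' or 'policy'.\n\n"
    + "\n".join(f"{i + 1}. {d}" for i, d in enumerate(docs))
)
# labels = call_llm(prompt)  # hypothetical client call
```

The explicit instruction to avoid generic labels hints at where human supervision enters: without such steering, LLM-generated topics for specialized corpora like congressional bills tend toward labels too broad to teach users much about the documents.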
arXiv.org Artificial Intelligence
Feb-20-2025