Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM), Jenny Reddish
Neural Information Processing Systems
Large Language Models (LLMs) have the potential to transform humanities and social science research, yet their knowledge and comprehension of history at a graduate level remain untested. Benchmarking LLMs in history is particularly challenging, given that human knowledge of history is inherently unbalanced, with more information available on Western history and recent periods. We introduce the History Seshat Test for LLMs (HiST-LLM), based on a subset of the Seshat Global History Databank, which provides a structured representation of human historical knowledge, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. This dataset covers every major world region from the Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants.
Mar-19-2025, 18:28:28 GMT
- Country:
  - Africa (1.00)
  - Asia > Middle East (0.93)
  - Europe
    - Austria > Vienna (0.14)
    - United Kingdom > England (0.14)
  - North America > United States
    - California (0.14)
    - New York (0.14)
    - Washington > King County
      - Seattle (0.14)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government (0.92)
  - Law (1.00)
- Technology: