If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History

Garcia, Giselle Gonzalez, Weilbach, Christian

Oct-16-2023–arXiv.org Artificial Intelligence

The recent advent of powerful Large Language Models (LLMs) provides a new conversational form of inquiry into historical memory (or, in this case, training data). We show that by augmenting such LLMs with vector embeddings from highly specialized academic sources, a conversational methodology can be made accessible to historians and other researchers in the Humanities. Concretely, we evaluate and demonstrate how LLMs can assist researchers as they examine customized corpora of different types of documents, including, but not limited to, (1) primary sources, (2) secondary sources written by experts, and (3) a combination of the two. We evaluate the richer conversational style of LLMs against established search interfaces for digital catalogues, such as metadata and full-text search, on two main types of tasks: (1) question answering and (2) extraction and organization of data. We demonstrate that LLMs' semantic retrieval and reasoning abilities on problem-specific tasks can be applied to large textual archives that were not part of their training data. Therefore, LLMs can be augmented with sources relevant to specific research projects and can be queried privately by researchers.
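The abstract does not describe the retrieval pipeline in detail. What follows is a minimal sketch, assuming a generic retrieval-augmented setup, of how passages from a private corpus might be ranked by vector similarity and packed into an LLM prompt. The bag-of-words `embed` function is a toy stand-in for a learned embedding model; `retrieve`, `build_prompt`, and `SAMPLE_CORPUS` are hypothetical names, the passages are invented placeholders, and no LLM is actually called.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages standing in for primary/secondary sources.
SAMPLE_CORPUS = {
    "manifest": "Ship manifest listing a cargo of sugar and rum.",
    "essay": "A secondary essay on the sugar trade and its merchants.",
    "register": "Parish register of baptisms and marriages.",
}


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a term-frequency vector over lowercase words.
    A real system would use a learned neural embedding model instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k passages most similar to the query."""
    q = embed(query)
    index = {doc_id: embed(text) for doc_id, text in corpus.items()}
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Pack the top-k retrieved passages, tagged by id, into a prompt string
    that would then be sent to an LLM (not done here)."""
    ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{d}] {corpus[d]}" for d in ids)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("Which cargo of sugar was shipped?", SAMPLE_CORPUS, k=1))
```

Tagging each retrieved passage with its id lets an answer be traced back to a specific source, which matters for historical work. Replacing the toy `embed` with a neural embedding model would, in principle, let retrieval match passages that share meaning but not exact vocabulary, which is where such an approach could differ from full-text search.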

ireland, llm, university press, (13 more...)

Country:
- South America
  - Guyana (0.04)
  - Argentina > Pampas
    - Buenos Aires F.D. > Buenos Aires (0.04)
- Oceania
  - New Zealand (0.04)
  - Australia (0.04)
- North America
  - Cuba (0.05)
  - Jamaica (0.04)
  - Central America (0.04)
  - Barbados (0.04)
  - United States
    - Mississippi (0.04)
    - Maryland > Baltimore (0.04)
    - Wisconsin (0.04)
    - Arizona (0.04)
    - Tennessee (0.04)
    - Virginia > Williamsburg (0.04)
    - Kentucky (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
    - South Carolina > Richland County
      - Columbia (0.04)
    - North Carolina > Orange County
      - Chapel Hill (0.04)
    - Florida > Alachua County
      - Gainesville (0.04)
    - California > Santa Clara County
      - Stanford (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.14)
    - Louisiana
      - Lafayette Parish > Lafayette (0.04)
      - East Baton Rouge Parish > Baton Rouge (0.04)
    - Illinois
      - Cook County > Chicago (0.04)
      - Champaign County > Urbana (0.04)
    - Indiana > St. Joseph County
      - Notre Dame (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Spain (0.04)
  - Switzerland (0.04)
  - United Kingdom
    - Northern Ireland > County Londonderry (0.04)
    - England
      - Cambridgeshire > Cambridge (0.14)
      - Oxfordshire > Oxford (0.05)
      - Suffolk (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia
  - India (0.04)
  - Middle East > Jordan (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Therapeutic Area (0.46)
- Education > Educational Setting
  - Higher Education (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.72)
