Towards Verifiable Text Generation with Symbolic References

Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

arXiv.org Artificial Intelligence 

Large language models (LLMs) have demonstrated an impressive ability to synthesize plausible and fluent text. However, they remain vulnerable to hallucinations, and thus their outputs generally require manual human verification for high-stakes applications, which can be time-consuming and difficult. This paper proposes symbolically grounded generation (SymGen) as a simple approach for enabling easier validation of an LLM's output. SymGen prompts an LLM to interleave its regular output text with explicit symbolic references to fields present in some conditioning data (e.g., a table in JSON format). The references can be used to display the provenance of different spans of text in the generation, reducing the effort required for manual verification. Across data-to-text and question-answering experiments, we find that LLMs are able to directly output text that makes use of symbolic references while maintaining fluency and accuracy.

Figure 1: Comparison of a standard LLM-generated description (A) with a SymGen description (B, ours) of a basketball game, based on statistics about it.
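To make the idea concrete, the following is a minimal sketch of how symbolic references in generated text might be resolved against conditioning data. The `{{dotted.path}}` reference syntax, the `resolve_symgen` helper, and the example game record are illustrative assumptions, not the paper's actual format:

```python
import re

def resolve_symgen(template: str, data: dict) -> str:
    """Replace symbolic references like {{home.team}} with values looked
    up in the conditioning data; each reference carries its provenance."""
    def lookup(path: str):
        node = data
        for key in path.split("."):
            # Hypothetical convention: numeric keys index into lists.
            node = node[int(key)] if isinstance(node, list) else node[key]
        return node

    # Substitute every {{...}} reference with the value it points to.
    return re.sub(r"\{\{([\w.]+)\}\}", lambda m: str(lookup(m.group(1))), template)

# Toy conditioning data (assumed structure, for illustration only).
game = {"home": {"team": "Celtics", "points": 112},
        "away": {"team": "Knicks", "points": 98}}

text = "The {{home.team}} beat the {{away.team}} {{home.points}}-{{away.points}}."
print(resolve_symgen(text, game))
# → The Celtics beat the Knicks 112-98.
```

A verifier can render the resolved values while highlighting which span of text each data field produced, so a human checks the reference paths rather than re-deriving every number.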