Towards Verifiable Text Generation with Symbolic References
Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
Large language models (LLMs) have demonstrated an impressive ability to synthesize plausible and fluent text. However, they remain vulnerable to hallucinations, and thus their outputs generally require manual human verification for high-stakes applications, which can be time-consuming and difficult. This paper proposes symbolically grounded generation (SymGen) as a simple approach for enabling easier validation of an LLM's output. SymGen prompts an LLM to interleave its regular output text with explicit symbolic references to fields present in some conditioning data (e.g., a table in JSON format). The references can be used to display the provenance of different spans of text in the generation, reducing the effort required for manual verification. Across data-to-text and question answering experiments, we find that LLMs are able to directly output text that makes use of symbolic references while maintaining fluency and accuracy.

Figure 1: Comparison of a standard LLM-generated description (A) with a SymGen description (B, ours) of a basketball game, based on statistics about it.
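To make the idea concrete, here is a minimal sketch of how a SymGen-style output might be resolved against its conditioning data. The `{{field}}` reference syntax, the field names, and the `resolve` helper are illustrative assumptions, not the paper's exact format; the point is that each resolved span carries a symbolic pointer back to the source data.

```python
import re

# Hypothetical conditioning data: a game-stats table in JSON-like form.
game = {
    "home_team": "Celtics", "home_score": 112,
    "away_team": "Heat", "away_score": 104,
}

# A SymGen-style generation: regular text interleaved with symbolic
# references to fields of the data ({{...}} syntax is an assumption).
symgen_output = "The {{home_team}} beat the {{away_team}} {{home_score}}-{{away_score}}."

def resolve(template, data):
    """Substitute each symbolic reference and record its provenance."""
    provenance = []

    def sub(match):
        key = match.group(1)
        provenance.append(key)  # which data field grounded this span
        return str(data[key])

    text = re.sub(r"\{\{(\w+)\}\}", sub, template)
    return text, provenance

text, refs = resolve(symgen_output, game)
print(text)  # The Celtics beat the Heat 112-104.
print(refs)  # ['home_team', 'away_team', 'home_score', 'away_score']
```

A verifier can then check each referenced span against the data directly, rather than re-reading the whole table, which is the effort reduction the abstract describes.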