dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models
Bertorello, Pablo M. Rodriguez, Laguerre, Jean Rodmond Junior
arXiv.org Artificial Intelligence
Data is stored in both structured and unstructured form. Querying both to power natural language conversations is a challenge. This paper introduces dIR, Discrete Information Retrieval, which provides a unified interface for querying both free text and structured knowledge. Specifically, a Large Language Model (LLM) transforms text into an expressive representation. Once the text has been extracted into columnar form, it can be queried via a text-to-SQL semantic parser, with an LLM converting natural language into SQL. Where desired, the conversation may be driven by a multi-step reasoning conversational agent. We validate our approach on a proprietary question/answer data set and conclude that dIR makes a whole new class of queries on free text possible when compared to traditionally fine-tuned, dense-embedding-model-based Information Retrieval (IR) and SQL-based Knowledge Bases (KB). For sufficiently complex queries, dIR can succeed where no other method stands a chance.
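The two-stage flow described in the abstract (extract free text into columns, then answer questions with SQL) can be illustrated with a minimal sketch. This is not the authors' implementation: `extract_row` and `question_to_sql` are hypothetical stand-ins for the two LLM calls, replaced here by a toy regex and a canned lookup so the example runs offline, and the `people` table, its columns, and the sample sentences are invented for illustration.

```python
import re
import sqlite3

# Hypothetical stand-in for the LLM extraction step. A real system would
# prompt a model to pull these fields out of each free-text document.
def extract_row(doc: str) -> tuple[str, str, int]:
    match = re.match(r"(\w+) is an? (\w+) with (\d+) years", doc)
    if match is None:
        raise ValueError(f"could not extract fields from: {doc!r}")
    name, role, years = match.groups()
    return name, role, int(years)

# Hypothetical stand-in for the text-to-SQL semantic parser. A real system
# would have an LLM generate the SQL from the question and the table schema.
def question_to_sql(question: str) -> str:
    canned = {
        "How many engineers have more than 5 years of experience?":
            "SELECT COUNT(*) FROM people WHERE role = 'engineer' AND years > 5",
    }
    return canned[question]

def build_table(docs: list[str]) -> sqlite3.Connection:
    # Load the extracted rows into an in-memory columnar table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, role TEXT, years INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [extract_row(doc) for doc in docs],
    )
    conn.commit()
    return conn

def answer(conn: sqlite3.Connection, question: str) -> list[tuple]:
    # Translate the question to SQL, then run it against the extracted table.
    return conn.execute(question_to_sql(question)).fetchall()

# Invented sample documents, for illustration only.
DOCS = [
    "Ana is an engineer with 7 years of experience.",
    "Ben is a designer with 3 years of experience.",
    "Chloe is an engineer with 2 years of experience.",
    "Dev is an engineer with 9 years of experience.",
]

if __name__ == "__main__":
    conn = build_table(DOCS)
    print(answer(conn, "How many engineers have more than 5 years of experience?"))
```

The counting question is chosen on the assumption that aggregates over many documents are one kind of query that is awkward to answer by retrieving a few similar passages, but straightforward once the text is in columnar form.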
Dec-20-2023
- Country:
  - North America
    - United States > Minnesota
      - Hennepin County > Minneapolis (0.14)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - United Kingdom > Scotland
      - City of Edinburgh > Edinburgh (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - Singapore (0.04)
    - Middle East
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - China
      - Beijing > Beijing (0.04)
      - Hong Kong (0.04)
      - Guangxi Province > Nanning (0.04)
- Genre:
  - Research Report (0.64)
- Technology: