Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection
Mai, Chuhong, Tal, Ro-ee, Mohamed, Thahir
arXiv.org Artificial Intelligence
In-context learning (ICL) is a powerful paradigm in which large language models (LLMs) benefit from task demonstrations added to the prompt. Yet selecting optimal demonstrations is not trivial, especially for complex or multi-modal tasks where input and output distributions differ. We hypothesize that forming task-specific representations of the input is key. In this paper, we propose a method to align representations of natural language questions and those of SQL queries in a shared embedding space. Our technique, dubbed MARLO (Metadata-Agnostic Representation Learning for Text-tO-SQL), uses query structure to model querying intent without over-indexing on underlying database metadata (i.e., the tables, columns, or domain-specific entities of a database referenced in the question or query). This allows MARLO to select examples that are structurally and semantically relevant to the task rather than examples that are spuriously related to a certain domain or question phrasing. When used to retrieve examples based on question similarity, MARLO outperforms generic embedding models on the Spider benchmark (+2.9 percentage points in execution accuracy on average). It also outperforms the next best method that masks metadata information by +0.8 percentage points in execution accuracy on average, while incurring significantly lower inference latency.
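The retrieval step described in the abstract (ranking candidate demonstrations by question similarity and placing the top matches in the prompt) can be sketched as follows. This is a minimal illustration, not the MARLO model: the `embed` function is a toy bag-of-words stand-in for a learned encoder, and the names `select_examples` and `build_prompt` and the example pool are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in encoder: bag-of-words token counts (assumption, NOT MARLO)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(question, pool, k=2):
    """Return the k (question, sql) pairs whose questions are most similar."""
    q_vec = embed(question)
    ranked = sorted(pool, key=lambda ex: cosine(q_vec, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, examples):
    """Concatenate the selected demonstrations followed by the new question."""
    parts = [f"Question: {q}\nSQL: {s}" for q, s in examples]
    parts.append(f"Question: {question}\nSQL:")
    return "\n\n".join(parts)


# Hypothetical demonstration pool of (question, SQL) pairs.
pool = [
    ("How many singers are there?", "SELECT count(*) FROM singer"),
    ("List the names of all airports.", "SELECT name FROM airport"),
    ("How many students are there?", "SELECT count(*) FROM student"),
]

chosen = select_examples("How many cars are there?", pool, k=2)
prompt = build_prompt("How many cars are there?", chosen)
print(prompt)
```

In this toy pool the two counting questions rank above the listing question even though none of them mentions cars, which loosely mirrors the idea of preferring structurally similar demonstrations over domain matches. A learned encoder would replace `embed` in practice.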
Oct-17-2024
- Country:
  - North America
    - Dominican Republic (0.04)
    - United States
      - California > Sonoma County (0.14)
      - Washington > King County
        - Seattle (0.04)
      - Utah > Salt Lake County
        - Salt Lake City (0.04)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Singapore (0.04)
    - Indonesia (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
    - Middle East
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - China
      - Hong Kong (0.04)
      - Guangxi Province > Nanning (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Leisure & Entertainment (0.46)
  - Banking & Finance (0.46)
  - Education (0.46)
- Technology: