On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL

Apr-2-2024–arXiv.org Artificial Intelligence

Structured data, prevalent in tables, databases, and knowledge graphs, is challenging to represent. With the advent of large language models (LLMs), there has been a shift towards linearization-based methods, which process structured data as sequential token streams, in contrast to approaches that model structure explicitly, often as a graph. Crucially, there remains a gap in our understanding of how these linearization-based methods handle structured data, which is inherently non-linear. This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction, indicating that it learns structure in a deep, meaningful way rather than merely sequencing tokens. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings and the potential for model compression due to modality fusion redundancy. Overall, this work sheds light on the inner workings of linearization-based methods and may guide future research.
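
To make "linearization" concrete, the following is a minimal sketch of how a question and a database schema might be flattened into a single token stream before being passed to an encoder-decoder model. The separator format, the function name `linearize_schema`, and the example schema are illustrative assumptions; the abstract does not specify the exact serialization used in the paper.

```python
from typing import Dict, List


def linearize_schema(question: str, db_id: str, tables: Dict[str, List[str]]) -> str:
    """Flatten a question and a database schema into one string.

    The "question | db_id | table : col , col | ..." layout is an assumed,
    illustrative format, not necessarily the one used in the paper.
    """
    # Serialize each table as "table : col1 , col2 , ...".
    table_parts = [
        f"{table} : {' , '.join(columns)}" for table, columns in tables.items()
    ]
    # Join the question, database id, and tables with a single separator.
    return " | ".join([question, db_id] + table_parts)


# Hypothetical schema used only for illustration.
schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "singer_id", "year"],
}
text = linearize_schema(
    "How many singers are older than 30?", "concert_singer", schema
)
print(text)
```

In this flattened form the relationship between `singer.singer_id` and `concert.singer_id` is no longer explicit; the model sees only a sequence of tokens, which is the setting whose internal handling the work examines.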

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology
  - Information Management (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
