TaBERT: A new model for understanding queries over tabular data

#artificialintelligence 

TaBERT is the first model that has been pretrained to learn representations for both natural language sentences and tabular data. These sorts of representations are useful for natural language understanding tasks that involve joint reasoning over natural language sentences and tables. A representative example is semantic parsing over databases, where a natural language question (e.g., "Which country has the highest GDP?") is mapped to a program executable over database (DB) tables. This is the first pretraining approach across structured and unstructured domains, and it opens new possibilities regarding semantic parsing, where one of the key challenges has been understanding the structure of a DB table and how it aligns with a query. TaBERT has been trained using a corpus of 26 million tables and their associated English sentences.