Goto

Collaborating Authors

 google release tydi qa


Google releases TyDi QA, a data set that aims to capture the uniqueness of languages

#artificialintelligence

Google hopes to spur the development of AI capable of understanding the ways in which languages express different meanings. To this end, company researchers today detailed a data set -- TyDi QA, a question-answering data set covering 11 languages -- inspired by typological diversity, or the notion that different languages express meaning in structurally unique ways. TyDi QA is something of a complement to the English-language Natural Questions corpus Google released last year, and it attempts to capture t he idiosyncrasies and features of tongues like Japanese and Arabic. The researchers point out, for instance, that English changes words to indicate one object ("book") versus many ("books"), and that Arabic has a third form to indicate if there are two of something ("كتابان", kitaban) beyond just singular ("كتاب", kitab) or plural ("كتب", kutub). "Because we selected a set of languages that are typologically distant from each other for this corpus, we expect models performing well on this dataset to generalize across a large number of the languages in the world," wrote Google Research scientist Jonathan Clark in a blog post.