Quantifying syntax similarity with a polynomial representation of dependency trees

Liu, Pengyu; Feng, Tinghao; Liu, Rui

arXiv.org Artificial Intelligence 

Dependency focuses on the proximity of words in a sentence, and the hierarchical relations between the words are represented by a tree structure called the dependency tree of the sentence. Recently, an international collaboration project called Universal Dependencies (UD) has created a standard annotation scheme for constructing dependency trees from sentences, and hundreds of UD treebanks in various languages have been made publicly available [7]. These datasets are key resources for syntax analysis, opening new opportunities for automated text processing and syntactic typology studies, among other applications. Parallel Universal Dependencies (PUD) treebanks are a class of UD treebanks consisting of the dependency trees of 1,000 sentences and of their translations into other languages [33]. The 1,000 sentences were randomly selected from the news domain and Wikipedia and were originally written in English, French, German, Italian or Spanish. At the time of writing, there are 20 PUD treebanks, each containing the dependency trees of the 1,000 sentences in one of 20 languages. These UD treebanks have stimulated novel computational methods for syntax analysis and the development of quantitative measures of syntax similarity [19, 31, 32]. However, current methods for describing dependency trees mainly capture only partial syntactic information recorded in the structures, such as word order and dependency distance [2, 3, 11, 18]. In this work, we introduce a comprehensive representation of dependency trees based on a tree-distinguishing polynomial.
