Quantifying syntax similarity with a polynomial representation of dependency trees

Nov-13-2022, arXiv.org Artificial Intelligence

Dependency focuses on the proximity of words in a sentence; the hierarchical relations between the words are represented by a tree structure called the dependency tree of the sentence. Recently, an international collaboration project called Universal Dependencies (UD) has created a standard annotation scheme for constructing dependency trees from sentences, and hundreds of UD treebanks in various languages have been made publicly available [7]. These datasets are key materials for syntax analysis and open new opportunities in, among other areas, automated text processing and syntactic typology. Parallel Universal Dependencies (PUD) treebanks are a class of UD treebanks consisting of the dependency trees of 1,000 sentences and of their translations into other languages [33]. The 1,000 sentences were randomly selected from the news domain and Wikipedia and were originally written in English, French, German, Italian or Spanish. At the time of writing, there are 20 PUD treebanks, each containing the dependency trees of the 1,000 sentences in one of 20 languages. These UD treebanks have stimulated novel computational methods for syntax analysis and the development of quantitative measures of syntax similarity [19, 31, 32]. However, current methods for describing dependency trees mainly capture only part of the syntactic information recorded in the structures, such as word order and dependency distance [2, 3, 11, 18]. In this work, we introduce a comprehensive representation of dependency trees based on a tree-distinguishing polynomial.
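To make the objects mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It encodes a dependency tree as a list of head indices, computes dependency distances, and evaluates a two-variable polynomial of the tree structure. The toy sentence, the function names, and the dictionary encoding of polynomials are all hypothetical. The abstract does not define the polynomial, so the recursion used here (a leaf maps to x, an internal node maps to y plus the product of its children's polynomials) is an assumption taken from the usual definition for rooted unlabeled trees; it ignores word order and dependency labels.

```python
from collections import defaultdict

# Hypothetical toy sentence, for illustration only: "She reads old books".
# heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
words = ["She", "reads", "old", "books"]
heads = [2, 0, 4, 2]

def dependency_distances(heads):
    """Linear distance between each non-root word and its head."""
    return [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]

def children_of(heads):
    """Map each head position to the list of its dependents."""
    kids = defaultdict(list)
    for i, h in enumerate(heads, start=1):
        kids[h].append(i)
    return kids

def poly_mul(p, q):
    """Multiply polynomials stored as {(deg_x, deg_y): coefficient}."""
    out = defaultdict(int)
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            out[(a1 + a2, b1 + b2)] += c1 * c2
    return dict(out)

def tree_polynomial(node, kids):
    """Assumed recursion: leaf -> x; internal node -> y + product over children."""
    dependents = kids.get(node, [])
    if not dependents:
        return {(1, 0): 1}           # the monomial x
    prod = {(0, 0): 1}               # the constant 1
    for child in dependents:
        prod = poly_mul(prod, tree_polynomial(child, kids))
    prod[(0, 1)] = prod.get((0, 1), 0) + 1   # add the monomial y
    return prod

kids = children_of(heads)
root = heads.index(0) + 1
print(dependency_distances(heads))   # [1, 1, 2]
print(tree_polynomial(root, kids))   # encodes x^2 + x*y + y
```

Under these assumptions, two sentences whose dependency trees have the same shape yield the same polynomial, whereas dependency distance alone depends on the linear order of the words.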

Country:
- North America
  - United States
    - Kansas (0.04)
    - North Carolina > Watauga County
      - Boone (0.04)
    - New York > New York County
      - New York City (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - California > Yolo County
      - Davis (0.14)
  - Canada > British Columbia
    - Metro Vancouver Regional District
      - Vancouver (0.04)
      - Burnaby (0.04)
- Europe
  - Northern Europe (0.04)
  - Greece (0.04)
  - Sweden > Östergötland County
    - Linköping (0.04)
  - Poland > Greater Poland Province
    - Poznań (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Italy > Tuscany
    - Pisa Province > Pisa (0.04)
- Asia > China
  - Guangdong Province > Zhuhai (0.04)
  - Beijing > Beijing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Government (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:
- Information Technology
  - Communications (1.00)
  - Artificial Intelligence > Natural Language
    - Text Processing (0.68)
    - Grammars & Parsing (0.46)
