Kubon, Vladislav
Exploiting Machine Learning for Automatic Semantic Feature Assignment
Bilek, Karel (Charles University) | Klyueva, Natalia (Charles University in Prague) | Kubon, Vladislav (Charles University in Prague)
In this paper we experiment with supervised machine learning techniques for the task of assigning semantic categories to nouns in Czech. The experiments work with 16 semantic categories based on available manually annotated data. The paper compares two possible approaches: one based on contextual information, the other on morphological properties, where we try to automatically extract final segments of lemmas that might carry semantic information. The central problem of this research is finding features for machine learning that produce better results with a relatively small amount of training data.
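The morphological approach mentioned in the abstract above relies on final segments of lemmas. A minimal sketch of what such feature extraction might look like is given here; the function name `suffix_features`, the `max_len` parameter, and the feature naming scheme are illustrative assumptions, not the method used in the paper.

```python
def suffix_features(lemma, max_len=4):
    """Return the final character segments of a lemma as a feature dict.

    Hypothetical helper: the paper does not specify segment lengths or
    feature names; both are assumptions made for illustration.
    """
    lemma = lemma.lower()
    # Never use the whole lemma as its own suffix.
    longest = min(max_len, len(lemma) - 1)
    return {f"suffix_{n}": lemma[-n:] for n in range(1, longest + 1)}


if __name__ == "__main__":
    # "učitel" (teacher) ends in "-tel", a segment typical of agent nouns in Czech.
    print(suffix_features("učitel", max_len=3))
```

A dictionary of this kind could be passed to any supervised classifier that accepts categorical features; which classifier and which segment lengths work best for small training sets is exactly the question the paper investigates.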
Studying Formal Properties of a Free Word Order Language
Kubon, Vladislav (Charles University in Prague) | Lopatkova, Marketa (Charles University in Prague)
The paper investigates the phenomenon of free word order through analysis by reduction. It exploits the formal background and data types of this method and studies word order freedom by means of the minimal number of word order shifts (word order changes preserving syntactic correctness, individual word forms, their morphological characteristics and/or their surface dependency relations). The investigation focuses on the interplay of two phenomena related to word order: (non-)projectivity of a sentence and the number of word order shifts within the analysis by reduction. This interplay is exemplified on a sample of Czech sentences with clitics.
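One of the two phenomena named in the abstract above, (non-)projectivity, can be illustrated with a short check on a dependency tree. The sketch below uses the standard definition (every word lying between a head and its dependent must be dominated by that head); the list-of-heads representation and the function name `is_projective` are assumptions made for illustration, and the sketch does not cover the counting of word order shifts.

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the 0-based index of the head of word i, or -1 for the root.
    Assumes the input is a well-formed tree (single root, no cycles).
    """
    def dominated_by(node, ancestor):
        # Walk up the tree from node until the ancestor or the root is reached.
        while node != -1:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = sorted((dep, head))
        # Every word strictly between head and dependent must hang under head.
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True


if __name__ == "__main__":
    print(is_projective([1, -1, 1]))     # word 1 is the root; words 0 and 2 depend on it
    print(is_projective([2, -1, 1, 1]))  # the edge 2 -> 0 spans the root, which word 2 does not dominate
```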
Obtaining Hidden Relations from a Syntactically Annotated Corpus - From Word Relationships to Clause Relationships
Kruza, Oldrich (Charles University in Prague) | Kubon, Vladislav (Charles University in Prague)
The paper concentrates on obtaining hidden relationships among individual clauses of complex sentences from the Prague Dependency Treebank. The treebank contains information only about mutual relationships among individual tokens (words, punctuation marks), not about more complex units (clauses). For the experiments with clauses and their parts (segments) it was therefore necessary to develop an automatic method transforming the original annotation into a scheme describing the syntactic relationships between clauses. The task was complicated by a certain degree of inconsistency in the original annotation with regard to clauses and their structure. The paper describes the algorithm for deriving clause-related information from the existing annotation, together with its evaluation.