Context based lemmatizer for Polish language

Michal Karwatowski, Marcin Pietron

arXiv.org Artificial Intelligence 

Natural Language Processing consists of many tasks, each of which aims to extract and process human-understandable meaning from text data. Some tasks, such as classification, cover the complete flow from data to answer. In others, such as part-of-speech tagging, the results are often used as input for subsequent algorithms. An interesting and complex problem is translation, where the meaning of the text must be extracted and then encoded back into text in a different language. Tasks of this kind form a family of NLP problems called text-to-text or sequence-to-sequence processing. Another example of text-to-text processing is lemmatisation, which finds the base form of a given word or expression. The complexity of this problem varies from language to language. In English the number of word variations is usually low, and there are simple rules with few exceptions. However, in Slavic languages such as Polish, inflection is significantly more complicated, and effective lemmatisation is beyond the capabilities of rule-based or edit tree classification methods [1], [2]. The situation becomes even more difficult when multi-segment expressions are included.
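To illustrate why rule-based lemmatisation struggles with Polish inflection, here is a minimal sketch of a suffix-rewrite rule learned from a single (form, lemma) pair: strip some characters from the end of the word, then append a string. This is a deliberately simplified, flattened cousin of edit tree classification, not the method of [1], [2] and not the authors' lemmatizer; the helper names `suffix_rule` and `apply_rule` are hypothetical. A rule learned from English "walked" → "walk" transfers to "jumped", but a rule learned from the Polish genitive "kota" → "kot" (cat) fails on "psa" → "pies" (dog), because the stem itself alternates.

```python
def suffix_rule(form: str, lemma: str) -> tuple[int, str]:
    """Hypothetical helper: derive a suffix-rewrite rule from one example.

    Returns (number of characters to strip from the end of `form`,
    string to append) so that applying the rule to `form` yields `lemma`.
    """
    # Length of the longest common prefix of the inflected form and the lemma.
    prefix = 0
    while prefix < min(len(form), len(lemma)) and form[prefix] == lemma[prefix]:
        prefix += 1
    # Everything after the shared prefix is removed, the rest of the lemma is appended.
    return len(form) - prefix, lemma[prefix:]


def apply_rule(form: str, rule: tuple[int, str]) -> str:
    """Hypothetical helper: apply a (strip, append) rule to a new word form."""
    strip, append = rule
    return form[: len(form) - strip] + append


if __name__ == "__main__":
    english = suffix_rule("walked", "walk")      # (2, "")
    print(apply_rule("jumped", english))         # jump  (correct)

    polish = suffix_rule("kota", "kot")          # (1, "")
    print(apply_rule("psa", polish))             # ps    (wrong, lemma is "pies")

    polish_dog = suffix_rule("psa", "pies")      # (2, "ies")
    print(apply_rule("kota", polish_dog))        # koies (wrong, lemma is "kot")
```

Each rule encodes only a surface edit at the end of the word, so Polish stem alternations (here the fleeting "e" in "pies"/"psa") force a separate rule per pattern, and choosing the right rule requires knowing the word and its context.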
