Context based lemmatizer for Polish language

Jul-23-2022, arXiv.org Artificial Intelligence

Natural Language Processing comprises many tasks, each of which extracts and processes human-understandable meaning from text data. Some tasks, such as classification, encompass the complete flow from data to answer; in others, such as part-of-speech tagging, the results often serve as input to subsequent algorithms. An interesting and complex problem is translation, where the meaning of the text must be extracted and encoded back into text in a different language. This approach characterizes a family of NLP tasks called text-to-text or sequence-to-sequence processing. Another example of text-to-text processing is lemmatisation, which finds the base form of a given word or expression. The complexity of this problem varies from language to language. In English the number of word variations is usually low, the rules are simple, and there are few exceptions. In Slavic languages such as Polish, however, inflection is significantly more complicated, and effective lemmatisation is beyond the capabilities of rule-based or edit-tree classification methods [1], [2]. The situation becomes more difficult when multi-segment expressions are included.

