A Tulu Resource for Machine Translation

Mar-28-2024–arXiv.org Artificial Intelligence

We present the first parallel dataset for English-Tulu translation. Tulu, a South Dravidian language, is spoken by approximately 2.5 million people, predominantly in southwestern India. We construct the dataset by adding human Tulu translations to the multilingual machine translation resource FLORES-200, and we use it to evaluate our English-Tulu machine translation model. To train the model, we leverage resources available for related South Dravidian languages, adopting a transfer learning approach that exploits similarities between high-resource and low-resource languages. This approach makes it possible to train a machine translation system without any parallel data between the source and target languages, removing a significant obstacle to machine translation development for low-resource languages. Our English-Tulu system, trained without parallel English-Tulu data, outperforms Google Translate (as of September 2023) by 19 BLEU points.
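For readers unfamiliar with the metric behind the 19-point comparison, the following is a minimal sketch of corpus-level BLEU in plain Python. It is an illustration under stated assumptions, not the authors' evaluation pipeline: the abstract does not say which tool or tokenization was used, so the function name `corpus_bleu`, the whitespace tokenization, the single reference per sentence, and the absence of smoothing are all assumptions made here. The example sentences are toy English placeholders, not data from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (hypothetical, not from the paper): whitespace tokenization,
    exactly one reference per hypothesis, and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy placeholder data: two candidate systems scored against the same references.
references = ["the cat sat on the mat", "there is a book on the table"]
system_a = ["the cat sat on the mat", "there is a book on the table"]
system_b = ["the cat sat on a mat", "there is a book on the desk"]

print(corpus_bleu(system_a, references))
print(corpus_bleu(system_b, references))
```

A difference in BLEU points between two systems, as reported in the abstract, is the difference between two such scores computed against the same reference translations. Scores from this sketch are not comparable to the paper's numbers, since tokenization and smoothing choices change BLEU values.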

computational linguistic, translation, tulu


