First Attempt at Building Parallel Corpora for Machine Translation of Northeast India's Very Low-Resource Languages
Tonja, Atnafu Lambebo; Mersha, Melkamu; Kalita, Ananya; Kolesnikova, Olga; Kalita, Jugal
arXiv.org Artificial Intelligence
This paper presents the creation of initial bilingual corpora for thirteen very low-resource languages, all from Northeast India, together with the results of initial translation efforts in these languages. These are the first-ever parallel corpora for these languages, and the paper provides initial benchmark neural machine translation results for them. We intend to extend these corpora to include a large number of low-resource Indian languages and to integrate the effort with our prior work on African and American-Indian languages, creating corpora that cover a large number of languages from across the world.
Dec 7, 2023
- Country:
  - Africa > South Africa (0.04)
  - South America > Uruguay
  - North America
    - United States
      - California (0.05)
      - Washington > King County
        - Seattle (0.04)
      - Colorado > El Paso County
        - Colorado Springs (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Myanmar (0.15)
    - Indonesia > Bali (0.05)
    - China (0.05)
    - Nepal (0.04)
    - Middle East > Israel (0.04)
    - Bangladesh (0.04)
    - India
      - Nagaland (0.06)
      - Mizoram (0.05)
      - Manipur (0.05)
      - Arunachal Pradesh (0.05)
      - Tripura (0.04)
      - Meghalaya (0.04)
      - Himachal Pradesh (0.04)
- Genre:
  - Research Report (0.50)
- Technology: