First Attempt at Building Parallel Corpora for Machine Translation of Northeast India's Very Low-Resource Languages
Tonja, Atnafu Lambebo; Mersha, Melkamu; Kalita, Ananya; Kolesnikova, Olga; Kalita, Jugal
arXiv.org Artificial Intelligence
This paper presents initial bilingual parallel corpora, the first of their kind, for thirteen very low-resource languages of Northeast India. It also reports initial benchmark neural machine translation results for these languages. We intend to extend these corpora to include a large number of low-resource Indian languages and to integrate the effort with our prior work on African and American-Indian languages, creating corpora that cover a large number of languages from across the world.
Dec-7-2023
- Country:
  - Asia > India (1.00)
  - North America
    - Mexico > Mexico City (0.14)
    - United States > Colorado (0.15)
- Genre:
  - Research Report (0.50)
- Technology: