translate-train
Extending Translate-Train for ColBERT-X to African Language CLIR
Yang, Eugene, Lawrie, Dawn J., McNamee, Paul, Mayfield, James
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (6 more...)
Better Low-Resource Entity Recognition Through Translation and Annotation Fusion
Chen, Yang, Shah, Vedaant, Ritter, Alan
Pre-trained multilingual language models have enabled significant advancements in cross-lingual transfer. However, these models often exhibit a performance disparity when transferring from high-resource languages to low-resource languages, especially for languages that are underrepresented or not in the pre-training data. Motivated by the superior performance of these models on high-resource languages compared to low-resource languages, we introduce a Translation-and-fusion framework, which translates low-resource language text into a high-resource language for annotation using fully supervised models before fusing the annotations back into the low-resource language. Based on this framework, we present TransFusion, a model trained to fuse predictions from a high-resource language to make robust predictions on low-resource languages. We evaluate our methods on two low-resource named entity recognition (NER) datasets, MasakhaNER2.0 and LORELEI NER, covering 25 languages, and show consistent improvement up to +16 F$_1$ over English fine-tuning systems, achieving state-of-the-art performance compared to Translate-train systems. Our analysis depicts the unique advantages of the TransFusion method which is robust to translation errors and source language prediction errors, and complimentary to adapted multilingual language models.
- North America > United States (0.46)
- Europe > United Kingdom (0.04)
- Europe > France (0.04)
- (6 more...)