AITopics

2305.13303

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

arXiv.org Artificial IntelligenceOct-20-2023

Towards Making the Most of ChatGPT for Machine Translation

Peng, Keqin, Ding, Liang, Zhong, Qihuang, Shen, Li, Liu, Xuebo, Zhang, Min, Ouyang, Yuanxin, Tao, Dacheng

ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pairs translation. However, they usually adopt simple prompts which can not fully elicit the capability of ChatGPT. In this paper, we aim to further mine ChatGPT's translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose an optimal temperature setting and two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still need to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and find a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.

arxiv preprint, chatgpt, translation, (14 more...)

2303.1378

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Czechia > Prague (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings

Ali, Muhammad Asif, Alshmrani, Maha, Qin, Jianbin, Hu, Yan, Wang, Di

Bilingual Lexical Induction (BLI) is a core challenge in NLP, it relies on the relative isomorphism of individual embedding spaces. Existing attempts aimed at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI that combines the distributional training objectives with multiple isomorphism losses guided by the graph attention network. GARI considers the impact of semantical variations of words in order to define the relative isomorphism of the embedding spaces. Experimental evaluation using the Arabic language data set shows that GARI outperforms the existing research by improving the average P@1 by a relative score of up to 40.95% and 76.80% for in-domain and domain mismatch settings respectively. We release the codes for GARI at https://github.com/asif6827/GARI.

gari, isomorphism, relative isomorphism, (12 more...)

2310.13068

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Algan, Abdullah Can, Yürekli, Emre, Çayır, Aykut

A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem

One of the most important challenges for modern search engines is to retrieve relevant web content based on user queries. In order to achieve this challenge, search engines have a module to rewrite user queries. That is why modern web search engines utilize some statistical and neural models used in the natural language processing domain. Statistical machine translation is a well-known NLP method among them. The paper proposes a query rewriting pipeline based on a monolingual machine translation model that learns to rewrite Arabic user search queries. This paper also describes preprocessing steps to create a mapping between user queries and web page titles.

query, rewritten query, translation, (13 more...)

2310.13031

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Media > Television (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Wu, Qiyu, Nagata, Masaaki, Tsuruoka, Yoshimasa

WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction

Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we broaden the source of supervision by relaxing the requirement for correct, fully-aligned, and parallel sentences. Specifically, we make noisy, partially aligned, and non-parallel paragraphs. We then use such a large-scale weakly-supervised dataset for word alignment pre-training via span prediction. Extensive experiments with various settings empirically demonstrate that our approach, which is named WSPAlign, is an effective and scalable way to pre-train word aligners without manual data. When fine-tuned on standard benchmarks, WSPAlign has set a new state-of-the-art by improving upon the best-supervised baseline by 3.3~6.1 points in F1 and 1.5~6.1 points in AER. Furthermore, WSPAlign also achieves competitive performance compared with the corresponding baselines in few-shot, zero-shot and cross-lingual tests, which demonstrates that WSPAlign is potentially more practical for low-resource languages than existing methods.

alignment, dataset, word alignment, (13 more...)

2306.05644

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

PMIndiaSum: Multilingual and Cross-lingual Headline Summarization for Languages in India

Urlana, Ashok, Chen, Pinzhen, Zhao, Zheng, Cohen, Shay B., Shrivastava, Manish, Haddow, Barry

This paper introduces PMIndiaSum, a multilingual and massively parallel summarization corpus focused on languages in India. Our corpus provides a training and testing ground for four language families, 14 languages, and the largest to date with 196 language pairs. We detail our construction workflow including data acquisition, processing, and quality assurance. Furthermore, we publish benchmarks for monolingual, cross-lingual, and multilingual summarization by fine-tuning, prompting, as well as translate-and-summarize. Experimental results confirm the crucial role of our data in aiding summarization between Indian languages. Our dataset is publicly available and can be freely modified and re-distributed.

dataset, information, summarization, (17 more...)

2305.08828

Country:

Asia > India > Tripura (0.04)
North America > Cuba (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Government > Regional Government (0.92)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Data Science (0.67)
(2 more...)

Downey, C. M., Zhou, Xuhui, Liu, Leo Z., Steinert-Threlkeld, Shane

Learning to translate by learning to communicate

We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the current dominant paradigm in NLP of pre-training on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been high lighted. In our approach, we embed a multilingual model (mBART, Liu et al., 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task. The hypothesis is that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et al., 2022), one of which outperforms a backtranslation-only baseline in all four languages investigated, including the low-resource language Nepali.

computational linguistic, proceedings, translation, (15 more...)

2207.07025

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(15 more...)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceOct-18-2023

knn-seq: Efficient, Extensible kNN-MT Framework

Deguchi, Hiroyuki, Hirano, Hayate, Hoshino, Tomoki, Nishida, Yuto, Vasselli, Justin, Watanabe, Taro

k-nearest-neighbor machine translation (kNN-MT) boosts the translation quality of a pre-trained neural machine translation (NMT) model by utilizing translation examples during decoding. Translation examples are stored in a vector database, called a datastore, which contains one entry for each target token from the parallel data it is made from. Due to its size, it is computationally expensive both to construct and to retrieve examples from the datastore. In this paper, we present an efficient and extensible kNN-MT framework, knn-seq, for researchers and developers that is carefully designed to run efficiently, even with a billion-scale large datastore. knn-seq is developed as a plug-in on fairseq and easy to switch models and kNN indexes. Experimental results show that our implemented kNN-MT achieves a comparable gain to the original kNN-MT, and the billion-scale datastore construction took 2.21 hours in the WMT'19 German-to-English translation task. We publish our knn-seq as an MIT-licensed open-source project and the code is available on https://github.com/naist-nlp/knn-seq . The demo video is available on https://youtu.be/zTDzEOq80m0 .

extensible knn-mt framework

2310.12352

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)

Petrick, Frithjof, Herold, Christian, Petrushkov, Pavel, Khadivi, Shahram, Ney, Hermann

Document-Level Language Models for Machine Translation

arXiv.org Artificial IntelligenceOct-18-2023

Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.

document-level language model, machine translation

2310.12303

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Tourni, Isidora Chara, Naskar, Subhajit

Direct Neural Machine Translation with Task-level Mixture of Experts models

arXiv.org Artificial IntelligenceOct-18-2023

Direct neural machine translation (direct NMT) is a type of NMT system that translates text between two non-English languages. Direct NMT systems often face limitations due to the scarcity of parallel data between non-English language pairs. Several approaches have been proposed to address this limitation, such as multilingual NMT and pivot NMT (translation between two languages via English). Task-level Mixture of expert models (Task-level MoE), an inference-efficient variation of Transformer-based models, has shown promising NMT performance for a large number of language pairs. In Task-level MoE, different language groups can use different routing strategies to optimize cross-lingual learning and inference speed. In this work, we examine Task-level MoE's applicability in direct NMT and propose a series of high-performing training and evaluation configurations, through which Task-level MoE-based direct NMT systems outperform bilingual and pivot-based models for a large number of low and high-resource direct pairs, and translation directions. Our Task-level MoE with 16 experts outperforms bilingual NMT, Pivot NMT models for 7 language pairs, while pivot-based models still performed better in 9 pairs and directions.

direct neural machine translation, expert model, task-level mixture

2310.12236

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)