AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Plan-then-Seam: Towards Efficient Table-to-Text Generation

Li, Liang, Geng, Ruiying, Fang, Chengyang, Li, Bing, Ma, Can, Li, Binhua, Li, Yongbin

arXiv.org Artificial Intelligence, Feb-28-2023

Table-to-text generation aims at automatically generating text to help people conveniently obtain salient information in tables. Recent works explicitly decompose the generation process into content planning and surface generation stages, employing two autoregressive networks for them respectively. However, they are computationally expensive due to the non-parallelizable nature of autoregressive decoding and the redundant parameters of two networks. In this paper, we propose the first totally non-autoregressive table-to-text model (Plan-then-Seam, PTS) that produces its outputs in parallel with one single network. PTS first writes and calibrates one plan of the content to be generated with a novel rethinking pointer predictor, and then takes the plan as the context for seaming to decode the description. These two steps share parameters and are performed iteratively to capture token inter-dependency while keeping parallel decoding. Experiments on two public benchmarks show that PTS achieves a 3.0 to 5.6 times speedup in inference time and a 50% reduction in parameters, while maintaining at least comparable performance against strong two-stage table-to-text competitors.

computational linguistic, machine learning, natural language, (14 more...)

2302.05138

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California > Los Angeles County > Long Beach (0.14)
South America > Brazil (0.04)
(18 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

An evaluation of Google Translate for Sanskrit to English translation via sentiment and semantic analysis

Shukla, Akshat, Bansal, Chaarvi, Badhe, Sushrut, Ranjan, Mukul, Chandra, Rohitash

arXiv.org Artificial Intelligence, Feb-27-2023

Google Translate has been prominent for language translation; however, limited work has been done in evaluating the quality of translation when compared to human experts. Sanskrit is one of the oldest written languages in the world. In 2022, the Sanskrit language was added to the Google Translate engine. Sanskrit is known as the mother of languages such as Hindi and an ancient source of the Indo-European group of languages. Sanskrit is the original language for sacred Hindu texts such as the Bhagavad Gita. In this study, we present a framework that evaluates Google Translate for Sanskrit using the Bhagavad Gita. We first publish a translation of the Bhagavad Gita in Sanskrit using Google Translate. Our framework then compares the Google Translate version of the Bhagavad Gita with expert translations using sentiment and semantic analysis via BERT-based language models. Our results indicate that, in terms of sentiment and semantic analysis, there is a low level of similarity in selected verses of Google Translate when compared to expert translations. In the qualitative evaluation, we find that Google Translate is unsuitable for translation of certain Sanskrit words and phrases due to their poetic nature, contextual significance, metaphor and imagery. The mistranslations are not surprising since the Bhagavad Gita is known as a difficult text not only to translate, but also to interpret, since it relies on contextual, philosophical and historical information. Our framework lays the foundation for automatic evaluation of other languages by Google Translate.
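The abstract does not say how semantic similarity between a machine translation and an expert translation is scored. One common choice, shown here purely as an illustrative assumption, is the cosine similarity between sentence embedding vectors. The vectors below are hypothetical toy values, not outputs of any BERT model or data from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of one verse: machine translation vs. expert translation.
machine_embedding = [0.2, 0.7, 0.1]
expert_embedding = [0.3, 0.6, 0.4]

score = cosine_similarity(machine_embedding, expert_embedding)
print(f"semantic similarity: {score:.3f}")
```

A score near 1 would indicate that the two translations sit close together in the embedding space; a low score would flag the verse for closer qualitative review.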

machine learning, natural language, translation, (19 more...)

2303.07201

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(6 more...)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

kNN-BOX: A Unified Framework for Nearest Neighbor Generation

Zhu, Wenhao, Zhao, Qianfeng, Lv, Yunzhe, Huang, Shujian, Zhao, Siheng, Liu, Sizhe, Chen, Jiajun

arXiv.org Artificial Intelligence, Feb-27-2023

Augmenting the base neural model with a token-level symbolic datastore is a novel generation paradigm and has achieved promising results in machine translation (MT). In this paper, we introduce a unified framework, kNN-BOX, which enables quick development and interactive analysis for this novel paradigm. kNN-BOX decomposes the datastore-augmentation approach into three modules: datastore, retriever and combiner, thus bringing diverse kNN generation methods into a unified form. Currently, kNN-BOX provides implementations of seven popular kNN-MT variants, covering research from performance enhancement to efficiency optimization. It is easy for users to reproduce these existing works or customize their own models. In addition, users can interact with their kNN generation systems through kNN-BOX to better understand the underlying inference process in a visualized way. In the experiment section, we apply kNN-BOX to machine translation and three other seq2seq generation tasks, namely text simplification, paraphrase generation and question generation. Experimental results show that augmenting the base neural model with kNN-BOX leads to a large performance improvement in all these tasks. The code and documentation of kNN-BOX are available at https://github.com/NJUNLP/knn-box.
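The datastore, retriever and combiner split described in the abstract can be illustrated with a toy sketch. This is not the kNN-BOX API: the function names, the squared-L2 distance, and the interpolation weight `lam` are assumptions chosen to mirror the commonly described kNN-MT formulation, in which a distribution built from retrieved neighbors is interpolated with the base model's distribution.

```python
import math
from collections import defaultdict


def build_datastore(examples):
    """Datastore: (key vector, target token) pairs; keys stand in for decoder states."""
    return [(tuple(key), token) for key, token in examples]


def retrieve(datastore, query, k):
    """Retriever: the k entries nearest to the query by squared L2 distance."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(key, query)), token)
        for key, token in datastore
    ]
    return sorted(scored)[:k]


def combine(model_probs, neighbors, lam=0.5, temperature=1.0):
    """Combiner: interpolate the model distribution with a neighbor-based one."""
    if not neighbors:
        return dict(model_probs)
    weights = defaultdict(float)
    for distance, token in neighbors:
        weights[token] += math.exp(-distance / temperature)
    total = sum(weights.values())
    vocab = set(model_probs) | set(weights)
    return {
        token: lam * weights.get(token, 0.0) / total
        + (1.0 - lam) * model_probs.get(token, 0.0)
        for token in vocab
    }


# Hypothetical toy data: two-dimensional keys and a two-word vocabulary.
datastore = build_datastore([
    ((0.0, 0.0), "cat"),
    ((0.1, 0.0), "cat"),
    ((5.0, 5.0), "dog"),
])
neighbors = retrieve(datastore, query=(0.0, 0.1), k=2)
combined = combine({"cat": 0.4, "dog": 0.6}, neighbors, lam=0.5)
print(combined)
```

In this toy run both retrieved neighbors carry the token "cat", so the combined distribution shifts toward "cat" even though the base model preferred "dog". Real systems replace the linear scan with an approximate nearest-neighbor index.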

artificial intelligence, machine translation, natural language, (14 more...)

2302.13574

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension

Zhang, Chen, Lai, Yuxuan, Feng, Yansong, Shen, Xingyu, Du, Haowei, Zhao, Dongyan

arXiv.org Artificial Intelligence, Feb-26-2023

Although many large-scale knowledge bases (KBs) claim to contain multilingual information, their support for many non-English languages is often incomplete. This incompleteness gives birth to the task of cross-lingual question answering over knowledge base (xKBQA), which aims to answer questions in languages different from that of the provided KB. One of the major challenges facing xKBQA is the high cost of data annotation, leading to limited resources available for further exploration. Another challenge is mapping KB schemas and natural language expressions in the questions under cross-lingual settings. In this paper, we propose a novel approach for xKBQA in a reading comprehension paradigm. We convert KB subgraphs into passages to narrow the gap between KB schemas and questions, which enables our model to benefit from recent advances in multilingual pre-trained language models (MPLMs) and cross-lingual machine reading comprehension (xMRC). Specifically, we use MPLMs, with considerable knowledge of cross-lingual mappings, for cross-lingual reading comprehension. Existing high-quality xMRC datasets can be further utilized to finetune our model, greatly alleviating the data scarcity issue in xKBQA. Extensive experiments on two xKBQA datasets in 12 languages show that our approach outperforms various baselines and achieves strong few-shot and zero-shot performance. Our dataset and code are released for further research.

computational linguistic, natural language, question answering, (18 more...)

2302.13241

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(8 more...)

Genre: Research Report (0.84)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.91)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Resources for Turkish Natural Language Processing: A critical survey

Çöltekin, Çağrı, Doğruöz, A. Seza, Çetinoğlu, Özlem

arXiv.org Artificial Intelligence, Feb-25-2023

The recent (re)popularization of deep learning methods has increased the importance of, and need for, data even further. Similarly, the other subfields of theoretical and applied linguistics have also seen a shift towards more data-driven methods. As a result, the availability of large and high-quality language data is essential for both linguistic research and practical NLP applications. In this paper, we present a comprehensive and critical survey of linguistic resources for Turkish.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.1007/s10579-022-09605-4

2204.05042

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(49 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry:

Media > News (1.00)
Education (1.00)
Government (0.67)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Xu, Weijia, Agrawal, Sweta, Briakou, Eleftheria, Martindale, Marianna J., Carpuat, Marine

arXiv.org Artificial Intelligence, Feb-24-2023

Neural sequence generation models are known to "hallucinate", by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.

artificial intelligence, machine learning, natural language, (15 more...)

2301.07779

Country:

North America > United States > Maryland (0.05)
Asia > China > Hong Kong (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(17 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

Chronopoulou, Alexandra, Thompson, Brian, Mathur, Prashant, Virkar, Yogesh, Lakew, Surafel M., Federico, Marcello

arXiv.org Artificial Intelligence, Feb-24-2023

Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech. The new target language speech should satisfy isochrony; that is, the new speech should be time-aligned with the original video, including mouth movements, pauses, hand gestures, etc. In this paper, we propose training a model that directly optimizes both the translation and the speech duration of the generated translations. We show that this system generates speech that better matches the timing of the original speech, compared to prior work, while simplifying the system architecture.

artificial intelligence, machine learning, natural language, (21 more...)

2302.12979

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Federated Nearest Neighbor Machine Translation

Du, Yichao, Zhang, Zhirui, Wu, Bingzhe, Liu, Lemao, Xu, Tong, Chen, Enhong

arXiv.org Artificial Intelligence, Feb-23-2023

To protect user privacy and meet legal regulations, federated learning (FL) is attracting significant attention. Training neural machine translation (NMT) models with traditional FL algorithms (e.g., FedAvg) typically relies on multi-round model-based interactions. However, this is impractical and inefficient for translation tasks due to the vast communication overheads and heavy synchronization. In this paper, we propose a novel Federated Nearest Neighbor (FedNN) machine translation framework that, instead of multi-round model-based interactions, leverages one-round memorization-based interaction to share knowledge across different clients and build low-overhead privacy-preserving systems. The whole approach equips the public NMT model trained on large-scale accessible data with a k-nearest-neighbor (kNN) classifier and integrates the external datastore constructed from the private text data of all clients to form the final FL model. A two-phase datastore encryption strategy is introduced to preserve privacy during this process. Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising translation performance in different FL settings.

In recent years, neural machine translation (NMT) has significantly improved translation quality (Bahdanau et al., 2015; Vaswani et al., 2017; Hassan et al., 2018) and has been widely adopted in many commercial systems. The current mainstream system is first built on a large-scale corpus collected by the service provider and then directly applied to translation tasks for different users and enterprises. However, this application paradigm faces two critical challenges in practice.

artificial intelligence, machine translation, natural language, (16 more...)

2302.12211

Country:

Europe (0.14)
Asia > China (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Simple and Scalable Nearest Neighbor Machine Translation

Dai, Yuhan, Zhang, Zhirui, Liu, Qiuzhi, Cui, Qu, Li, Weihua, Du, Yichao, Xu, Tong

arXiv.org Artificial Intelligence, Feb-23-2023

Despite being conceptually attractive, kNN-MT is burdened with massive storage requirements and high computational complexity since it conducts nearest neighbor searches over the entire reference corpus. In this paper, we propose a simple and scalable nearest neighbor machine translation framework to drastically improve the decoding and storage efficiency of kNN-based models while maintaining the translation performance. To this end, we dynamically construct an extremely small datastore for each input via sentence-level retrieval to avoid searching the entire datastore as in vanilla kNN-MT, based on which we further introduce a distance-aware adapter to adaptively incorporate the kNN retrieval results into the pre-trained NMT models. Experiments on machine translation in two general settings, static domain adaptation and online learning, demonstrate that our proposed approach not only achieves almost 90% of the speed of the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT.

Domain adaptation is one of the fundamental challenges in machine learning, which aspires to cope with the discrepancy across domain distributions and improve the generality of the trained models. It has attracted wide attention in the neural machine translation (NMT) area (Britz et al., 2017; Chen et al., 2017; Chu & Wang, 2018; Bapna & Firat, 2019; Bapna et al., 2019; Wei et al., 2020). Recently, kNN-MT and its variants (Khandelwal et al., 2021; Zheng et al., 2021a;b; Wang et al., 2022a) provide a new paradigm and have achieved remarkable performance for fast domain adaptation by retrieval pipelines. These approaches combine traditional NMT models (Bahdanau et al., 2015; Vaswani et al., 2017) with a token-level k-nearest-neighbour (kNN) retrieval mechanism, allowing them to directly access the domain-specific datastore to improve translation accuracy without fine-tuning the entire model.

datastore, machine learning, natural language, (18 more...)

2302.12188

Country: Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.34)

Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting

Jiang, Zifan, Moryossef, Amit, Müller, Mathias, Ebling, Sarah

arXiv.org Artificial Intelligence, Feb-23-2023

This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup--translating from American Sign Language to (American) English--our method achieves over 30 BLEU, while in two multilingual setups--translating in both directions between spoken languages and signed languages--we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.

artificial intelligence, natural language, translation, (14 more...)

2210.05404

Country:

South America > Brazil (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
North America > Canada > Quebec (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
