AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation

Yang, Jian, Yin, Yuwei, Ma, Shuming, Zhang, Dongdong, Li, Zhoujun, Wei, Furu

arXiv.org Artificial Intelligence, Jul-15-2022

Multilingual neural machine translation (MNMT) trained in multiple language pairs has attracted considerable attention due to fewer model parameters and lower training costs by sharing knowledge among multiple languages. Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages. In this paper, we propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference, which adopts the two-stage training with the language-specific selection mechanism. Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder to enhance the translation quality of high-resource directions. Next, the model is further trained on all available corpora to transfer knowledge from high-resource languages (HRLs) to low-resource languages (LRLs). Experimental results show that HLT-MT outperforms various strong baselines on WMT-10 and OPUS-100 benchmarks. Furthermore, the analytic experiments validate the effectiveness of our method in mitigating the negative interference in multilingual training.

low-resource language, machine translation, translation, (15 more...)

doi: 10.24963/ijcai.2022/619

2207.04906

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Boosting Span-based Joint Entity and Relation Extraction via Squence Tagging Mechanism

Ji, Bin, Li, Shasha, Yu, Jie, Ma, Jun, Liu, Huijun

arXiv.org Artificial Intelligence, Jul-15-2022

Span-based joint extraction simultaneously conducts named entity recognition (NER) and relation extraction (RE) in text span form. Recent studies have shown that token labels can convey crucial task-specific information and enrich token semantics. However, as far as we know, because they abstain completely from sequence tagging mechanisms, all prior span-based work fails to use token label information. To solve this problem, we propose Sequence Tagging enhanced Span-based Network (STSN), a span-based joint extraction network that is enhanced by token BIO label information derived from sequence tagging based NER. By stacking multiple attention layers in depth, we design a deep neural architecture to build STSN, and each attention layer consists of three basic attention units. The deep neural architecture first learns semantic representations for token labels and span-based joint extraction, and then constructs information interactions between them, which also realizes bidirectional information interactions between span-based NER and RE. Furthermore, we extend the BIO tagging scheme so that STSN can extract overlapping entities. Experiments on three benchmark datasets show that our model consistently outperforms previous optimal models by a large margin, creating new state-of-the-art results.
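For readers unfamiliar with the BIO labels mentioned in the abstract, the following is a minimal sketch of how standard BIO tags map to entity spans. It shows only the conventional scheme (B- begins an entity, I- continues it, O is outside); the function name `bio_to_spans` and the example tags are illustrative assumptions, and the extended scheme the authors use for overlapping entities is not reproduced here.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert a BIO tag sequence to (start, end_exclusive, label) spans.

    Hypothetical helper for illustration; not taken from the STSN paper.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            # The open span continues.
            continue
        else:
            # "O", or an I- tag that does not continue the open span.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    example_tags = ["B-PER", "I-PER", "O", "B-LOC"]
    print(bio_to_spans(example_tags))
```

Because each token carries exactly one tag, this plain scheme cannot mark a token as belonging to two entities at once, which is the limitation that motivates extending it.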

computational linguistic, representation, semantic representation, (14 more...)

2105.1008

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation

Liu, Danni, Wang, Changhan, Gong, Hongyu, Ma, Xutai, Tang, Yun, Pino, Juan

arXiv.org Artificial Intelligence, Jul-15-2022

Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge of delivering S2ST in real time is the accumulated delay between the translation and speech synthesis modules. While recently incremental text-to-speech (iTTS) models have shown large quality improvements, they typically require additional future text inputs to reach optimal performance. In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer. After mitigating the initial delay, we demonstrate that the duration of synthesized speech also plays a crucial role on latency. We formalize this as a latency metric and then present a simple yet effective duration-scaling approach for latency reduction. Our approaches consistently reduce latency by 0.2-0.5 second without sacrificing speech translation quality.

latency, lookahead, pseudo lookahead, (17 more...)

2110.08214

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

No Language Left Behind

#artificialintelligence, Jul-14-2022, 16:50:17 GMT

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. The limits of my language mean the limits of my world.

artificial intelligence, machine translation, natural language, (15 more...)

Country:

Europe (0.05)
South America (0.05)
North America > United States (0.05)
(2 more...)

Industry: Government (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases

Lagzdiņš, Andis, Siliņš, Uldis, Pinnis, Mārcis, Bergmanis, Toms, Vasiļevskis, Artūrs, Vasiļjevs, Andrejs

arXiv.org Artificial Intelligence, Jul-14-2022

Consolidated access to current and reliable terms from different subject fields and languages is necessary for content creators and translators. Terminology is also needed in AI applications such as machine translation, speech recognition, information extraction, and other natural language processing tools. In this work, we facilitate standards-based sharing and management of terminology resources by providing an open terminology management solution - the EuroTermBank Toolkit. It allows organisations to manage and search their terms, create term collections, and share them within and outside the organisation by participating in the network of federated databases. The data curated in the federated databases are automatically shared with EuroTermBank, the largest multilingual terminology resource in Europe, allowing translators and language service providers as well as researchers and students to access terminology resources in their most current version.

eurotermbank, terminology, translation, (14 more...)

2207.06729

Country:

Europe > Latvia > Riga Municipality > Riga (0.05)
Europe > Estonia (0.05)
North America > Dominican Republic (0.04)
(8 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Meta's AI machine translation research to help break language barriers

#artificialintelligence, Jul-13-2022, 14:25:53 GMT

Meta has announced that it has built and open-sourced 'No Language Left Behind' NLLB-200, a single Artificial Intelligence (AI) model that is the first to translate across 200 different languages, including 55 African languages, with state-of-the-art results. Meta is using the modelling techniques and learnings from the project to improve and extend translations on Facebook, Instagram, and Wikipedia. In an effort to develop high-quality machine translation capabilities for most of the world's low-resource languages, this single AI model was designed with a focus on African languages. These languages are challenging from a machine translation perspective: AI models require large amounts of data to learn from, and little human-translated training data exists for them.

artificial intelligence, machine translation research, natural language, (13 more...)

Country: Africa (0.09)

Genre: Press Release (0.36)

Industry: Law (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Interactive Machine Learning: A State of the Art Review

Wondimu, Natnael A., Buche, Cédric, Visser, Ubbo

arXiv.org Artificial Intelligence, Jul-13-2022

Machine learning has proved useful in many software disciplines, including computer vision, speech and audio processing, natural language processing, robotics and some other fields. However, its applicability has been significantly hampered due to its black-box nature and significant resource consumption. Performance is achieved at the expense of enormous computational resources, usually compromising the robustness and trustworthiness of the model. Recent research has identified a lack of interactivity as the prime source of these machine learning problems. Consequently, interactive machine learning (iML) has attracted increased attention from researchers on account of its human-in-the-loop modality and relatively efficient resource utilization. A state-of-the-art review of interactive machine learning therefore plays a vital role in easing the effort toward building human-centred models. In this paper, we provide a comprehensive analysis of the state of the art of iML. We analyze salient research works using a mixed merit-oriented and application/task-oriented taxonomy. We use a bottom-up clustering approach to generate a taxonomy of iML research works. Research works on adversarial black-box attacks and corresponding iML-based defense systems, exploratory machine learning, resource-constrained learning, and iML performance evaluation are analyzed under their corresponding themes in our merit-oriented taxonomy. We have further classified these research works into technical and sectoral categories. Finally, research opportunities that we believe are inspiring for future work in iML are discussed thoroughly.

iml, learning, machine learning, (13 more...)

2207.06196

Country:

Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > Florida > Miami-Dade County > Coral Gables (0.04)
(3 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Fukuda, Ryo, Sudoh, Katsuhito, Nakamura, Satoshi

arXiv.org Artificial Intelligence, Jul-13-2022

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.
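The weakness of pause-based segmentation described in the abstract can be illustrated with a toy sketch. It assumes per-frame speech/non-speech decisions are already available as a list of booleans (for example, from some VAD); the function name `split_on_pauses`, the `min_pause_frames` parameter, and the example input are all hypothetical, and this is not the classifier-based or hybrid method the authors propose.

```python
from typing import List, Tuple


def split_on_pauses(is_speech: List[bool], min_pause_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) frame ranges of speech segments.

    A segment is closed only once a run of non-speech frames reaches
    min_pause_frames; shorter pauses stay inside the segment.
    """
    segments = []
    start = None        # first frame of the currently open segment
    last_speech = None  # most recent speech frame seen in the open segment
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_pause_frames:
            # The pause is long enough: close the segment at the last speech frame.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


if __name__ == "__main__":
    frames = [True, True, False, True, False, False, False, True]
    print(split_on_pauses(frames, min_pause_frames=3))
    print(split_on_pauses(frames, min_pause_frames=1))
```

With a threshold of three frames, the one-frame pause at index 2 does not split the first segment, so two sentences separated by a very short pause would be merged; lowering the threshold splits more aggressively but also cuts at pauses that fall inside a sentence.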

proceedings, segmentation, translation, (13 more...)

2203.15479

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

Razumovskaia, Evgeniia (Language Technology Lab, University of Cambridge, UK) | Glavas, Goran (Data and Web Science Group, University of Mannheim, Germany) | Majewska, Olga (Language Technology Lab, University of Cambridge, UK) | Ponti, Edoardo M. (Mila - Quebec AI Institute and McGill University, Canada) | Korhonen, Anna (University of Cambridge, UK) | Vulic, Ivan (Language Technology Lab, University of Cambridge, UK)

Journal of Artificial Intelligence Research, Jul-13-2022

In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent with the aim of completing a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field. We find that the most critical factor preventing the creation of truly multilingual ToD systems is the lack of datasets in most languages for both training and evaluation. In fact, acquiring annotations or human feedback for each component of modular systems or for data-hungry end-to-end systems is expensive and tedious. Hence, state-of-the-art approaches to multilingual ToD mostly rely on (zero- or few-shot) cross-lingual transfer from resource-rich languages (almost exclusively English), either by means of (i) machine translation or (ii) multilingual representations. These approaches are currently viable only for typologically similar languages and languages with parallel / monolingual corpora available. On the other hand, their effectiveness beyond these boundaries is doubtful or hard to assess due to the lack of linguistically diverse benchmarks (especially for natural language generation and end-to-end evaluation). To overcome this limitation, we draw parallels between components of the ToD pipeline and other NLP tasks, which can inspire solutions for learning in low-resource scenarios. Finally, we list additional challenges that multilinguality poses for related areas (such as speech, fluency in generated text, and human-centred evaluation), and indicate future directions that hold promise to further expand language coverage and dialogue capabilities of current ToD systems.

computational linguistic, dataset, proceedings, (11 more...)

doi: 10.1613/jair.1.13083

AI Access Foundation

13083

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
(24 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.65)

Industry:

Information Technology (1.00)
Health & Medicine (0.92)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(5 more...)

A General Contextualized Rewriting Framework for Text Summarization

Bao, Guangsheng, Zhang, Yue

arXiv.org Artificial Intelligence, Jul-12-2022

The rewriting method for text summarization combines extractive and abstractive approaches, improving the conciseness and readability of extractive summaries using an abstractive model. Existing rewriting systems take each extractive sentence as the only input, which is relatively focused but can lose necessary background knowledge and discourse context. In this paper, we investigate contextualized rewriting, which consumes the entire document and considers the summary context. We formalize contextualized rewriting as a seq2seq with group-tag alignments, introducing group-tag as a solution to model the alignments, identifying extractive sentences through content-based addressing. Results show that our approach significantly outperforms non-contextualized rewriting systems without requiring reinforcement learning, achieving strong improvements on ROUGE scores upon multiple extractors.

computational linguistic, extractive sentence, rewriter, (15 more...)

2207.05948

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Hong Kong (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(14 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
