AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-24-2022, 07:15:28 GMT

AI will Power Machine Translation to New Heights in 2022 - Enterprise Viewpoint

Machine translation has been around for many years. However, it wasn't until Google, Microsoft and others began developing machine translation that it grew into a serious competitive alternative to human translation. As a result, machine translation has made more progress in the last 10 years than in the previous 50 years. Today, machine translation is used to produce billions of words daily and is fast closing in on human translation quality. At the heart of the improvement in machine translation quality is artificial intelligence.

machine translation, translation, translation system, (12 more...)

Industry: Information Technology (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-22-2022, 18:06:25 GMT

What You Never Knew About Attention Mechanisms

This blog is written and maintained by students in the Master of Science in Professional Computer Science Program at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs. Where are your eyes drawn to in this photo? Most of us will admit that our eyes are drawn to the blue duckling. To humans, the blue duckling sticks out like a sore thumb.

attention mechanism, query, vector, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Fierro, Constanza, Søgaard, Anders

Factual Consistency of Multilingual Pretrained Language Models

arXiv.org Artificial Intelligence Mar-22-2022

Pretrained language models can be queried for factual knowledge, with potential applications in knowledge base acquisition and tasks that require inference. However, for that, we need to know how reliable this knowledge is, and recent work has shown that monolingual English language models lack consistency when predicting factual knowledge, that is, they fill-in-the-blank differently for paraphrases describing the same fact. In this paper, we extend the analysis of consistency to a multilingual setting. We introduce a resource, mParaRel, and investigate (i) whether multilingual language models such as mBERT and XLM-R are more consistent than their monolingual counterparts; and (ii) if such models are equally consistent across languages. We find that mBERT is as inconsistent as English BERT in English paraphrases, but that both mBERT and XLM-R exhibit a high degree of inconsistency in English and even more so for all the other 45 languages.
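One simple way to quantify the kind of consistency described above is pairwise agreement: for a single fact, collect the model's fill-in-the-blank prediction for each paraphrase and measure the fraction of paraphrase pairs that yield the same answer. The sketch below is illustrative only; the function name and the example predictions are hypothetical, and the paper's exact metric may differ.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Return the fraction of paraphrase pairs whose predictions agree.

    `predictions` holds one predicted answer per paraphrase of the same fact.
    With fewer than two paraphrases there is nothing to disagree with,
    so the score defaults to 1.0 (an assumption of this sketch).
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Hypothetical predictions for four paraphrases of one fact.
preds = ["Paris", "Paris", "Lyon", "Paris"]
print(pairwise_consistency(preds))  # 3 of the 6 pairs agree -> 0.5
```

Averaging this score over all facts in one language gives a single number that can be compared across languages or across models.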

computational linguistic, consistency, proceedings, (13 more...)

doi: 10.18653/v1/2022.findings-acl.240

2203.11552

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Nonaka, Keita, Yamanouchi, Kazutaka, I, Tomohiro, Okita, Tsuyoshi, Shimada, Kazutaka, Sakamoto, Hiroshi

LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

arXiv.org Artificial Intelligence Mar-19-2022

In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among such methods, BPE/BPE-dropout is one of the fastest and most effective compared to conventional approaches. However, the compression-based approach has a drawback: generating multiple segmentations is difficult because the algorithm is deterministic. To overcome this difficulty, we focus on a probabilistic string algorithm, called locally-consistent parsing (LCP), that has been applied to achieve optimum compression. Employing the probabilistic mechanism of LCP, we propose LCP-dropout for multiple subword segmentation that improves BPE/BPE-dropout, and show that it outperforms various baselines, especially when learning from small training data.
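To make the determinism issue concrete, here is a minimal sketch of the BPE-dropout baseline mentioned in the abstract (not the proposed LCP-dropout, whose details the abstract does not give). Plain BPE always applies the highest-priority merge, so a word has exactly one segmentation; BPE-dropout randomly skips candidate merges so the same word can be segmented in several ways. The merge table and function name are hypothetical.

```python
import random


def bpe_segment(word, merges, dropout=0.0, rng=None):
    """Segment `word` with an ordered BPE merge list.

    At every step, each applicable merge is dropped with probability
    `dropout`; the highest-priority surviving merge is applied.
    dropout=0.0 reproduces deterministic BPE.
    """
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank = earliest learned merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table, in the order the merges were learned.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_segment("lower", merges))               # ['low', 'er']
print(bpe_segment("lower", merges, dropout=1.0))  # ['l', 'o', 'w', 'e', 'r']
```

With a dropout value strictly between 0 and 1, repeated calls can return different segmentations of the same word, which is the source of the multiple segmentations used during training.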

artificial intelligence, lcp-dropout, natural language, (12 more...)

doi: 10.3390/electronics11071014

2202.1359

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(16 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Mar-17-2022

Modeling Intensification for Sign Language Generation: A Computational Approach

İnan, Mert, Zhong, Yang, Hassan, Sabit, Quandt, Lorna, Alikhani, Malihe

End-to-end sign language generation models do not accurately represent the prosody in sign language. A lack of temporal and spatial variations leads to poor-quality generated presentations that confuse human interpreters. In this paper, we aim to improve the prosody in generated sign languages by modeling intensification in a data-driven manner. We present different strategies grounded in linguistics of sign language that inform how intensity modifiers can be represented in gloss annotations. To employ our strategies, we first annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset, with different levels of intensification. We then use a supervised intensity tagger to extend the annotated dataset and obtain labels for the remaining portion of it. This enhanced dataset is then used to train state-of-the-art transformer models for sign language generation. We find that our efforts in intensification modeling yield better results when evaluated with automatic metrics. Human evaluation also indicates a higher preference for the videos generated using our model.

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.18653/v1/2022.findings-acl.228

2203.09679

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)

Lam, Tsz Kin, Schamoni, Shigehiko, Riezler, Stefan

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

arXiv.org Artificial Intelligence Mar-16-2022

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stores text and audio data. Second, we translate the augmented transcript. Finally, we recombine concatenated audio segments and the generated translation. Besides training an MT-system, we only use basic off-the-shelf components without fine-tuning. While having similar resource demands as knowledge distillation, adding our method delivers consistent improvements of up to 0.9 and 1.1 BLEU points on five language pairs on CoVoST 2 and on two language pairs on Europarl-ST, respectively.

covost 2, machine learning, natural language, (19 more...)

doi: 10.18653/v1/2022.acl-short.27

2203.08757

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Lupo, Lorenzo, Dinarelli, Marco, Besacier, Laurent

Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

arXiv.org Artificial Intelligence Mar-15-2022

Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is undertaken by contextual parameters, trained on document-level data. In this work, we discuss the difficulty of training these parameters effectively, due to the sparsity of the words in need of context (i.e., the training signal), and their relevant context. We propose to pre-train the contextual parameters over split sentence pairs, which makes efficient use of the available data for two reasons. Firstly, it increases the contextual training signal by breaking intra-sentential syntactic relations, thus pushing the model to search the context for disambiguating clues more frequently. Secondly, it eases the retrieval of relevant context, since context segments become shorter. We propose four different splitting methods, and evaluate our approach with BLEU and contrastive test sets. Results show that it consistently improves learning of contextual parameters, both in low and high resource settings.
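The idea of splitting sentence pairs can be sketched as follows: each parallel sentence is cut in two, the first half serves as the "context" segment and the second half as the "current" segment to translate. The abstract does not describe the four splitting methods, so the midpoint split below is only an assumption made for illustration, and the function name and example sentences are hypothetical.

```python
def split_pair(src_tokens, tgt_tokens):
    """Split one parallel sentence into (context, current) segment pairs.

    Both sides are cut at their own midpoint (an assumed, simplistic rule).
    The first halves play the role of context, the second halves the role
    of the sentence to be translated.
    """
    s_mid = len(src_tokens) // 2
    t_mid = len(tgt_tokens) // 2
    context = (src_tokens[:s_mid], tgt_tokens[:t_mid])
    current = (src_tokens[s_mid:], tgt_tokens[t_mid:])
    return context, current


# Hypothetical tokenized sentence pair.
src = "the bank raised its rates again".split()
tgt = "la banque a encore relevé ses taux".split()
context, current = split_pair(src, tgt)
print(context)  # (['the', 'bank', 'raised'], ['la', 'banque', 'a'])
print(current)  # (['its', 'rates', 'again'], ['encore', 'relevé', 'ses', 'taux'])
```

Because the cut falls inside the sentence, words in the second segment (such as "its") can only be resolved by looking at the first segment, which is the extra contextual training signal the abstract refers to.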

artificial intelligence, computational linguistic, natural language, (17 more...)

doi: 10.18653/v1/2022.acl-long.312

2103.17151

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
(16 more...)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-11-2022, 13:09:14 GMT

Know More About Natural Language Processing (NLP) & AI

Natural language processing (NLP) is an area of artificial intelligence (AI) that focuses on assisting computers in understanding how humans write and communicate. This is a difficult task because of the large amount of unstructured data. Individuals' speaking and writing styles are unique, and they are continually changing to suit widespread usage. Understanding context is another issue that requires semantic analysis to be solved by machine learning. Natural language understanding (NLU) is a sub-branch of natural language processing (NLP) that deals with these complexities through machine reading comprehension rather than merely comprehending literal meanings. These functions improve as we write, speak, and converse with computers more: they are constantly learning.

artificial intelligence, machine translation, natural language processing, (12 more...)

Industry:

Health & Medicine (0.74)
Media > News (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-8-2022, 19:10:21 GMT

Meta's machine translation journey

There are around 7,000 languages spoken globally, but most translation models focus on English and other popular languages. This excludes a major part of the world from the benefit of having access to content, technologies and other advantages of being online. Tech giants are trying to bridge this gap. Just days ago, Meta announced that it plans to bring out a Universal Speech Translator to translate speech from one language to another in real time. This announcement is not surprising to anyone who follows the company closely. Meta has been devoted to bringing innovations to machine translation for quite some time now.

low-resource language, machine translation journey, meta, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)