AITopics

doi: 10.18653/v1/2022.findings-acl.228

2203.09679

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)

Lam, Tsz Kin, Schamoni, Shigehiko, Riezler, Stefan

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

arXiv.org Artificial Intelligence Mar-16-2022

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stores text and audio data. Second, we translate the augmented transcript. Finally, we recombine concatenated audio segments and the generated translation. Besides training an MT-system, we only use basic off-the-shelf components without fine-tuning. While having similar resource demands as knowledge distillation, adding our method delivers consistent improvements of up to 0.9 and 1.1 BLEU points on five language pairs on CoVoST 2 and on two language pairs on Europarl-ST, respectively.
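The three steps named in the abstract (sample, translate, recombine) can be sketched as follows. This is only an illustrative toy, not the authors' implementation: the `augment` function, the idea of keying the suffix memory by a pivot word, the representation of audio as a list of numbers, and the `translate` callable standing in for an MT system are all assumptions made for the example.

```python
import random

def augment(prefix_text, prefix_audio, pivot, memory, translate, rng):
    """Build one synthetic (audio, transcript, translation) triple.

    memory maps a pivot (hypothetical key) to a list of
    (suffix_text, suffix_audio) entries; translate is any callable
    that maps a source string to a target string.
    """
    # 1. Sample: draw a stored suffix (text plus audio) for this pivot.
    suffix_text, suffix_audio = rng.choice(memory[pivot])
    new_transcript = f"{prefix_text} {suffix_text}"
    # 2. Translate: run the augmented transcript through the MT stand-in.
    new_translation = translate(new_transcript)
    # 3. Recombine: concatenate the audio segments and pair them
    #    with the generated translation.
    new_audio = prefix_audio + suffix_audio
    return new_audio, new_transcript, new_translation

# Toy data: one stored suffix, and str.upper as a stand-in "translator".
memory = {"the": [("red house", [0.3, 0.4])]}
audio, transcript, translation = augment(
    "I see the", [0.1, 0.2], "the", memory, str.upper, random.Random(0)
)
```

In a real setting the audio segments would come from alignments between audio and transcript, and `translate` would be a trained MT model; both are replaced by placeholders here.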

covost 2, machine learning, natural language, (19 more...)

doi: 10.18653/v1/2022.acl-short.27

2203.08757

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Lupo, Lorenzo, Dinarelli, Marco, Besacier, Laurent

Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

arXiv.org Artificial Intelligence Mar-15-2022

Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is undertaken by contextual parameters, trained on document-level data. In this work, we discuss the difficulty of training these parameters effectively, due to the sparsity of the words in need of context (i.e., the training signal) and of their relevant context. We propose to pre-train the contextual parameters over split sentence pairs, which makes efficient use of the available data for two reasons. First, it increases the contextual training signal by breaking intra-sentential syntactic relations, thus pushing the model to search the context for disambiguating clues more frequently. Second, it eases the retrieval of relevant context, since context segments become shorter. We propose four different splitting methods and evaluate our approach with BLEU and contrastive test sets. Results show that it consistently improves the learning of contextual parameters in both low- and high-resource settings.
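The idea of turning one sentence pair into a context segment plus a current segment can be illustrated with a minimal sketch. The abstract does not describe the four splitting methods, so the midpoint rule, the whitespace tokenization, and the names `split_middle` and `split_pair` below are assumptions chosen only for illustration.

```python
def split_middle(tokens):
    """Split a token list into two parts at its midpoint."""
    mid = len(tokens) // 2
    return tokens[:mid], tokens[mid:]

def split_pair(src, tgt):
    """Split a source/target sentence pair into (context, current) segments.

    Source and target are split independently here, so the halves are
    not guaranteed to be translations of each other; this is a
    simplification for the example.
    """
    src_ctx, src_cur = split_middle(src.split())
    tgt_ctx, tgt_cur = split_middle(tgt.split())
    return {
        "context": (" ".join(src_ctx), " ".join(tgt_ctx)),
        "current": (" ".join(src_cur), " ".join(tgt_cur)),
    }

example = split_pair("the cat sat on the mat",
                     "le chat était assis sur le tapis")
```

The first half then plays the role of (short) context for the second half, which is the property the abstract relies on.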

artificial intelligence, computational linguistic, natural language, (17 more...)

doi: 10.18653/v1/2022.acl-long.312

2103.17151

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
(16 more...)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-11-2022, 13:09:14 GMT

Know More About Natural Language Processing (NLP) & AI

Natural language processing (NLP) is an area of artificial intelligence (AI) that focuses on helping computers understand how humans write and communicate. This is a difficult task because of the large amount of unstructured data involved. Individuals' speaking and writing styles are unique, and they change continually to reflect common usage. Understanding context is another problem, one that requires semantic analysis by machine learning. Natural language understanding (NLU) is a sub-branch of NLP that deals with these complexities through machine reading comprehension rather than merely comprehending literal meanings. These functions improve as we write, speak, and converse with computers more: they are constantly learning.

artificial intelligence, machine translation, natural language processing, (12 more...)

Industry:

Health & Medicine (0.74)
Media > News (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-8-2022, 19:10:21 GMT

Meta's machine translation journey

There are around 7,000 languages spoken globally, but most translation models focus on English and other popular languages. This excludes a major part of the world from access to content, technologies and other advantages of being online. Tech giants are trying to bridge this gap. Just days ago, Meta announced that it plans to bring out a Universal Speech Translator to translate speech from one language to another in real time. This announcement is not surprising to anyone who follows the company closely. Meta has been devoted to innovation in machine translation for quite some time now.

low-resource language, machine translation journey, meta, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Mar-8-2022, 03:37:56 GMT

Digital Babel Fish: The holy grail of Conversational AI

Yesterday's science fiction is today's invention. Babel Fish, the "oddest thing in the universe", is a species of fish featured in Douglas Adams's magnum opus, The Hitchhiker's Guide to the Galaxy. The fish, worn as an earpiece, instantly translates all the languages that ever existed. Babel Fish is no longer the stuff of dreams: thanks to advances in AI, especially in the NLP domain, many tech giants are in the process of building a universal translator. To that end, the Universal Speech Translator was a dominant theme at Meta's Inside the Lab event on February 23.

babel fish, translation, translator, (14 more...)

Country: Europe > Italy (0.05)

Genre: Personal > Honors (0.31)

Industry: Information Technology (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.80)

arXiv.org Artificial Intelligence Mar-7-2022

Semantic-Preserving Linguistic Steganography by Pivot Translation and Semantic-Aware Bins Coding

Yang, Tianyu, Wu, Hanzhou, Yi, Biao, Feng, Guorui, Zhang, Xinpeng

Linguistic steganography (LS) aims to embed secret information into a highly encoded text for covert communication. It can be roughly divided into two main categories, i.e., modification-based LS (MLS) and generation-based LS (GLS). Unlike MLS, which hides secret data by slightly modifying a given text without impairing its meaning, GLS uses a trained language model to directly generate a text carrying secret data. A common disadvantage of MLS methods is that the embedding payload is very low, which is the price of preserving the semantic quality of the text well. In contrast, GLS allows the data hider to embed a high payload, but at the high price of uncontrollable semantics. In this paper, we propose a novel LS method that modifies a given text by pivoting it between two different languages and embeds secret data by applying a GLS-like information encoding strategy. Our purpose is to alter the expression of the given text, enabling a high payload to be embedded while keeping the semantic information unchanged. Experimental results show that the proposed work not only achieves a high embedding payload, but also shows superior performance in maintaining semantic consistency and resisting linguistic steganalysis.
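As a rough illustration of what a bins-style encoding strategy can look like, the sketch below embeds bits by choosing each word from a vocabulary bin whose index equals the next group of secret bits. It is a generic toy version, not the semantic-aware scheme of the paper: the round-robin bin assignment, the `choose` callable standing in for a language or translation model, and all function names are assumptions. It also assumes the bit string length is a multiple of `bits_per_token`.

```python
def make_bins(vocab, bits_per_token):
    """Partition the vocabulary into 2**bits_per_token bins (round-robin)."""
    n_bins = 2 ** bits_per_token
    bins = [[] for _ in range(n_bins)]
    for i, word in enumerate(sorted(vocab)):
        bins[i % n_bins].append(word)
    return bins

def embed(bits, bins, bits_per_token, choose):
    """Emit one word per bit group, taken from the bin the group indexes."""
    words = []
    for i in range(0, len(bits), bits_per_token):
        bin_index = int(bits[i:i + bits_per_token], 2)
        # choose() is a placeholder for a model picking the best word.
        words.append(choose(bins[bin_index]))
    return words

def extract(words, bins, bits_per_token):
    """Recover the bit string by looking up each word's bin index."""
    lookup = {w: i for i, b in enumerate(bins) for w in b}
    return "".join(format(lookup[w], f"0{bits_per_token}b") for w in words)

bins = make_bins(["a", "b", "c", "d", "e", "f", "g", "h"], bits_per_token=2)
stego_words = embed("0111", bins, 2, choose=lambda candidates: candidates[0])
recovered = extract(stego_words, bins, 2)
```

Sender and receiver must share the same bins for extraction to work, which is why the partition here is deterministic.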

information, machine learning, natural language, (19 more...)

2203.03795

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Mar-4-2022, 16:15:30 GMT

Baidu Launches Digital Platform for AI Sign Language

Baidu AI Cloud launched a sign language platform on Thursday, able to generate digital avatars for sign language translation and live interpretation within minutes. Released as a new offering of Baidu AI Cloud's digital avatar platform XiLing, this new product aims to help break down communication barriers for the deaf and hard-of-hearing (DHH) community by boosting the accessibility of automated sign language translation. An AI sign language interpreter developed using the platform will perform its duties during the upcoming 2022 Beijing Winter Paralympic Games. Also released with the platform on Thursday were two all-in-one AI sign language translators, providing one-stop solutions with a streamlined set-up process and plug-and-use features. With the technological changes brought by AI, production and operational costs of digital avatars have been reduced to a significant degree, making it possible for AI sign language to scale up and serve more DHH individuals, said Tian Wu, Baidu Corporate Vice President.

language translation, platform, sign language translation, (7 more...)

Country: Asia > China > Beijing > Beijing (0.26)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.72)

#artificialintelligence Mar-1-2022, 00:25:16 GMT

What to Expect from the Language Industry in 2022

The language industry is having a moment. The ongoing global health crisis has forced organizations to break down borders and support a global remote workforce, requiring more cross-language interactions and coordination than ever before. At the same time, technological innovation in the language translation industry is at an all-time high. We've never before had access to such sophisticated tools to manage translation processes. I predict it's going to be an exciting year in the industry, with an unprecedented level of innovation.

language industry, translation industry, translation model, (12 more...)

Industry: Information Technology (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.74)

#artificialintelligence Feb-27-2022, 23:25:13 GMT

Paper Review: Meta-Learning for Low-Resource Neural Machine Translation

So, without further ado, let's jump into this awesome paper. This paper addresses low-resource neural machine translation, which means translating less common languages into English or other widely spoken languages. The task is framed as a meta-learning problem because little translation data is available for languages like Romanian or other regional languages. The proposed method should learn from commonly available language translations and use that knowledge to translate Romanian or Finnish into English. Let's define the problem more formally.

language pair, low-resource neural machine translation, neural machine translation, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)