 Machine Translation


Digital Babel Fish: The holy grail of Conversational AI

#artificialintelligence

Yesterday's science fiction is today's invention. Babel Fish, the "oddest thing in the universe", is a species of fish featured in Douglas Adams's magnum opus, The Hitchhiker's Guide to the Galaxy. The fish, worn as an earpiece, instantly translates every language that has ever existed. Babel Fish is no longer the stuff of dreams: thanks to advances in AI, especially in the NLP domain, many tech giants are in the process of building a universal translator. To that end, the Universal Speech Translator was a dominant theme at Meta's Inside the Lab event on February 23.


Semantic-Preserving Linguistic Steganography by Pivot Translation and Semantic-Aware Bins Coding

arXiv.org Artificial Intelligence

Linguistic steganography (LS) aims to embed secret information into a highly encoded text for covert communication. It can be roughly divided into two main categories, i.e., modification-based LS (MLS) and generation-based LS (GLS). Unlike MLS, which hides secret data by slightly modifying a given text without impairing its meaning, GLS uses a trained language model to directly generate a text carrying secret data. A common disadvantage of MLS methods is that the embedding payload is very low; in return, the semantic quality of the text is well preserved. In contrast, GLS allows the data hider to embed a high payload, but at the high price of uncontrollable semantics. In this paper, we propose a novel LS method that modifies a given text by pivoting it between two different languages and embeds secret data by applying a GLS-like information encoding strategy. Our purpose is to alter the expression of the given text, enabling a high payload to be embedded while keeping the semantic information unchanged. Experimental results have shown that the proposed work not only achieves a high embedding payload, but also shows superior performance in maintaining semantic consistency and resisting linguistic steganalysis.
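As a rough illustration of what a GLS-style "bins" encoding can look like, the sketch below uses plain bins coding: a keyed partition of the vocabulary into 2^k bins, with each k-bit chunk of the secret choosing the bin the next token must come from. This is not the paper's semantic-aware variant, and every name here (make_bins, embed, extract, the candidate lists) is a hypothetical chosen for the example; in a real system the per-step candidates would come from a translation model.

```python
import random
from typing import Dict, List, Sequence


def make_bins(vocab: Sequence[str], bits_per_token: int, seed: int) -> Dict[str, int]:
    """Assign every vocabulary token to one of 2**bits_per_token bins.

    The seed plays the role of a shared secret key (an assumption of this sketch).
    """
    shuffled = sorted(vocab)
    random.Random(seed).shuffle(shuffled)
    return {tok: i % (2 ** bits_per_token) for i, tok in enumerate(shuffled)}


def embed(bits: str, candidates_per_step: Sequence[Sequence[str]],
          bins: Dict[str, int], bits_per_token: int) -> List[str]:
    """Pick one token per step whose bin index equals the next chunk of secret bits."""
    if len(bits) != bits_per_token * len(candidates_per_step):
        raise ValueError("need exactly bits_per_token secret bits per step")
    chosen = []
    for step, candidates in enumerate(candidates_per_step):
        chunk = bits[step * bits_per_token:(step + 1) * bits_per_token]
        target_bin = int(chunk, 2)
        # Candidates are assumed to be ranked best-first; take the first that fits.
        match = next((t for t in candidates if bins[t] == target_bin), None)
        if match is None:
            raise ValueError(f"no candidate falls in bin {target_bin} at step {step}")
        chosen.append(match)
    return chosen


def extract(tokens: Sequence[str], bins: Dict[str, int], bits_per_token: int) -> str:
    """Recover the secret bits by reading off each token's bin index."""
    return "".join(format(bins[t], f"0{bits_per_token}b") for t in tokens)
```

The receiver only needs the same vocabulary, seed, and bits_per_token to rebuild the bins and call extract. Larger bits_per_token raises the payload but leaves fewer acceptable tokens per step, which is the payload versus semantics trade-off the abstract describes.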


Baidu Launches Digital Platform for AI Sign Language

#artificialintelligence

Baidu AI Cloud launched a sign language platform on Thursday, able to generate digital avatars for sign language translation and live interpretation within minutes. Released as a new offering of Baidu AI Cloud's digital avatar platform XiLing, this new product aims to help break down communication barriers for the deaf and hard-of-hearing (DHH) community by boosting the accessibility of automated sign language translation. An AI sign language interpreter developed using the platform will perform its duties during the upcoming 2022 Beijing Winter Paralympic Games. Also released with the platform on Thursday were two all-in-one AI sign language translators, providing one-stop solutions with a streamlined set-up process and plug-and-use features. With the technological changes brought by AI, production and operational costs of digital avatars have been reduced to a significant degree, making it possible for AI sign language to scale up and serve more DHH individuals, said Tian Wu, Baidu Corporate Vice President.


What to Expect from the Language Industry in 2022

#artificialintelligence

The language industry is having a moment. The ongoing global health crisis has forced organizations to break down borders and support a global remote workforce, requiring more cross-language interactions and coordination than ever before. At the same time, technological innovation in the language translation industry is at an all-time high. We've never before had access to such sophisticated technology tools to manage translation processes. I predict it's going to be an exciting year in the industry, with an unprecedented level of innovation.


Paper Review: Meta-Learning for Low-Resource Neural Machine Translation

#artificialintelligence

So, without further ado, let's jump into this awesome paper. This paper addresses low-resource Neural Machine Translation, which means translating less widely spoken languages into English or other widely used languages. The task falls under the umbrella of meta-learning because little translation data is available for languages like Romanian or other regional languages. The proposed method should learn from language pairs with plentiful translation data and use that knowledge to translate Romanian or Finnish into English. Let's define the problem in a technical manner.


Mark Zuckerberg demos a tool for building virtual worlds using voice commands – TechCrunch

#artificialintelligence

Meta, formerly known as Facebook, today showed off a prototype of an AI system that enables people to generate or import things into a virtual world just by using voice commands. The company sees the tool, which is called "Builder Bot," as an "exploratory concept" that shows AI's potential for creating new worlds in the metaverse. Meta CEO Mark Zuckerberg showed off the prototype at the Meta AI: Inside the Lab event on Wednesday in a pre-recorded demo video. In the video, Zuckerberg explained the process of building parts of a virtual world by describing them. He begins with the prompt, "let's go to a park."


Meta wants to build a universal language translator

Engadget

During an Inside the Lab: Building for the metaverse with AI livestream event on Wednesday, Meta CEO Mark Zuckerberg didn't just expound on his company's unblinking vision for the future, dubbed the Metaverse. He also revealed that Meta's research division is working on a universal speech translation system that could streamline users' interactions with AI within the company's digital universe. "The big goal here is to build a universal model that can incorporate knowledge across all modalities... all the information that is captured through rich sensors," Zuckerberg said. "This will enable a vast scale of predictions, decisions, and generation as well as whole new architectures, training methods and algorithms that can learn from a vast and diverse range of different inputs." Zuckerberg noted that Facebook has continually striven to develop technologies that enable more people worldwide to access the internet and is confident that those efforts will translate to the Metaverse as well.


Control formality in machine translated text using Amazon Translate

#artificialintelligence

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. Amazon Translate now supports formality customization. This feature allows you to customize the level of formality in your translation output. At the time of writing, the formality customization feature is available for six target languages: French, German, Hindi, Italian, Japanese, and Spanish. You can customize the formality of your translated output to suit your communication needs.
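A minimal sketch of how the formality setting might be passed through the AWS SDK for Python, assuming boto3 is installed and AWS credentials are configured. The helper names, the default region, and the mapping of the six listed languages to the codes fr, de, hi, it, ja, and es are choices made for this example. Rejecting other targets up front is a design decision of the sketch, not documented service behavior.

```python
from typing import Any, Dict

# Codes for the six target languages listed above (mapping assumed for this sketch).
FORMALITY_TARGETS = {"fr", "de", "hi", "it", "ja", "es"}


def build_translate_request(text: str, source: str, target: str,
                            formality: str) -> Dict[str, Any]:
    """Build the keyword arguments for a TranslateText call with a formality setting."""
    if formality not in ("FORMAL", "INFORMAL"):
        raise ValueError("formality must be 'FORMAL' or 'INFORMAL'")
    if target not in FORMALITY_TARGETS:
        raise ValueError(f"formality is not listed as supported for target {target!r}")
    return {
        "Text": text,
        "SourceLanguageCode": source,
        "TargetLanguageCode": target,
        "Settings": {"Formality": formality},
    }


def translate_with_formality(text: str, source: str, target: str,
                             formality: str, region: str = "us-east-1") -> str:
    """Send the request to Amazon Translate; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("translate", region_name=region)
    response = client.translate_text(
        **build_translate_request(text, source, target, formality)
    )
    return response["TranslatedText"]
```

Calling translate_with_formality("How are you?", "en", "de", "FORMAL") and then repeating the call with "INFORMAL" should make the difference in register visible in the German output. The exact wording returned depends on the service.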


Revisiting the Evaluation Metrics of Paraphrase Generation

arXiv.org Artificial Intelligence

Paraphrase generation is an important NLP task that has achieved significant progress recently. However, one crucial problem is overlooked: "how to evaluate the quality of a paraphrase?" Most existing paraphrase generation models use reference-based metrics (e.g., BLEU) from neural machine translation (NMT) to evaluate their generated paraphrases. The reliability of such metrics has hardly been evaluated, and they are only plausible when a standard reference exists. Therefore, this paper first answers one fundamental question: "Are existing metrics reliable for paraphrase generation?" We present two conclusions that contradict conventional wisdom in paraphrase generation: (1) existing metrics align poorly with human annotation in system-level and segment-level paraphrase evaluation; (2) reference-free metrics outperform reference-based metrics, indicating that standard references are unnecessary for evaluating a paraphrase's quality. Such empirical findings expose a lack of reliable automatic evaluation metrics. Therefore, this paper proposes BBScore, a reference-free metric that can reflect the generated paraphrase's quality. BBScore consists of two sub-metrics, S3C score and SelfBLEU, which correspond to two criteria for paraphrase evaluation: semantic preservation and diversity. By connecting the two sub-metrics, BBScore significantly outperforms existing paraphrase evaluation metrics.
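To make the diversity criterion concrete, here is a simplified sketch of a SelfBLEU-style score, assuming SelfBLEU means BLEU computed between the source sentence and its paraphrase, so that a lower score indicates a more diverse paraphrase. The whitespace tokenization, the lack of smoothing, and the handling of sentences shorter than max_n are simplifications of this example. It is not the paper's implementation, and it does not cover the S3C score or how BBScore combines the two.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def self_bleu(source: str, paraphrase: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU of the paraphrase against its own source sentence."""
    src = source.lower().split()
    hyp = paraphrase.lower().split()
    if not src or not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        src_counts = ngrams(src, n)
        total = sum(hyp_counts.values())
        if total == 0:
            break  # paraphrase is shorter than n tokens; use the orders seen so far
        # Clip each n-gram count by how often it occurs in the source.
        clipped = sum(min(count, src_counts[gram]) for gram, count in hyp_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty, as in standard BLEU.
    bp = 1.0 if len(hyp) >= len(src) else math.exp(1 - len(src) / len(hyp))
    return bp * math.exp(sum(log_precisions) / len(log_precisions))
```

A paraphrase that copies the source verbatim scores 1.0, while one that shares no words with it scores 0.0. That is why a diversity measure of this kind has to be paired with a semantic-preservation measure: on its own it would reward unrelated text.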


Sequence-to-Sequence Resources for Catalan

arXiv.org Artificial Intelligence

In this work, we introduce sequence-to-sequence language resources for Catalan, a moderately under-resourced language, for two tasks: Summarization and Machine Translation (MT). We present two new abstractive summarization datasets in the domain of newswire. We also introduce a parallel Catalan-English corpus, paired with three different brand new test sets. Finally, we evaluate the data presented with competing state-of-the-art models, and we develop baselines for these tasks using a newly created Catalan BART. We release the resulting resources of this work under an open license to encourage the development of language technology in Catalan.