Machine Translation
Lost in Translation: How Artificial Intelligence is Breaking the Language Barrier - DefinedCrowd
Human interaction with machines has experienced a great leap forward in recent years, largely driven by artificial intelligence (AI). From smart homes to self-driving cars, AI has become a seamless part of our daily lives. Voice interactions play a key role in many of these technological advances, most notably in language translation. Here, AI enables instant translation across a number of mediums: text, voice, images and even street signs. The technology works by recognizing individual words, then leveraging similarities in how various languages express the relationships between those words.
Machine Learning case study: GOOGLE
Machine learning is a sub-field of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning algorithms are usually categorized as supervised or unsupervised. Artificial Intelligence is a branch of computer science that endeavors to replicate or simulate human intelligence in a machine, so machines can perform tasks that typically require human intelligence. Some programmable functions of AI systems include planning, learning, reasoning, problem-solving, and decision making. My social, promotional, and primary mails might be different than what you have in your mailbox.
UniCase -- Rethinking Casing in Language Models
Powalski, Rafal, Stanislawek, Tomasz
In this paper, we introduce a new approach to dealing with the problem of case-sensitiveness in Language Modelling (LM). We propose simple architecture modification to the RoBERTa language model, accompanied by a new tokenization strategy, which we named Unified Case LM (UniCase). We tested our solution on the GLUE benchmark, which led to increased performance by 0.42 points. Moreover, we prove that the UniCase model works much better when we have to deal with text data, where all tokens are uppercased (+5.88 point).
A Technical Report: BUT Speech Translation Systems
Vydana, Hari Krishna, Burget, Lukas, Cernocky, Jan
The paper describes the BUT's speech translation systems. The systems are English$\longrightarrow$German offline speech translation systems. The systems are based on our previous works \cite{Jointly_trained_transformers}. Though End-to-End and cascade~(ASR-MT) spoken language translation~(SLT) systems are reaching comparable performances, a large degradation is observed when translating ASR hypothesis compared to the oracle input text. To reduce this performance degradation, we have jointly-trained ASR and MT modules with ASR objective as an auxiliary loss. Both the networks are connected through the neural hidden representations. This model has an End-to-End differentiable path with respect to the final objective function and also utilizes the ASR objective for better optimization. During the inference both the modules(i.e., ASR and MT) are connected through the hidden representations corresponding to the n-best hypotheses. Ensembling with independently trained ASR and MT models have further improved the performance of the system.
Translating lost languages using machine learning
Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or "undeciphered" -- that is, we don't know enough about their grammar, vocabulary, or syntax to be able to actually understand their texts. Lost languages are more than a mere academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have such minimal records that scientists can't decipher them by using machine-translation algorithms like Google Translate. Some don't have a well-researched "relative" language to be compared to, and often lack traditional dividers like white space and punctuation.
Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
Panchbhai, Anand, Soru, Tommaso, Marx, Edgard
A booming amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata with billions of statements describing millions of entities. The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries. However, users often submit questions that are complex and require a certain level of abstraction and reasoning to decompose them into basic graph patterns. In this short paper, we explore the use of architectures based on Neural Machine Translation called Neural SPARQL Machines to learn pattern compositions. We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Zheng, Renjie, Ma, Mingbo, Zheng, Baigong, Liu, Kaibo, Yuan, Jiahong, Church, Kenneth, Huang, Liang
Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay. In addition, it needs to continuously translate a stream of sentences, but all recent solutions merely focus on the single-sentence scenario. As a result, current approaches accumulate latencies progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation (SAT) which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech (as measured by the naturalness metric MOS) with substantially lower latency than the baseline, in both Zh <-> En directions.
The first AI model that translates 100 languages without relying on English data
Facebook AI is introducing, M2M-100 the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. When translating, say, Chinese to French, previous best multilingual models train on Chinese to English and English to French, because English training data is the most widely available. Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations. M2M-100 is trained on a total of 2,200 language directions -- or 10x more than previous best, English-centric multilingual models.
Facebook's new AI can translate languages directly into one another
Whether you're logging on from the US, Brazil, Borneo, or France, Facebook can translate virtually any written content published on its platform into the local language using automated machine translation. In fact, Facebook provides around 20 billion translations everyday for its News Feed alone. However these systems typically use English as an intermediary step -- that is, translating from Chinese to French actually goes Chinese to English to French. This is done because data sets of translations to and from English are massive and widely available but putting English in the middle reduces the overall translation accuracy while making the entire process more complex and cumbersome than it needs to be. That's why Facebook AI has developed a new MT model that can bidirectionally translate directly between two languages (Chinese to French and French to Chinese) without ever using English as a crutch -- and which outperforms the English-centric model by 10 points on BLEU metrics.
Bayesian Attention Modules
Fan, Xinjie, Zhang, Shujian, Chen, Bo, Zhou, Mingyuan
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.