Machine Translation


Machines may never master the distinctly human elements of language

#artificialintelligence

Artificial intelligence is difficult to develop because real intelligence is mysterious. This mystery manifests in language, or "the dress of thought" as the writer Samuel Johnson put it, and language remains a major challenge to the development of artificial intelligence. "There's no way you can have an AI system that's humanlike that doesn't have language at the heart of it," Josh Tenenbaum, a professor of cognitive science and computation at MIT, told Technology Review in August. In September, Google announced that its Google Neural Machine Translation (GNMT) system can now "in some cases" produce translations that are "nearly indistinguishable" from those of humans. Google itself added a caveat: "Machine translation is by no means solved. GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page."


A Computer Can Now Translate Languages as Well as a Human

#artificialintelligence

Have you ever been in a situation where knowing another language would have come in handy? I remember standing on the platform at Tokyo Station watching my train to Nagano -- the last train of the day -- pulling away without me on it. What ensued was a frustrating hour of gestures, confused smiles, and head-shaking as I wandered the station looking for someone who spoke English (my Japanese is unfortunately nonexistent). It would have been really helpful to have a bilingual pal along with me to translate. Bilingual pals can be hard to find, but Google's new translation software may be an equally useful alternative.


lmthang/nmt.matlab

@machinelearnbot

Code to train Neural Machine Translation systems as described in our EMNLP paper "Effective Approaches to Attention-based Neural Machine Translation." Here, we convert train/valid/test files in text format into integer format that can be handled efficiently in Matlab. The basic training command trains a very simple model with all the default settings. We set 'isResume' to 0 so that it trains a new model each time you run the command instead of loading an existing one. Decoding uses beamSize 2, collects at most 10 translations, and uses batchSize 1.
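The decoding settings above control a beam search. The following is a minimal Python sketch of beam search, not the repository's Matlab code: `beam_size` and `max_translations` are illustrative stand-ins for beamSize and the cap of 10 collected translations, while `step_logprobs`, `EOS` and the toy model are hypothetical. In this simple variant a hypothesis that emits the end token leaves the beam, so the beam can shrink as translations finish.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token
Hypothesis = Tuple[List[str], float]  # (tokens, total log-probability)


def beam_search(
    step_logprobs: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 2,
    max_translations: int = 10,
    max_len: int = 50,
) -> List[Hypothesis]:
    """Keep the beam_size best partial hypotheses at each step and collect
    up to max_translations finished ones (ending in EOS), best first."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Extend every live hypothesis by every candidate next token.
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in step_logprobs(tokens).items()
        ]
        candidates.sort(key=lambda h: h[1], reverse=True)
        # Keep only the top beam_size extensions; completed ones leave the beam.
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams or len(finished) >= max_translations:
            break
    finished.sort(key=lambda h: h[1], reverse=True)
    return finished[:max_translations]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    # Hypothetical stand-in for the decoder's softmax over the next token.
    if len(prefix) >= 2:
        return {EOS: math.log(0.9), "c": math.log(0.1)}
    return {"a": math.log(0.6), "b": math.log(0.3), EOS: math.log(0.1)}


for tokens, score in beam_search(toy_model, beam_size=2, max_translations=10):
    print(" ".join(tokens), round(math.exp(score), 3))
# a a </s> 0.324
# a b </s> 0.162
```

A beam of 2 keeps decoding cheap; a wider beam explores more alternatives at a proportionally higher cost per step.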


WIPO Develops Cutting-Edge Translation Tool For Patent Documents

#artificialintelligence

The World Intellectual Property Organization has developed a ground-breaking new "artificial intelligence"-based translation tool for patent documents, handing innovators around the world the highest-quality service yet available for accessing information on new technologies. WIPO Translate now incorporates cutting-edge neural machine translation technology to render highly technical patent documents into a second language in a style and syntax that more closely mirrors common usage, out-performing other translation tools built on previous technologies. WIPO has initially "trained" the new technology to translate Chinese, Japanese and Korean patent documents into English. Patent applications in those languages accounted for some 55% of worldwide filings in 2014. Users can already try out the Chinese-English translation facility on the public beta test platform.


The Limits of Modern AI: A Story - The Best Schools

#artificialintelligence

The dream of thinking machines goes back centuries, at least to Gottfried Wilhelm Leibniz in the 17th century. Leibniz helped invent mechanical calculators, developed the integral calculus independently of Isaac Newton, and had a lifelong fascination with reducing thinking to calculation. His Mathesis Universalis was a vision of a universal science made possible by a mathematical language more precise than natural languages such as English. In the 18th century, the Enlightenment philosopher and proto-psychologist Étienne Bonnot de Condillac imagined a statue that outwardly appeared like a man and also had what he called "the inward organization." In an example of supreme armchair speculation, Condillac imagined pouring facts--bits of knowledge--into its head, wondering when intelligence would emerge. Condillac's musings drew inspiration from the early mechanical philosophy of Thomas Hobbes, who had famously declared that thinking was nothing but ...


Google introduces neural machine learning to improve translation, approach human-level accuracy - The Tech Portal

#artificialintelligence

Though Google Translate is one of the most powerful language translation tools, the company still thinks there is room for major improvement, and it has been working on a model that can translate phrases from one language to another automatically. As with many of its other products, Google has been integrating machine learning techniques into this system, and today that work can finally be seen in action: the Google Neural Machine Translation system, or GNMT, which uses state-of-the-art training techniques to improve translations, has been launched for one of the most difficult language pairs, Chinese to English.


Age of Artificial Intelligence: How We're Already Living In a Sci-Fi Future

#artificialintelligence

When we talk about artificial intelligence (AI), most people still imagine robots that can talk, act, and behave (to a certain extent) like a human being, like C-3PO (Star Wars) sans the metallic look. Or maybe a supercomputer that can read human behavior so well that it interacts seamlessly with us while controlling the system, like HAL 9000 (2001: A Space Odyssey) or Auto (Wall-E). While we arguably may not be there yet in terms of our command of AI, we are not that far off. AI is definitely the direction tech development is taking, as evidenced by recent trends, including the formation of a partnership by tech giants to push the frontier of AI. While we may not be nearing the Singularity, AI has improved by leaps and bounds over the past few years alone.


Lightweight Random Indexing for Polylingual Text Classification

Journal of Artificial Intelligence Research

Multilingual Text Classification (MLTC) is a text classification task in which each document is written in one of a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (which we dub Lightweight Random Indexing, LRI). By running experiments on two well-known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show that LRI outperforms (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.
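The core idea of Random Indexing can be sketched in a few lines of Python. This is a generic RI sketch under stated assumptions, not the paper's LRI configuration: the class name, the dimension `dim`, the choice of two non-zero ±1 entries per index vector, and the prefixing of each term with its language are all illustrative. What it shows is that the output size stays fixed at `dim` however many languages and terms are added, instead of growing to roughly |L| times a monolingual space.

```python
import random
from collections import Counter
from typing import Dict, List


class RandomIndexer:
    """Generic Random Indexing sketch: every feature gets a fixed sparse
    random 'index vector' in a dim-dimensional space, and a document is the
    sum of its terms' index vectors weighted by term frequency."""

    def __init__(self, dim: int = 1000, nonzeros: int = 2, seed: int = 0):
        self.dim = dim              # fixed size, independent of vocabulary and |L|
        self.nonzeros = nonzeros    # assumed sparsity of each index vector
        self.rng = random.Random(seed)
        self.index: Dict[str, Dict[int, int]] = {}

    def index_vector(self, feature: str) -> Dict[int, int]:
        # Drawn once per feature, then reused: {position: +1 or -1}.
        if feature not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzeros)
            self.index[feature] = {p: self.rng.choice((1, -1)) for p in positions}
        return self.index[feature]

    def transform(self, tokens: List[str], lang: str) -> List[float]:
        # Prefixing terms with their language mimics the juxtaposed monolingual
        # spaces, yet every language lands in the same dim dimensions.
        doc = [0.0] * self.dim
        for term, tf in Counter(tokens).items():
            for pos, sign in self.index_vector(f"{lang}:{term}").items():
                doc[pos] += sign * tf
        return doc


ri = RandomIndexer(dim=1000, nonzeros=2, seed=42)
en = ri.transform("the market rose the market fell".split(), "en")
it = ri.transform("il mercato sale".split(), "it")
print(len(en), len(it))  # 1000 1000
```

Any standard classifier can then be trained on the shared `dim`-dimensional vectors from all languages at once.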


Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

arXiv.org Machine Learning

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.
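The encoder/decoder split described above can be illustrated with a deliberately tiny, dependency-free Python sketch. It is not the paper's model: plain linear maps stand in for neural networks, squared error stands in for the Gaussian reconstruction likelihood, and the one-hot speaker code, weights and function names are all assumptions. Training only ever reconstructs a frame with its own speaker's code, so no parallel data is needed; conversion swaps in the target speaker's code at decoding time.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, vec: Vector) -> Vector:
    # Matrix-vector product; a stand-in for a neural network layer.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def encode(frame: Vector, w_mu: Matrix, w_logvar: Matrix) -> Tuple[Vector, Vector]:
    # Encoder q(z|x) sees only the spectral frame, never the speaker id,
    # which pushes z towards speaker-independent (phonetic) content.
    return linear(w_mu, frame), linear(w_logvar, frame)


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    # z = mu + sigma * eps with eps ~ N(0, 1): the sample stays a
    # differentiable function of mu and sigma.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def decode(z: Vector, speaker: Vector, w_dec: Matrix) -> Vector:
    # Decoder p(x|z, speaker): latent code concatenated with a one-hot speaker code.
    return linear(w_dec, z + speaker)


def kl_standard_normal(mu: Vector, logvar: Vector) -> float:
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dims.
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(frame, speaker, w_mu, w_logvar, w_dec, rng) -> float:
    # Loss on a single unpaired frame reconstructed with its own speaker code:
    # squared error (Gaussian likelihood up to constants) plus the KL term.
    mu, logvar = encode(frame, w_mu, w_logvar)
    recon = decode(reparameterize(mu, logvar, rng), speaker, w_dec)
    return sum((r - x) ** 2 for r, x in zip(recon, frame)) + kl_standard_normal(mu, logvar)


def convert(frame, target_speaker, w_mu, w_logvar, w_dec) -> Vector:
    # Conversion: encode a source-speaker frame, decode with the target
    # speaker's code, using the posterior mean instead of a random sample.
    mu, _ = encode(frame, w_mu, w_logvar)
    return decode(mu, target_speaker, w_dec)


# Hypothetical toy weights: 3-dim frames, 2-dim latent, 2 speakers.
w_mu = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_logvar = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_dec = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
frame = [0.5, -0.5, 2.0]
print(convert(frame, [1.0, 0.0], w_mu, w_logvar, w_dec))  # [1.5, -0.5, 0.0]
print(convert(frame, [0.0, 1.0], w_mu, w_logvar, w_dec))  # [0.5, 0.5, 0.0]
```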


A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder

arXiv.org Machine Learning

Speech translation has always been about supplying source text or audio input and waiting for the system to give translated output in the desired form. In this paper, we present the Acoustic Dialect Decoder (ADD), a voice-to-voice ear-piece translation device. We introduce and survey recent advances in the field of Speech Engineering that can be employed in the ADD, focusing particularly on the three major processing steps of Recognition, Translation and Synthesis. We tackle the problem of machine understanding of natural language by designing a recognition unit for source audio to text, a translation unit for source-language text to target-language text, and a synthesis unit for target-language text to target-language speech. Speech from the surroundings will be recorded by the recognition unit on the ear-piece, and translation will start as soon as one sentence has been successfully read. This way, we hope to give translated output while input is still being read. The recognition unit will use the Hidden Markov Model Toolkit (HTK) and hybrid RNN systems with gated memory cells, and the synthesis unit will use HTS, an HMM-based speech synthesis system. This system will initially be built as an English-to-Tamil translation device.
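The sentence-by-sentence cascade described above can be sketched as a Python generator. The three stage functions are hypothetical placeholders for the recognition (HTK/RNN), translation and synthesis (HTS) units, one recognized word per audio chunk is assumed, and detecting a sentence end from punctuation is a simplification made for brevity (raw recognizer output often carries no punctuation).

```python
from typing import Callable, Iterable, Iterator, List

SENTENCE_END = (".", "?", "!")


def streaming_translate(
    audio_chunks: Iterable[object],
    recognize: Callable[[object], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], object],
) -> Iterator[object]:
    """Cascade recognition -> translation -> synthesis, emitting output for
    each sentence as soon as it has been read rather than after all input."""
    pending: List[str] = []
    for chunk in audio_chunks:
        word = recognize(chunk)              # recognition unit: audio -> source text
        pending.append(word)
        if word.endswith(SENTENCE_END):      # one sentence successfully read
            sentence = " ".join(pending)
            pending = []
            yield synthesize(translate(sentence))  # translation, then synthesis
    if pending:                              # flush an unterminated final sentence
        yield synthesize(translate(" ".join(pending)))


# Toy stand-ins: "audio" is already text; translation only tags the target language.
for out in streaming_translate(
    ["hello", "world.", "bye."],
    recognize=lambda chunk: chunk,
    translate=lambda s: f"[ta] {s}",
    synthesize=lambda t: f"<audio:{t}>",
):
    print(out)
# <audio:[ta] hello world.>
# <audio:[ta] bye.>
```

Because the function is a generator, the first sentence's audio is available before the speaker has finished the second one, which is the latency benefit the design aims for.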