Machine Translation


Predicting Crowd-Based Translation Quality with Language-Independent Feature Vectors

AAAI Conferences

Research over the past years has shown that machine translation results can be greatly enhanced with the help of mono- or bilingual human contributors, e.g. by asking humans to proofread or correct the output of machine translation systems. However, it remains difficult to determine the quality of individual revisions. This paper proposes a method to determine the quality of individual contributions by analyzing task-independent data, such as completion time and number of keystrokes. An initial evaluation showed promising F-measure values above 0.8 for support vector machine and decision tree based classification of a combined test set of Vietnamese and German translations.
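The F-measure reported above is the harmonic mean of precision and recall. A minimal sketch of that computation follows, assuming binary labels in which 1 marks an acceptable revision and 0 a poor one; the labeling scheme, the `f_measure` name, and the sample data are illustrative and not taken from the paper.

```python
from typing import Sequence


def f_measure(gold: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Return F1 (harmonic mean of precision and recall) for the `positive` class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = good revision, 0 = poor revision.
    gold = [1, 1, 0, 1, 0, 1, 0, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(round(f_measure(gold, predicted), 3))  # prints 0.75
```

In this toy example the classifier finds 3 of 4 good revisions and wrongly accepts 1 poor one, so precision and recall are both 0.75.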


Applying Automated Language Translation at a Global Enterprise Level

AAAI Conferences

In 2007 we presented a paper that described the application of Natural Language Processing (NLP) and Machine Translation (MT) to the automated translation of process build instructions from English into other languages to support Ford’s assembly plants in non-English-speaking countries. This project has continued to evolve with the addition of new languages and improvements to the translation process. However, we discovered a large demand for automated language translation across all of Ford Motor Company, so we decided to expand the scope of our project to address these requirements. This paper describes our efforts to meet all of Ford’s internal translation requirements with AI and MT technology, focusing on the challenges we faced and the lessons we learned from applying advanced technology across an entire corporation.


Doodling: A Gaming Paradigm for Generating Language Data

AAAI Conferences

With the advent of the increasingly participatory Internet and the growing power of the crowd, “Serious Games” have proven to be a fertile approach for gathering task-specific natural language data at very low cost. In this paper we outline a game we call Doodling, based on the sketch-and-convey metaphor used in the popular board game Pictionary(R), with the goal of generating useful natural language data. We explore whether such a paradigm can be successfully extended to convey more complex syntactic and semantic constructs than the words or short phrases typically used in the board game. Through a series of user experiments, we show that this is indeed the case, and that valuable parallel language data may be produced as a byproduct. In addition, we explore extensions to this paradigm along two axes: going online (vs. face-to-face) and going cross-lingual. The results of each set of experiments confirm the potential of the Doodling game to generate data in large quantities and across languages, and thus provide a new means of developing data sets and technologies for resource-poor languages.


Generating Chinese Classical Poems with Statistical Machine Translation Models

AAAI Conferences

This paper describes a statistical approach to the generation of Chinese classical poetry and proposes a novel method for automatically evaluating poems. The system accepts a set of keywords representing the writer's intent and generates sentences one by one to form a complete poem. A statistical machine translation (SMT) system generates each new sentence given the sentences generated previously. Each line is produced by a model trained specifically for that line position, rather than by a single model for all lines. To improve coherence across lines, a coherence model based on mutual information selects the candidates most consistent with the previous sentences. In addition, we demonstrate the effectiveness of the BLEU metric for evaluation when it is combined with a novel method for generating diverse references.
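The abstract does not spell out the coherence model, so the following is only a sketch of the general idea of selecting candidates by mutual information, not the paper's formulation. Assumptions: two words co-occur if they appear in the same training poem, pointwise mutual information (PMI) is averaged over every word pair between a candidate line and the preceding lines, and unseen pairs contribute a neutral score of 0. The `PMICoherence` class and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence


class PMICoherence:
    """Hypothetical PMI-based coherence scorer for candidate poem lines."""

    def __init__(self, poems: Sequence[Sequence[str]]):
        self.n = len(poems)
        self.word_df = Counter()  # number of poems containing each word
        self.pair_df = Counter()  # number of poems containing each unordered word pair
        for poem in poems:
            words = sorted(set(poem))
            self.word_df.update(words)
            self.pair_df.update(combinations(words, 2))

    def pmi(self, a: str, b: str) -> float:
        """log P(a, b) / (P(a) P(b)); identical or unseen pairs score 0 (neutral)."""
        key = (a, b) if a < b else (b, a)
        joint = self.pair_df[key]
        if a == b or joint == 0:
            return 0.0
        p_ab = joint / self.n
        p_a = self.word_df[a] / self.n
        p_b = self.word_df[b] / self.n
        return math.log(p_ab / (p_a * p_b))

    def score(self, candidate: Sequence[str], previous: Sequence[Sequence[str]]) -> float:
        """Average PMI between candidate words and all words in previous lines."""
        context = [w for line in previous for w in line]
        pairs = [(c, w) for c in candidate for w in context]
        if not pairs:
            return 0.0
        return sum(self.pmi(c, w) for c, w in pairs) / len(pairs)

    def best(self, candidates: Sequence[Sequence[str]],
             previous: Sequence[Sequence[str]]) -> List[str]:
        """Pick the candidate line that is most coherent with the previous lines."""
        return list(max(candidates, key=lambda cand: self.score(cand, previous)))


if __name__ == "__main__":
    # Toy corpus of tokenized "poems" (hypothetical, for illustration only).
    poems = [["moon", "frost", "home"], ["moon", "home", "night"],
             ["river", "boat", "wind"], ["wind", "boat", "sail"]]
    scorer = PMICoherence(poems)
    previous = [["moon", "night"]]
    print(scorer.best([["frost", "home"], ["boat", "sail"]], previous))  # ['frost', 'home']
```

On a realistic corpus some smoothing or frequency cutoff would likely be needed, since raw PMI strongly favors words that were seen only once.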


The Best of AI in Japan — Prologue

AI Magazine

This article is the first report in the Best of AI in Japan series. The series will focus on prominent accomplishments in the AI field, covering not only research and development but also AI-related events in society. As the opening article of the series, it presents the historical background and contemporary AI research activities in Japan. It then highlights some recent prominent results from industry. Finally, it offers a perspective on the future.


Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

Journal of Artificial Intelligence Research

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve translation from a resource-poor source language X_1 into a resource-rich language Y, given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y, where X_2 is a resource-rich language closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian→English using Malay, and for Spanish→English using Portuguese while pretending that Spanish is resource-poor, shows absolute gains of up to 1.35 and 3.37 BLEU points, respectively, an improvement over the best rivaling approaches while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2 to 5.


A Perspective on AI Research in India

AI Magazine

The second was the propensity of the computing industry toward more lucrative assignments in the service sector. Both these factors are changing, not least because leading international software companies have set up research and development centers in the country. Computer science education established itself in India in the early 1980s, when the Indian Institutes of Technology (IITs) set up computer science departments and started offering undergraduate programs in the discipline. Research in artificial intelligence took off soon afterward, when the government of India launched the Knowledge Based Computing Systems (KBCS) program in conjunction with the United Nations Development Program (Saint-Dizier 1991). A number of nodal centers were set up to focus on different areas of research, including expert systems (IIT Madras), speech processing (Tata Institute of Fundamental Research), parallel processing (Indian Institute of Science), image processing (Indian Statistical Institute), and natural language processing (Center for Development of Advanced Computing).


Generalized Biwords for Bitext Compression and Translation Spotting

Journal of Artificial Intelligence Research

Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be compressed more efficiently if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords (pairs of parallel words with a high probability of co-occurrence) that can be used as an intermediate representation in the compression process. However, the simple biword approach described in the literature can only exploit one-to-one word alignments and cannot handle word reordering. We therefore introduce a generalization of biwords that can describe multi-word expressions and reorderings. We also describe some methods for the binary compression of generalized biword sequences, and compare their performance when different schemes are applied to the extraction of the biword sequence. In addition, we show that, with only minor modifications to the compression algorithm, this generalization of biwords allows for an efficient algorithm that searches the compressed bitext for words or text segments in one of the texts and retrieves their counterpart translations in the other text, an application usually referred to as translation spotting.
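To make the limitation of simple biwords concrete, here is a sketch of the simple biword idea the abstract starts from; it is not the paper's generalized biwords, and it does no compression. It assumes a one-to-one alignment given as a map from source positions to target positions, marks an unpaired side with an empty string, and uses hypothetical function names. A crossing alignment cannot be encoded, which is the reordering problem the generalization addresses.

```python
from typing import Dict, List, Sequence, Tuple

Biword = Tuple[str, str]  # (source word, target word); "" marks an empty side


def to_biwords(src: Sequence[str], tgt: Sequence[str],
               alignment: Dict[int, int]) -> List[Biword]:
    """Pair words under a monotone one-to-one alignment (src index -> tgt index)."""
    biwords: List[Biword] = []
    j = 0  # next target position not yet emitted
    for i, word in enumerate(src):
        if i not in alignment:
            biwords.append((word, ""))  # unaligned source word
            continue
        k = alignment[i]
        if k < j:
            raise ValueError("crossing or many-to-one alignment: simple biwords cannot encode it")
        while j < k:  # unaligned target words that precede the aligned one
            biwords.append(("", tgt[j]))
            j += 1
        biwords.append((word, tgt[k]))
        j = k + 1
    biwords.extend(("", t) for t in tgt[j:])  # trailing unaligned target words
    return biwords


def split_biwords(biwords: Sequence[Biword]) -> Tuple[List[str], List[str]]:
    """Recover both texts from the biword sequence (the representation is lossless)."""
    return [s for s, _ in biwords if s], [t for _, t in biwords if t]


def spot(biwords: Sequence[Biword], word: str) -> List[str]:
    """Naive translation spotting: target counterparts of a source word."""
    return [t for s, t in biwords if s == word and t]


if __name__ == "__main__":
    bw = to_biwords(["the", "cat", "is", "sleeping"], ["le", "chat", "dort"], {0: 0, 1: 1, 3: 2})
    print(bw)
    print(spot(bw, "cat"))  # ['chat']
    try:
        # "red" aligns to "roja" but "house" to "casa": a reordering.
        to_biwords(["the", "red", "house"], ["la", "casa", "roja"], {0: 0, 1: 2, 2: 1})
    except ValueError as err:
        print(err)
```

The English/French and English/Spanish sentences are illustrative only; in practice the alignment would come from a statistical word aligner.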


Beyond Independent Agreement: A Tournament Selection Approach for Quality Assurance of Human Computation Tasks

AAAI Conferences

Quality assurance remains a key topic in the human computation research field. Prior work indicates that independent agreement is effective for low-difficulty tasks but has limitations. This paper addresses this problem by proposing a quality control process based on tournament selection. The experimental results show that humans are better at identifying correct answers than at producing them themselves.
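The abstract does not describe how the tournament is organized, so the following is a minimal sketch assuming a single-elimination bracket in which an odd candidate out gets a bye. The `judge` callback is hypothetical and stands in for a pairwise comparison task answered by human workers; in the usage example a fixed score plays that role.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(candidates: Sequence[T], judge: Callable[[T, T], T]) -> T:
    """Single-elimination tournament; judge(a, b) returns the preferred answer."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current: List[T] = list(candidates)
    while len(current) > 1:
        winners: List[T] = []
        # Compare neighbours pairwise; each call is one comparison task.
        for i in range(0, len(current) - 1, 2):
            winners.append(judge(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            winners.append(current[-1])  # odd one out advances without a match
        current = winners
    return current[0]


if __name__ == "__main__":
    # Hypothetical quality scores standing in for human votes.
    quality = {"translation A": 0.4, "translation B": 0.9, "translation C": 0.6}
    winner = tournament_select(list(quality), lambda a, b: a if quality[a] >= quality[b] else b)
    print(winner)  # translation B
```

Because every comparison eliminates exactly one answer, selecting among n answers this way costs n - 1 pairwise judgments.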


picoTrans: Using Pictures as Input for Machine Translation on Mobile Devices

AAAI Conferences

In this paper we present a novel user interface that integrates two popular approaches to language translation for travelers, allowing multimodal communication between the parties involved: the picture-book, in which the user simply points to multiple picture icons representing what they want to say, and the statistical machine translation system, which can translate arbitrary word sequences. Our prototype system tightly couples both processes within a translation framework that inherits many of the positive features of both approaches while mitigating their main weaknesses. Our system differs from traditional approaches in that its mode of input is a sequence of pictures, rather than text or speech. Text in the source language is generated automatically and is used as a detailed representation of the intended meaning. The picture sequence not only provides a rapid method to communicate basic concepts but also gives a 'second opinion' on the machine translation output, which catches machine translation errors and allows users to retry the translation, avoiding misunderstandings.