AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Panlingual Lexical Translation via Probabilistic Inference

Mausam, ' (University of Washington) | (University of Washington) | Soderland, Stephen (University of Washington) | Etzioni, Oren

AAAI ConferencesJul-15-2010

The bare minimum lexical resource required to translate between a pair of languages is a translation dictionary. Unfortunately, dictionaries exist only between a tiny fraction of the 49 million possible language-pairs making machine translation virtually impossible between most of the languages. This paper summarizes the last four years of our research motivated by the vision of panlingual communication. Our research comprises three key steps. First, we compile over 630 freely available dictionaries over the Web and convert this data into a single representation – the translation graph. Second, we build several inference algorithms that infer translations between word pairs even when no dictionary lists them as translations. Finally, we run our inference procedure offline to construct PANDICTIONARY– a sense-distinguished, massively multilingual dictionary that has translations in more than 1000 languages. Our experiments assess the quality of this dictionary and find that we have 4 times as many translations at a high precision of 0.9 compared to the English Wiktionary, which is the lexical resource closest to PANDICTIONARY.

artificial intelligence, natural language, translation, (18 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.55)

Add feedback

Generalized Task Markets for Human and Machine Computation

Shahaf, Dafna (Carnegie Mellon University) | Horvitz, Eric (Microsoft Research)

AAAI ConferencesJul-15-2010

We discuss challenges and opportunities for developing generalized task markets where human and machine intelligence are enlisted to solve problems, based on a consideration of the competencies, availabilities, and pricing of different problem-solving resources. The approach couples human computation with machine learning and planning, and is aimed at optimizing the flow of subtasks to people and to computational problem solvers. We illustrate key ideas in the context of Lingua Mechanica, a project focused on harnessing human and machine translation skills to perform translation among languages. We present infrastructure and methods for enlisting and guiding human and machine computation for language translation, including details about the hardness of generating plans for assigning tasks to solvers. Finally, we discuss studies performed with machine and human solvers, focusing on components of a Lingua Mechanica prototype.

artificial intelligence, machine learning, natural language, (18 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Washington > King County > Redmond (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

A Survey of Paraphrasing and Textual Entailment Methods

Androutsopoulos, Ion, Malakasiotis, Prodromos

arXiv.org Artificial IntelligenceMay-30-2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.2985

0912.3747

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(35 more...)

Genre:

Overview (0.69)
Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Law > Litigation (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

A Survey of Paraphrasing and Textual Entailment Methods

Androutsopoulos, I., Malakasiotis, P.

Journal of Artificial Intelligence ResearchMay-28-2010

expression, proc, translation, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2985

AI Access Foundation

10651

Journal of Artificial Intelligence Research

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(49 more...)

Genre:

Overview (0.67)
Instructional Material (0.46)

Industry:

Education (0.67)
Health & Medicine > Therapeutic Area (0.46)
Law > Litigation (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language

Chen, D. L., Kim, J., Mooney, R. J.

Journal of Artificial Intelligence ResearchMar-26-2010

We present a novel framework for learning to interpret and generate language using only perceptual context as supervision. We demonstrate its capabilities by developing a system that learns to sportscast simulated robot soccer games in both English and Korean without any language-specific prior knowledge. Training employs only ambiguous supervision consisting of a stream of descriptive textual comments and a sequence of events extracted from the simulation trace. The system simultaneously establishes correspondences between individual comments and the events that they describe while building a translation model that supports both parsing and generation. We also present a novel algorithm for learning which events are worth describing. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans for our limited domain.

probability, proceedings, training data, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2962

AI Access Foundation

10645

Journal of Artificial Intelligence Research

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
(26 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(3 more...)

Add feedback

Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

Amini, Massih, Usunier, Nicolas, Goutte, Cyril

Neural Information Processing SystemsDec-31-2009

We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document in the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views. Our result uncovers a trade-off between the size of the training set, the number of views, and the quality of the view generating functions. As a consequence, we identify situations where it is more interesting to use multiple views for learning instead of classical single view learning. An extension of this framework is a natural way to leverage unlabeled multi-view data in semi-supervised learning. Experimental results on a subset of the Reuters RCV1/RCV2 collections support our findings by showing that additional views obtained from MT may significantly improve the classification performance in the cases identified by our trade-off.

classifier, view generating function, view-specific classifier, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
North America > United States > New York (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Bayesian Synchronous Grammar Induction

Blunsom, Phil, Cohn, Trevor, Osborne, Miles

Neural Information Processing SystemsDec-31-2009

We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over previously proposed maximum likelihood models.

machine learning, natural language, translation, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Automatic Identification of Document Translations in Large Multilingual Document Collections

Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia

arXiv.org Artificial IntelligenceDec-1-2009

Texts and their translations are a rich linguistic resource that can be used to train and test statistics-based Machine Translation systems and many other applications. In this paper, we present a working system that can identify translations and other very similar documents among a large number of candidates, by representing the document contents with a vector of thesaurus terms from a multilingual thesaurus, and by then measuring the semantic similarity between the vectors. Tests on different text types have shown that the system can detect translations with over 96% precision in a large search space of 820 documents or more. The system was tuned to ignore language-specific similarities and to give similar documents in a second language the same similarity score as equivalent documents in the same language. The application can also be used to detect cross-lingual document plagiarism.

arXiv.org Artificial Intelligence

cs/0609060

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)

Add feedback

Building a resource for studying translation shifts

Cyrus, Lea

arXiv.org Artificial IntelligenceDec-1-2009

This paper describes an interdisciplinary approach which brings together the fields of corpus linguistics and translation studies. It presents ongoing work on the creation of a corpus resource in which translation shifts are explicitly annotated. Translation shifts denote departures from formal correspondence between source and target text, i.e. deviations that have occurred during the translation process. A resource in which such shifts are annotated in a systematic way will make it possible to study those phenomena that need to be addressed if machine translation output is to resemble human translation. The resource described in this paper contains English source texts (parliamentary proceedings) and their German translations. The shift annotation is based on predicate-argument structures and proceeds in two steps: first, predicates and their arguments are annotated monolingually in a straightforward manner. Then, the corresponding English and German predicates and arguments are aligned with each other. Whenever a shift - mainly grammatical or semantic -has occurred, the alignment is tagged accordingly.

artificial intelligence, natural language, text processing, (20 more...)

arXiv.org Artificial Intelligence

cs/0606096

Country:

Europe (0.95)
North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Statistical Machine Translation by Generalized Parsing

Melamed, I. Dan, Wang, Wei

arXiv.org Artificial IntelligenceDec-1-2009

Designers of statistical machine translation (SMT) systems have begun to employ tree-structured translation models. Systems involving tree-structured translation models tend to be complex. This article aims to reduce the conceptual complexity of such systems, in order to make them easier to design, implement, debug, use, study, understand, explain, modify, and improve. In service of this goal, the article extends the theory of semiring parsing to arrive at a novel abstract parsing algorithm with five functional parameters: a logic, a grammar, a semiring, a search strategy, and a termination condition. The article then shows that all the common algorithms that revolve around tree-structured translation models, including hierarchical alignment, inference for parameter estimation, translation, and structured evaluation, can be derived by generalizing two of these parameters -- the grammar and the logic. The article culminates with a recipe for using such generalized parsers to train, apply, and evaluate an SMT system that is driven by tree-structured translation models.

algorithm, artificial intelligence, natural language, (16 more...)

arXiv.org Artificial Intelligence

cs/0407005

Country:

Europe (1.00)
North America > United States > Maryland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback