 Machine Translation


Word Sense Disambiguation Using English-Spanish Aligned Phrases over Comparable Corpora

arXiv.org Artificial Intelligence

In this paper we describe a WSD experiment based on bilingual English-Spanish comparable corpora in which individual noun phrases have been identified and aligned with their respective counterparts in the other language. The evaluation of the experiment has been carried out against SemCor. We show that, with the alignment algorithm employed, potential precision is high (74.3%); however, the coverage of the method is low (2.7%) because alignments are far less frequent than we expected. Contrary to our intuition, precision does not rise consistently with the number of alignments. The coverage is low due to several factors: there are important domain differences, and English and Spanish are too closely related for this approach to discriminate efficiently between senses. This renders the approach unsuitable for WSD, although the method may prove more productive in machine translation.


Word Sense Disambiguation for All Words Without Hard Labor

AAAI Conferences

While the most accurate word sense disambiguation systems are built using supervised learning from sense-tagged data, scaling them up to all words of a language has proved elusive, since preparing a sense-tagged corpus for all words of a language is time-consuming and labor-intensive. In this paper, we propose and implement a completely automatic approach to scale up word sense disambiguation to all words of English. Our approach relies on English-Chinese parallel corpora, English-Chinese bilingual dictionaries, and automatic methods of finding synonyms of Chinese words. No additional human sense annotations or word translations are needed. We conducted a large-scale empirical evaluation on more than 29,000 noun tokens in English texts annotated in OntoNotes 2.0, based on its coarse-grained sense inventory. The evaluation results show that our approach is able to achieve high accuracy, outperforming the first-sense baseline and coming close to a previously reported approach that requires manual effort to provide Chinese translations of English senses.
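The general idea of using translations as sense labels can be illustrated with a minimal sketch. This is not the authors' system: the sense names, the `SENSE_TRANSLATIONS` table, and the `label_occurrences` helper are hypothetical, and the word alignment is assumed to be given.

```python
# Hypothetical sense inventory: each English sense is described by a set of
# Chinese translations. In the setting described above these sets would come
# from bilingual dictionaries plus automatically found synonyms; the entries
# here are made up for illustration.
SENSE_TRANSLATIONS = {
    "bank": {
        "bank.financial": {"银行"},
        "bank.river": {"河岸", "岸"},
    }
}


def label_occurrences(aligned_pairs, sense_translations):
    """Assign a sense to each (english_word, chinese_word) aligned pair.

    Returns a list of (english_word, chinese_word, sense_or_None).
    """
    labeled = []
    for en, zh in aligned_pairs:
        candidates = sense_translations.get(en, {})
        senses = [s for s, zh_set in candidates.items() if zh in zh_set]
        # Keep the label only when exactly one sense matches the translation.
        labeled.append((en, zh, senses[0] if len(senses) == 1 else None))
    return labeled


pairs = [("bank", "银行"), ("bank", "岸"), ("bank", "桌子")]
labels = label_occurrences(pairs, SENSE_TRANSLATIONS)
```

Occurrences labeled this way could then serve as training data for a supervised classifier; pairs whose translation matches no sense, or more than one, are left unlabeled.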


Context-Based Approach for Pivot Translation Services

AAAI Conferences

Machine translation services available on the Web are becoming increasingly popular. However, a pivot translation service is required to realize translations between non-English languages by cascading different translation services via English. As a result, the meaning of words often drifts due to the inconsistency, asymmetry and intransitivity of word selections among translation services. In this paper, we propose context-based coordination to maintain the consistency of word meanings during pivot translation services. First, we propose a method to automatically generate multilingual equivalent terms based on bilingual dictionaries and use generated terms to propagate context among combined translation services. Second, we show a multiagent architecture as one way of implementation, wherein a coordinator agent gathers and propagates context from/to a translation agent. We generated trilingual equivalent noun terms and implemented a Japanese-to-German-and-back translation, cascading into four translation services. The evaluation results showed that the generated terms can cover over 58% of all nouns. The translation quality was improved by 40% for all sentences, and the quality rating for all sentences increased by an average of 0.47 points on a five-point scale. These results indicate that we can realize consistent pivot translation services through context-based coordination based on existing services.
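As a rough illustration of generating trilingual equivalent terms from bilingual dictionaries, the sketch below keeps only those (Japanese, English, German) triples whose German term translates back to the original Japanese term. The dictionary entries and the function names `trilingual_terms` and `consistent_targets` are assumptions made for this example, not the method reported in the paper.

```python
# Toy bilingual dictionaries (hypothetical entries, for illustration only).
JA_EN = {"紙": {"paper"}, "論文": {"paper", "thesis"}}
EN_DE = {"paper": {"Papier", "Aufsatz"}, "thesis": {"Dissertation", "Aufsatz"}}
DE_JA = {"Papier": {"紙"}, "Aufsatz": {"論文"}, "Dissertation": {"論文"}}


def trilingual_terms(ja_en, en_de, de_ja):
    """Return (ja, en, de) triples whose German term translates back to ja."""
    triples = set()
    for ja, en_words in ja_en.items():
        for en in en_words:
            for de in en_de.get(en, ()):
                if ja in de_ja.get(de, ()):
                    triples.add((ja, en, de))
    return triples


def consistent_targets(triples, ja, en):
    """German terms that stay consistent with the source term ja via pivot en."""
    return {de for (j, e, de) in triples if j == ja and e == en}


triples = trilingual_terms(JA_EN, EN_DE, DE_JA)
naive = EN_DE["paper"]                                # pivot word alone
with_context = consistent_targets(triples, "紙", "paper")  # source term kept as context
```

Translating the pivot word "paper" on its own allows both "Papier" and "Aufsatz", whereas carrying the source term along as context narrows the choice to "Papier". This is the kind of drift the abstract attributes to cascaded services.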


Unsupervised Rank Aggregation with Domain-Specific Expertise

AAAI Conferences

Consider the setting where a panel of judges is repeatedly asked to (partially) rank sets of objects according to given criteria, and assume that the judges' expertise depends on the objects' domain. Learning to aggregate their rankings with the goal of producing a better joint ranking is a fundamental problem in many areas of Information Retrieval and Natural Language Processing, amongst others. However, supervised ranking data is generally difficult to obtain, especially if it comes from multiple domains. Therefore, we propose a framework for learning to aggregate votes of constituent rankers with domain-specific expertise without supervision. We apply the learning framework to the settings of aggregating full rankings and aggregating top-k lists, demonstrating significant improvements over a domain-agnostic baseline in both cases.


Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora

Journal of Artificial Intelligence Research

This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality is improved as compared to word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the other modules of the MT system in which the inferred rules are applied.


Agreement-Based Learning

Neural Information Processing Systems

The learning of probabilistic models with many hidden variables and non-decomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate inference in a single intractable model, our approach is to train a set of tractable submodels by encouraging them to agree on the hidden variables. This allows us to capture non-decomposable aspects of the data while still maintaining tractability. We propose an objective function for our approach, derive EM-style algorithms for parameter estimation, and demonstrate their effectiveness on three challenging real-world learning tasks.


HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation

Neural Information Processing Systems

We present a novel paradigm for statistical machine translation (SMT), based on joint modeling of word alignment and the topical aspects underlying bilingual document pairs via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this new paradigm, parallel sentence pairs from a parallel document pair are coupled via a certain semantic flow, to ensure coherence of topical context in the alignment of matching words between languages, during likelihood-based training of topic-dependent translational lexicons, as well as topic representations in each language. The resulting trained HM-BiTAM can not only display topic patterns like other methods such as LDA, but now for bilingual corpora; it also offers a principled way of inferring optimal translation in a context-dependent way. Our method integrates the conventional IBM Models based on HMM, a key component of most state-of-the-art SMT systems, with the recently proposed BiTAM model, and we report an extensive empirical analysis (in many ways complementary to a description-oriented account) of our method in three aspects: word alignment, bilingual topic representation, and translation.


Expectation Maximization and Posterior Constraints

Neural Information Processing Systems

The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
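One common way to write this idea down is sketched below, as a reading aid. The notation is assumed rather than taken from the abstract: $x$ are observed variables, $z$ latent variables, $\theta$ the parameters, and the constraint set $\mathcal{Q}$, feature function $f$ and bound $b$ are illustrative. Standard EM alternates the first two steps; a posterior-constrained variant would replace the E-step with a projection onto the set of distributions that satisfy the constraints.

```latex
% Standard EM
\text{E-step:}\quad q^{(t+1)}(z) = p\bigl(z \mid x;\, \theta^{(t)}\bigr)
\qquad
\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\; \mathbb{E}_{q^{(t+1)}}\bigl[\log p(x, z;\, \theta)\bigr]

% Constrained E-step (assumed form): project the model posterior onto Q
q^{(t+1)} = \arg\min_{q \in \mathcal{Q}}\; \mathrm{KL}\bigl(q(z) \,\big\|\, p(z \mid x;\, \theta^{(t)})\bigr),
\qquad
\mathcal{Q} = \bigl\{\, q : \mathbb{E}_{q}[f(x, z)] \le b \,\bigr\}
```

When $\mathcal{Q}$ contains the unconstrained posterior, the projection returns it unchanged and the procedure reduces to ordinary EM.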