Polish to English Statistical Machine Translation

arXiv.org Machine Learning

This research explores the effects of various training settings on a Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED, Europarl, and OPUS parallel text corpora were used as the basis for training language models and for developing, tuning, and testing the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparation on the translation results.


A Topic-Based Coherence Model for Statistical Machine Translation

AAAI Conferences

Coherence, which ties the sentences of a text into a meaningfully connected structure, is of great importance to text generation and translation. In this paper, we propose a topic-based coherence model that produces coherent document translations in terms of the continuity of sentence topics in a text. We automatically extract a coherence chain for each source text to be translated. Based on the extracted source coherence chain, we adopt a maximum entropy classifier to predict the target coherence chain, which defines a linear topic structure for the target document. The proposed topic-based coherence model then uses the predicted target coherence chain to help the decoder select coherent word/phrase translations. Our experiments show that incorporating the topic-based coherence model into machine translation achieves substantial improvement over both the baseline and previous methods that integrate document topics rather than coherence chains into machine translation.



Translation Technology Is Getting Better. What Does That Mean For The Future?

#artificialintelligence

Tools and apps like Google Translate are getting better and better at translating one language into another. Alexander Waibel, professor of computer science at Carnegie Mellon University's Language Technologies Institute (@LTIatCMU), tells Here & Now's Jeremy Hobson how translation technology works, where there's still room to improve and what could be in store in the decades to come. "Over the years I think there's been a big trend on translation to go increasingly from rule-based, knowledge-based methods to learning methods. Systems have now really achieved a phenomenally good accuracy, and so I think, within our lifetime I'm fairly sure that we'll reach -- if we haven't already done so -- human-level performance, and/or exceeding it. "The current technology that really has taken the community by storm is of course neural machine translation."


Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2013

arXiv.org Machine Learning

This research explores the effects of various training settings on a Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2013 evaluation campaign were used as the basis for training language models and for developing, tuning, and testing the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparation on translation results. Our experiments included systems that use stems and morphological information on Polish words. We also conducted a deep analysis of the provided Polish data as preparatory work for the automatic data correction and cleaning phase.