AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice using free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

AI, ML, or DL – learn what it means

#artificialintelligence | Feb-11-2022, 19:07:59 GMT

AI essentially aims to develop machines that are self-reliant and can think and act like humans. Examples of AI include machine translation systems such as Google Translate, speech recognition apps such as Google Assistant or Siri, and AI robots such as Aibo and Sophia. ML seeks to solve business problems by building predictive models from data analytics and computational models. The work of a machine learning engineer can be seen in sales forecasting, stock price prediction, and banking fraud analysis, among other areas. DL, a subset of ML, uses artificial neural networks: algorithms inspired by the structure and workings of the human brain. DL algorithms can work with huge amounts of both structured and unstructured data, whereas ML typically requires structured data. DL use cases include detecting cancerous tumors and other objects, and colorizing images.

ai and machine, algorithm, personality

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.62)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.62)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Apply profanity masking in Amazon Translate

#artificialintelligence | Feb-11-2022, 18:55:37 GMT

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. This post shows how you can mask profane words and phrases with a grawlix string ("?$#@$"). Amazon Translate typically chooses clean words for your translation output, but in some situations you want to prevent words commonly considered profane from appearing in the translated output. For example, when you translate video captions or subtitles, or enable in-game chat, you may want the translated content to be age-appropriate and free of profanity. In these cases, the profanity masking setting in Amazon Translate lets you mask profane words and phrases.
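A minimal sketch of how the masking setting might be requested from Python, assuming the AWS SDK for Python (boto3), whose `translate_text` call accepts a `Settings` dictionary with `"Profanity": "MASK"`. The helper `build_translate_request` is hypothetical; it only assembles the keyword arguments, so the sketch runs without AWS credentials.

```python
def build_translate_request(text: str, source_lang: str, target_lang: str,
                            mask_profanity: bool = True) -> dict:
    """Assemble keyword arguments for a boto3 Translate `translate_text` call.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    request = {
        "Text": text,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }
    if mask_profanity:
        # Ask Amazon Translate to replace profane words and phrases with "?$#@$".
        request["Settings"] = {"Profanity": "MASK"}
    return request


# Usage with real AWS credentials (not executed here):
#   import boto3
#   client = boto3.client("translate")
#   response = client.translate_text(**build_translate_request(caption, "en", "es"))
#   print(response["TranslatedText"])
```

Keeping request construction separate from the network call makes it easy to toggle masking per use case, for example on for in-game chat and off for internal documents.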

amazon translate, profane word, profanity

#artificialintelligence

Industry: Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization

Wang, Jiaan, Meng, Fandong, Lu, Ziyao, Zheng, Duo, Li, Zhixu, Qu, Jianfeng, Zhou, Jie

arXiv.org Artificial Intelligence | Feb-11-2022

We present ClidSum, a benchmark dataset for building cross-lingual summarization systems on dialogue documents. It consists of 67k+ dialogue documents from two subsets (i.e., SAMSum and MediaSum) and 112k+ annotated summaries in different target languages. Based on the proposed ClidSum, we introduce two benchmark settings for supervised and semi-supervised scenarios, respectively. We then build various baseline systems in different paradigms (pipeline and end-to-end) and conduct extensive experiments on ClidSum to provide deeper analyses. Furthermore, we propose mDialBART, which extends mBART-50 (a multilingual BART) via further pre-training. The multiple objectives used in the further pre-training stage help the pre-trained model capture the structural characteristics and important content of dialogues, as well as the transformation from the source to the target language. Experimental results show that mDialBART, as an end-to-end model, outperforms strong pipeline models on ClidSum. Finally, we discuss specific challenges that current approaches face on this task and give multiple promising directions for future research. We have released the dataset and code at https://github.com/krystalan/ClidSum.

computational linguistic, dataset, mdialbart

arXiv.org Artificial Intelligence

2202.05599

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Typical Decoding for Natural Language Generation

Meister, Clara, Pimentel, Tiago, Wiher, Gian, Cotterell, Ryan

arXiv.org Artificial Intelligence | Feb-10-2022

Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text. This dichotomy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language as a communication channel (à la Shannon, 1948) can provide new insights into the behaviors of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, and do so in an efficient yet error-minimizing manner, choosing each word in a string with this (perhaps subconscious) goal in mind. We propose that generation from probabilistic models should mimic this behavior. Rather than always choosing words from the high-probability region of the distribution, which have a low Shannon information content, we sample from the set of words with an information content close to its expected value, i.e., close to the conditional entropy of our model. This decision criterion can be realized through a simple and efficient implementation, which we call typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, typical sampling offers competitive performance in terms of quality while consistently reducing the number of degenerate repetitions.
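The decision criterion described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' reference implementation: it computes the conditional entropy of a next-token distribution, ranks tokens by how far their information content (−log p) lies from that entropy, keeps the closest tokens until their cumulative probability reaches a threshold `tau` (the parameter name and default are assumptions), renormalizes, and samples. The toy distribution is made up.

```python
import math
import random


def typical_filter(probs: dict[str, float], tau: float = 0.95) -> dict[str, float]:
    """Keep tokens whose information content is closest to the entropy, up to mass tau."""
    # Conditional entropy H = -sum p log p (in nats): the expected information content.
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    # Rank tokens by |(-log p) - H|, the distance of their surprisal from its expectation.
    ranked = sorted(
        (tok for tok, p in probs.items() if p > 0),
        key=lambda tok: abs(-math.log(probs[tok]) - entropy),
    )
    kept, mass = {}, 0.0
    for tok in ranked:
        kept[tok] = probs[tok]
        mass += probs[tok]
        if mass >= tau:  # stop once the kept set covers at least tau of the mass
            break
    # Renormalize the kept tokens into a proper distribution.
    return {tok: p / mass for tok, p in kept.items()}


def typical_sample(probs: dict[str, float], tau: float = 0.95, rng=random) -> str:
    """Draw one token from the typical set."""
    filtered = typical_filter(probs, tau)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(typical_filter(probs, tau=0.7))
```

On this toy distribution the filter ranks "a" ahead of the most probable token "the", because its surprisal is closer to the entropy; unlike top-k or nucleus sampling, the typical set is not required to start from the mode.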

computational linguistic, information content, proceedings

arXiv.org Artificial Intelligence

2202.00666

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine (1.00)
Law > Criminal Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

AI in everyday life 🔹

#artificialintelligence | Feb-9-2022, 08:45:30 GMT

Below are some AI applications that you may not realise are AI-powered:

Online shopping and advertising: Artificial intelligence is widely used to provide personalised recommendations to people based, for example, on their previous searches and purchases or other online behaviour. AI is hugely important in commerce: optimising products, planning inventory, logistics, etc.

Web search: Search engines learn from the vast amount of data provided by their users in order to return relevant search results.

Digital personal assistants: Smartphones use AI to provide services that are as relevant and personalised as possible. Virtual assistants that answer questions, provide recommendations, and help organise daily routines have become ubiquitous.

Machine translations: Language translation software, whether it handles written or spoken text, relies on artificial intelligence to provide and improve translations.

ai application, artificial intelligence, intelligence

#artificialintelligence

Industry: Information Technology > Security & Privacy (0.37)

Technology:

Information Technology > Information Management > Search (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Dong

AAAI Conferences | Feb-8-2022, 11:43:51 GMT

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality, and coverage. As a result, learning translation models from non-parallel corpora has become increasingly important, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and significantly improves translation performance.
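The alternation described above can be illustrated with a simplified hard (Viterbi) EM loop in Python. This is a hypothetical sketch, not the paper's joint model: the E-step keeps only the single best-scoring target phrase for each source phrase under the current lexicon, and the M-step re-estimates word translation probabilities from those constructed pairs, spreading counts uniformly for source words the lexicon does not yet cover. The scoring function, the function names, and the pinyin toy data (hei = black, mao = cat, bai = white, gou = dog) are all assumptions.

```python
from collections import defaultdict


def phrase_score(src, tgt, lex):
    """Average over source words of the best lexical translation probability into tgt."""
    return sum(max(lex.get(s, {}).get(t, 0.0) for t in tgt) for s in src) / len(src)


def viterbi_em(src_phrases, tgt_phrases, seed_lex, iterations=3):
    """Alternate between building phrase pairs from the lexicon and re-estimating it."""
    lex, pairs = seed_lex, []
    for _ in range(iterations):
        # E-step (hard/Viterbi): keep only the best target phrase for each source phrase.
        pairs = []
        for src in src_phrases:
            best = max(tgt_phrases, key=lambda tgt: phrase_score(src, tgt, lex))
            if phrase_score(src, best, lex) > 0.0:
                pairs.append((src, best))
        # M-step: each source word distributes one count over the target words.
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for s in src:
                weights = [lex.get(s, {}).get(t, 0.0) for t in tgt]
                total = sum(weights)
                if total == 0.0:
                    # Source word not yet covered by the lexicon: spread its count uniformly.
                    weights, total = [1.0] * len(tgt), float(len(tgt))
                for t, w in zip(tgt, weights):
                    counts[s][t] += w / total
        # Normalize counts into p(target word | source word).
        lex = {s: {t: c / sum(row.values()) for t, c in row.items()}
               for s, row in counts.items()}
    return lex, pairs


src = [["hei", "mao"], ["bai", "gou"]]
tgt = [["black", "cat"], ["white", "dog"]]
lexicon, pairs = viterbi_em(src, tgt, {"mao": {"cat": 1.0}, "gou": {"dog": 1.0}})
print(pairs)
print(lexicon["hei"])
```

Starting from a two-entry seed lexicon, the loop pairs both phrases correctly and adds entries for the previously unknown words "hei" and "bai". With this little data their distributions stay ambiguous, which is why real systems iterate over large corpora.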

corpora, parallel lexicon and phrase, parallel phrase

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Huang

AAAI Conferences | Feb-8-2022, 11:43:03 GMT

Computer-aided translation (CAT) systems are the most popular tools for helping human translators work efficiently. To further improve efficiency, there is increasing interest in applying machine translation (MT) technology to upgrade CAT. Post-editing is a standard approach: human translators produce the translation by correcting MT outputs. In this paper, we propose a novel approach that deeply integrates MT into CAT systems: a well-designed input method that makes full use of the knowledge adopted by MT systems, such as translation rules, decoding hypotheses, and n-best translation lists. Our approach lets human translators spend less time by focusing on choosing better translation results rather than completing the translation entirely by themselves. Extensive experiments demonstrate that our method saves more than 14% of the time and over 33% of the keystrokes, and improves translation quality by more than 3 absolute BLEU points compared with a strong baseline, i.e., post-editing using Google Pinyin.
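The idea of an input method that draws on n-best translation lists can be illustrated with a small Python sketch. It covers only the n-best part of the approach: given the words a translator has typed so far, it proposes next words from MT hypotheses that share that prefix, ranked by the hypothesis score. The function name, the word-level matching, and the use of log-probability scores are assumptions for illustration, not the paper's method.

```python
def next_word_suggestions(typed: list[str],
                          nbest: list[tuple[list[str], float]],
                          k: int = 3) -> list[str]:
    """Suggest up to k next words from n-best hypotheses that extend the typed prefix."""
    n = len(typed)
    best_score: dict[str, float] = {}
    for words, score in nbest:
        # Only hypotheses consistent with what the translator has already typed count.
        if words[:n] == typed and len(words) > n:
            candidate = words[n]
            # Keep the best score seen for each candidate word.
            if candidate not in best_score or score > best_score[candidate]:
                best_score[candidate] = score
    return sorted(best_score, key=lambda w: best_score[w], reverse=True)[:k]


# Hypothetical n-best list with log-probability scores (higher is better).
nbest = [
    ("the cat sat".split(), -1.0),
    ("the dog sat".split(), -1.5),
    ("a cat sat".split(), -2.0),
    ("the cat sits".split(), -2.5),
]
print(next_word_suggestions(["the"], nbest))
```

Because suggestions are recomputed after every accepted word, the translator mostly picks from a short candidate list instead of typing each word, which is the kind of keystroke saving the abstract reports.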

huang, human translator, translation

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Lee

AAAI Conferences | Feb-8-2022, 11:23:16 GMT

We present the first automatic emotion detection system for Cantonese. This system classifies input text into eight emotion classes: expectancy, joy, love, surprise, anxiety, sorrow, angry, or hate. While a number of emotion corpora and lexica for Mandarin Chinese have been developed, no emotion dataset is available for Cantonese. We leverage existing Mandarin Chinese emotion resources to build the system, with support from Cantonese-Mandarin lexical mappings from a machine translation system, as well as English-Mandarin lexical mappings to handle code-switching in Cantonese input. Evaluation on a set of Cantonese sentences from social media shows promising results.
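The resource-transfer idea (map Cantonese and code-switched English tokens to Mandarin, then consult a Mandarin emotion lexicon) can be sketched in Python with a simple lexicon vote. The voting classifier is a swapped-in illustration, not the authors' system, and every dictionary entry below is a hypothetical toy entry (Cantonese 嬲 "angry" mapped to Mandarin 生气, English "happy" mapped to 高兴); input is assumed to be pre-tokenized.

```python
from collections import Counter

EMOTIONS = ("expectancy", "joy", "love", "surprise", "anxiety", "sorrow", "angry", "hate")

# Hypothetical toy resources for illustration only.
YUE_TO_CMN = {"嬲": "生气"}                # Cantonese -> Mandarin lexical mapping
EN_TO_CMN = {"happy": "高兴"}              # English -> Mandarin (code-switching)
CMN_LEXICON = {"生气": "angry", "高兴": "joy", "担心": "anxiety"}  # Mandarin word -> emotion


def classify(tokens, yue_to_cmn=YUE_TO_CMN, en_to_cmn=EN_TO_CMN, lexicon=CMN_LEXICON):
    """Return the emotion with the most lexicon hits, or None if nothing matches."""
    votes = Counter()
    for tok in tokens:
        # Map Cantonese or English tokens to Mandarin; other tokens pass through unchanged.
        cmn = yue_to_cmn.get(tok) or en_to_cmn.get(tok.lower()) or tok
        label = lexicon.get(cmn)
        if label in EMOTIONS:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


print(classify(["我", "好", "嬲"]))
print(classify(["今日", "好", "happy"]))
```

The mapping step is what lets Mandarin-only resources cover Cantonese input; a real system would replace the vote with a trained classifier and much larger mappings.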

cantonese, lee, lexical mapping

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Alkhatib

AAAI Conferences | Feb-8-2022, 11:20:10 GMT

Transliterating named entities from one language into another is complicated and is considered one of the challenging tasks in machine translation (MT). To build a well-performing transliteration system, we apply well-established techniques based on hybrid deep learning. The system is based on a convolutional neural network (CNN) followed by a Bi-LSTM and a CRF. The proposed hybrid mechanism is evaluated on the ANERCorp and Kalimat corpora. The results show that the neural machine translation approach can be employed to build efficient machine transliteration systems, achieving state-of-the-art results for Arabic-English.
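In CNN/Bi-LSTM/CRF architectures like the one described above, the CRF layer is usually decoded with the Viterbi algorithm. It combines per-position emission scores, which here are assumed to come from the CNN and Bi-LSTM layers, with tag-to-tag transition scores, and returns the single highest-scoring tag sequence. The following Python sketch assumes the scores are already computed; the numbers are made up for illustration.

```python
def viterbi_decode(emissions: list[list[float]],
                   transitions: list[list[float]],
                   start: list[float]) -> list[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from a Bi-LSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sequence with tag j.
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


emissions = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
print(viterbi_decode(emissions, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))
print(viterbi_decode(emissions, [[0.0, -10.0], [-10.0, 0.0]], [0.0, 0.0]))
```

With neutral transitions the decoder follows the emissions, but a strong penalty on switching tags makes it keep one tag throughout. This is how the CRF layer enforces consistency across a sequence that the Bi-LSTM scores position by position.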

alkhatib, transliteration system

AAAI Conferences

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ahmadnia

AAAI Conferences | Feb-8-2022, 11:18:12 GMT

Neural Machine Translation (NMT) relies heavily on word embeddings, which are continuous representations of words in a vector space, obtained from large monolingual data and, independently, from the bilingual data used for NMT model training. Word embeddings have proven invaluable for improving performance in natural language analysis tasks that otherwise suffer from data scarcity. This paper defines a new cost function, demonstrated on low-resource, attention-based Farsi-Spanish NMT, that encodes word similarity as distances within a word embedding space. The novelty of this cost function is that it encourages our attentional NMT model to generate words that are close to their references in the embedding space. This encourages the decoder to select acceptably similar words when potential candidates are Out-Of-Vocabulary (OOV).
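One plausible form of such an embedding-aware cost is sketched below in Python. The abstract does not give the exact formula, so this is an assumption. It adds to the usual cross-entropy the expected squared Euclidean distance between the embeddings of the predicted words and the embedding of the reference word, weighted by a hypothetical coefficient `lam`. The Spanish toy embeddings are made up.

```python
import math


def embedding_aware_loss(probs: dict[str, float],
                         embeddings: dict[str, list[float]],
                         reference: str,
                         lam: float = 0.5) -> float:
    """Cross-entropy plus lam * expected squared embedding distance to the reference.

    Assumes probs[reference] > 0 so the cross-entropy term is finite.
    """
    cross_entropy = -math.log(probs[reference])
    ref_vec = embeddings[reference]
    # Probability mass on words far from the reference in embedding space is penalized more.
    expected_distance = sum(
        p * sum((a - b) ** 2 for a, b in zip(embeddings[word], ref_vec))
        for word, p in probs.items()
    )
    return cross_entropy + lam * expected_distance


emb = {"casa": [1.0, 0.0], "hogar": [0.9, 0.1], "perro": [0.0, 1.0]}
near = embedding_aware_loss({"casa": 0.5, "hogar": 0.5, "perro": 0.0}, emb, "casa")
far = embedding_aware_loss({"casa": 0.5, "hogar": 0.0, "perro": 0.5}, emb, "casa")
print(near < far)
```

Both distributions give the reference "casa" the same probability, so plain cross-entropy cannot tell them apart. The distance term prefers the model that puts its remaining mass on the near-synonym "hogar" rather than on "perro", which matches the intended effect of steering the decoder toward acceptably similar words.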

ahmadnia, attentional nmt model, cost function

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
