AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice with free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

Video Speech Translation

#artificialintelligence | Jul-6-2022, 06:50:09 GMT

Have you ever wondered how to make your videos reach a wider audience across multiple languages? Adding subtitles in regional languages is one way, but subtitles draw focus away from the actual content of the video. Adding vocal narration definitely improves the comprehensibility of a video. But isn't it too much work to create vocal narration separately in each language?

speech, video, video speech translation, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.53)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.42)

Building Machine Translation Systems for the Next Thousand Languages

Bapna, Ankur | Caswell, Isaac | Kreutzer, Julia | Firat, Orhan | van Esch, Daan | Siddhant, Aditya | Niu, Mengmeng | Baljekar, Pallavi | Garcia, Xavier | Macherey, Wolfgang | Breiner, Theresa | Axelrod, Vera | Riesa, Jason | Cao, Yuan | Chen, Mia Xu | Macherey, Klaus | Krikun, Maxim | Wang, Pidong | Gutkin, Alexander | Shah, Apurva | Huang, Yanping | Chen, Zhifeng | Wu, Yonghui | Hughes, Macduff

arXiv.org Artificial Intelligence | Jul-6-2022

In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.

low-resource language, natural language processing, neural machine translation, (14 more...)

arXiv: 2205.03983

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > Mexico > Puebla (0.04)
(68 more...)

Genre: Research Report (1.00)

Industry:

Media (0.67)
Health & Medicine (0.67)
Education (0.46)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Supervised Visual Attention for Simultaneous Multimodal Machine Translation

Haralampieva, Veneta (Imperial College London) | Caglayan, Ozan | Specia, Lucia (Imperial College London)

Journal of Artificial Intelligence Research | Jul-5-2022

There has been a surge in research in multimodal machine translation (MMT), where additional modalities such as images are used to improve translation quality of textual systems. A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation. In this paper, we propose the first Transformer-based simultaneous MMT architecture, which has not been previously explored in simultaneous translation. Additionally, we extend this model with an auxiliary supervision signal that guides the visual attention mechanism using labelled phrase-region alignments. We perform comprehensive experiments on three language directions and conduct thorough quantitative and qualitative analyses using both automatic metrics and manual inspection. Our results show that (i) supervised visual attention consistently improves the translation quality of the simultaneous MMT models, and (ii) fine-tuning the MMT with supervision loss enabled leads to better performance than training the MMT from scratch. Compared to the state-of-the-art, our proposed model achieves improvements of up to 2.3 BLEU and 3.5 METEOR points.

computational linguistic, proceedings, translation, (11 more...)

doi: 10.1613/jair.1.13546

AI Access Foundation

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Germany > Berlin (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(17 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Machine Translation of Languages in Artificial Intelligence - GeeksforGeeks

#artificialintelligence | Jun-27-2022, 02:10:07 GMT

The automatic translation of text from one natural language (the source) to another (the target) is known as machine translation. It was one of the first applications envisioned for computers (Weaver, 1949). Translation is difficult because, in the most general case, it requires a thorough understanding of the text. This is true even for very basic messages, such as one-word "texts." Consider the word "Open" on a store's front door.

artificial intelligence, representation, translation, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.74)

Machine Translation Evaluation with Cometinho

#artificialintelligence | Jun-24-2022, 06:15:40 GMT

The European Association for Machine Translation (EAMT) conference is a venue where MT researchers, users and translators gather to discuss the latest advances in the industry. It is a good place to see what is happening across Europe in terms of MT development and adoption. In this article, I want to share some ideas from the paper that won this year's Best Paper Award. Its title is "Searching for COMETINHO: The Little Metric That Could", and it comes from the research lab of Unbabel, a company based in Lisbon, Portugal that offers translation services using MT and human translators. You can find the online version of the paper in the ACL Anthology.

computation, towardsdatascience, vector, (13 more...)

Country: Europe > Portugal > Lisbon > Lisbon (0.25)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.99)

Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

Papi, Sara | Gaido, Marco | Negri, Matteo | Turchi, Marco

arXiv.org Artificial Intelligence | Jun-20-2022

Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions compared to the corresponding references. We also show that this problem has practical relevance, as recent SimulST systems have indeed a tendency to over-generate. As a solution, we propose LAAL (Length-Adaptive Average Lagging), a modified version of the metric that takes into account the over-generation phenomenon and allows for unbiased evaluation of both under-/over-generating systems.
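The difference between the two metrics can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the commonly cited formulation in which AL averages, over the tokens emitted up to the first one produced after the full source was read, the gap between each token's actual delay and the delay of an ideal zero-lag system paced by the reference length, and in which LAAL paces that ideal system by the longer of the reference and the hypothesis. The function name, its parameters, and the delay trace are hypothetical.

```python
from typing import Sequence


def average_lagging(
    delays: Sequence[float],
    source_length: float,
    reference_length: int,
    length_adaptive: bool = False,
) -> float:
    """Return AL, or LAAL when length_adaptive is True (assumed formulation).

    delays[i] is how much source (e.g. seconds of audio) had been read
    when the system emitted its (i+1)-th output token.
    """
    if not delays:
        raise ValueError("delays must contain at least one emitted token")
    hypothesis_length = len(delays)
    # AL paces the ideal zero-lag system by the reference length;
    # LAAL uses the longer of reference and hypothesis instead.
    pace_length = (
        max(reference_length, hypothesis_length)
        if length_adaptive
        else reference_length
    )
    step = source_length / pace_length
    # tau: 1-based index of the first token emitted once the whole source was read.
    tau = next(
        (i for i, d in enumerate(delays, start=1) if d >= source_length),
        hypothesis_length,
    )
    return sum(delays[i] - i * step for i in range(tau)) / tau


# Hypothetical over-generating system: 8 output tokens for a 4-token reference.
delays = [1, 1, 2, 2, 3, 3, 4, 4]
al = average_lagging(delays, source_length=4, reference_length=4)
laal = average_lagging(delays, source_length=4, reference_length=4, length_adaptive=True)
```

For this made-up trace, AL comes out negative while LAAL stays positive, which is the underestimation the abstract describes. When the hypothesis is no longer than the reference, the two values coincide.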

artificial intelligence, length-adaptive average lagging, natural language, (4 more...)

doi: 10.18653/v1/2022.autosimtrans-1.2

arXiv: 2206.05807

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Synthetic Data Is About To Transform Artificial Intelligence - AI Summary

#artificialintelligence | Jun-14-2022, 07:42:45 GMT

So instead, AV companies developed sophisticated simulation engines to synthetically generate the requisite volume of data and efficiently expose their AI systems to the "long tail" of driving scenarios. These simulated worlds make it possible to automatically produce thousands or millions of permutations of any imaginable driving scenario--e.g., changing the locations of other cars, adding or removing pedestrians, increasing or decreasing vehicle speeds, adjusting the weather, and so on. But it didn't take long for AI entrepreneurs to recognize that the synthetic data capabilities that had been developed for the autonomous vehicle industry could be generalized and applied to a host of other computer vision applications. Founded by AI luminary Raquel Urtasun, who previously ran Uber's AV research efforts, Waabi came out of stealth last year with a star-studded team and over $80 million in funding. Dramatic recent advances in natural language processing (NLP) are opening up virtually unbounded opportunities for value creation across the economy, as previously explored in this column.

language processing, natural language processing, synthetic data, (13 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.36)

A machine-learning method hallucinates its way to better text translation

#artificialintelligence | Jun-7-2022, 09:05:46 GMT

As babies, we babble and imitate our way to learning languages. We don't start off reading raw text, which requires fundamental knowledge and understanding about the world, as well as the advanced ability to interpret and infer descriptions and relationships. Rather, humans begin our language journey slowly, by pointing and interacting with our environment, basing our words and perceiving their meaning through the context of the physical and social world. Eventually, we can craft full sentences to communicate complex ideas. Similarly, when humans begin learning and translating into another language, the incorporation of other sensory information, like multimedia, paired with the new and unfamiliar words, like flashcards with images, improves language acquisition and retention. Then, with enough practice, humans can accurately translate new, unseen sentences in context without the accompanying media; however, imagining a picture based on the original text helps.

source sentence, transformer, translation, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.77)

Petuum and Inception Institute for AI Partner for Advanced AI

#artificialintelligence | Jun-7-2022, 00:12:19 GMT

Petuum, the creator of the world's first composable platform for MLOps, and the Inception Institute for Artificial Intelligence (IIAI), have agreed to partner on the development of revolutionary AI applications. Petuum has recently announced a limited release of the composable platform, which includes the AI OS, Universal Pipelines, Deployment Manager, and Experiment Manager, for select private beta partners. Through the partnership with Petuum, IIAI's enterprise AI/ML teams will operationalize and scale their applications into production. Founded in 2018, IIAI's mission is to build full-stack AI solutions and operating systems for enterprise businesses and developers. Besides being the research arm for G42, IIAI is also empowering stakeholders with AI applications and incubating new technology at the cutting edge of ML innovation.

iiai, inception institute, petuum, (13 more...)

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.17)
Europe > Middle East (0.07)
Africa > Middle East (0.07)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.37)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.34)

Spam Detection Using BERT

Sahmoud, Thaer | Mikki, Mohammad

arXiv.org Artificial Intelligence | Jun-7-2022

Emails and SMS messages are the most popular tools in today's communications, and as the number of email and SMS users grows, the amount of spam grows as well. Spam is any kind of unwanted, unsolicited digital communication sent out in bulk; spam emails and SMS messages cause major resource wastage by unnecessarily flooding network links. Although most spam originates with advertisers looking to push their products, some is far more malicious in intent, such as phishing emails that aim to trick victims into giving up sensitive information like website logins or credit card details. To counter spam, much research effort has gone into building detectors that classify messages and emails as spam or ham. In this research we build a spam detector using a pre-trained BERT model that classifies emails and messages by understanding their context. We trained the detector on multiple corpora (the SMS collection corpus, the Enron corpus, the SpamAssassin corpus, the Ling-Spam corpus, and the SMS spam collection corpus), and its performance was 98.62%, 97.83%, 99.13% and 99.28% respectively.

corpus, dataset, email, (12 more...)

arXiv: 2206.02443

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
