AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

An Experimental Study on Pretraining Transformers from Scratch for IR

Lassance, Carlos, Déjean, Hervé, Clinchant, Stéphane

arXiv.org Artificial Intelligence, Jan-25-2023

Finetuning Pretrained Language Models (PLM) for IR has been the de facto standard practice since their breakthrough effectiveness a few years ago. But is this approach well understood? In this paper, we study the impact of the pretraining collection on the final IR effectiveness. In particular, we challenge the current hypothesis that a PLM must be trained on a large enough generic collection, and we show that pretraining from scratch on the collection of interest is surprisingly competitive with the current approach. We benchmark first-stage rankers and cross-encoders for reranking on the task of general passage retrieval on MSMARCO, Mr-Tydi for Arabic, Japanese and Russian, and TripClick for a specific domain. Contrary to popular belief, we show that, for finetuning first-stage rankers, models pretrained solely on their collection have equivalent or better effectiveness compared to more general models. However, there is a slight effectiveness drop for rerankers pretrained only on the target collection. Overall, our study sheds new light on the role of the pretraining collection and should make our community ponder building specialized models by pretraining from scratch. Last but not least, doing so could enable better control of efficiency, data bias and replicability, which are key research questions for the IR community.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.10444

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Huang, Wen-Chin, Peloquin, Benjamin, Kao, Justine, Wang, Changhan, Gong, Hongyu, Salesky, Elizabeth, Adi, Yossi, Lee, Ann, Chen, Peng-Jen

arXiv.org Artificial Intelligence, Jan-25-2023

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a holistic cascade system for expressive S2ST, combining multiple prosody transfer techniques previously considered only in isolation. We curate a benchmark expressivity test set in the TV series domain and explore a second dataset in the audiobook domain. Finally, we present a human evaluation protocol to assess multiple expressive dimensions across speech pairs. Experimental results indicate that bilingual annotators can assess the quality of expressive preservation in S2ST systems, and that the holistic modeling approach outperforms single-aspect systems. Audio samples can be accessed through our demo webpage: https://facebookresearch.github.io/speech_translation/cascade_expressive_s2st.

artificial intelligence, machine translation, natural language, (17 more...)

arXiv.org Artificial Intelligence

2301.10606

Genre: Research Report (0.82)

Industry: Media (0.70)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

NEW: Document Translation feature available on Eden AI

#artificialintelligence, Jan-24-2023, 14:45:30 GMT

Quickly and easily translate multiple documents in just a few simple steps. With Eden AI, you can start translating your documents in seconds and save valuable time and resources. While Machine Translation refers to the translation of a text into another language using rules, statistics or ML techniques, Document Translation can be used to translate multiple complex documents into all supported languages and dialects while maintaining the original document structure and data format. The Document Translation API can be used to support multilingual websites, chatbots, mobile applications, etc. It can translate documents in real time or as a batch process.

artificial intelligence, natural language, translation, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)

Cross-lingual German Biomedical Information Extraction: from Zero-shot to Human-in-the-Loop

Liang, Siting, Hartmann, Mareike, Sonntag, Daniel

arXiv.org Artificial Intelligence, Jan-24-2023

This paper presents our project proposal for extracting biomedical information from German clinical narratives with limited amounts of annotations. We first describe the applied strategies in transfer learning and active learning for solving our problem. After that, we discuss the design of the user interface for both supplying model inspection and obtaining user annotations in the interactive environment.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.09908

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.72)

Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction

Pilault, Jonathan, Garcia, Xavier, Bražinskas, Arthur, Firat, Orhan

arXiv.org Artificial Intelligence, Jan-24-2023

Crosslingual conditional generation (e.g., machine translation) has long enjoyed the benefits of scaling. Nonetheless, there are still issues that scale alone may not overcome. A source query in one language, for instance, may yield several translation options in another language without any extra context. Only one translation may be acceptable, however, depending on the translator's preferences and goals. Choosing the incorrect option might significantly affect translation usefulness and quality. We propose a novel method, interactive-chain prompting -- a series of question, answering and generation intermediate steps between a Translator model and a User model -- that reduces translations into a list of subproblems addressing ambiguities and then resolves such subproblems before producing the final text to be translated. To check ambiguity resolution capabilities and evaluate translation quality, we create a dataset exhibiting different linguistic phenomena that lead to ambiguities at inference for four languages. To encourage further exploration in this direction, we release all datasets. We note that interactive-chain prompting, using eight interactions as exemplars, consistently surpasses prompt-based methods with direct access to background information to resolve ambiguities.

artificial intelligence, machine translation, natural language, (14 more...)

arXiv.org Artificial Intelligence

2301.10309

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Denmark (0.04)
(13 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Language Agnostic Data-Driven Inverse Text Normalization

Chen, Szu-Jui, Paul, Debjyoti, Pang, Yutong, Su, Peng, Zhang, Xuedong

arXiv.org Artificial Intelligence, Jan-23-2023

With the emergence of automatic speech recognition (ASR) models, converting spoken-form text (from ASR) to written form is urgently needed. This inverse text normalization (ITN) problem attracts the attention of researchers from various fields. Recently, several works have shown that data-driven ITN methods can output high-quality written-form text. Due to the scarcity of labeled spoken-written datasets, studies on non-English data-driven ITN are quite limited. In this work, we propose a language-agnostic data-driven ITN framework to fill this gap. Specifically, we leverage data augmentation in conjunction with neural machine translated data for low-resource languages. Moreover, we design an evaluation method for language-agnostic ITN models when only English data is available. Our empirical evaluation shows that this language-agnostic modeling approach is effective for low-resource languages while preserving performance for high-resource languages.
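
As a rough illustration of what ITN means in practice, the sketch below converts spelled-out English numbers in ASR-style text to digits. It is a hand-written, rule-based toy and not the data-driven framework the abstract describes; the lexicon and the function names `words_to_number` and `inverse_normalize` are hypothetical, and it assumes numbers below one thousand.

```python
# Toy rule-based ITN: spoken-form number words -> written-form digits.
# Illustrative only; the paper's approach is data-driven, not rule-based.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred"}


def words_to_number(tokens):
    """Convert a run of number words (below one thousand) to an int."""
    total = 0
    for tok in tokens:
        if tok in UNITS:
            total += UNITS[tok]
        elif tok in TENS:
            total += TENS[tok]
        elif tok == "hundred":
            # "hundred" multiplies what came before; bare "hundred" means 100.
            total = max(total, 1) * 100
    return total


def inverse_normalize(spoken):
    """Replace each maximal run of number words with its digit form."""
    out, run = [], []
    for tok in spoken.split():
        if tok.lower() in NUMBER_WORDS:
            run.append(tok.lower())
            continue
        if run:
            out.append(str(words_to_number(run)))
            run = []
        out.append(tok)
    if run:  # flush a number run at the end of the sentence
        out.append(str(words_to_number(run)))
    return " ".join(out)


print(inverse_normalize("i have twenty three apples"))
```

A rule set like this has to be rewritten for every language and misfires on ambiguous words (for example, "no one" would become "no 1"), which is one reason data-driven ITN is attractive.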

accuracy, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.08506

Country:

North America > United States > Texas (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Representing Interlingual Meaning in Lexical Databases

Giunchiglia, Fausto, Bella, Gabor, Nair, Nandu Chandran, Chi, Yang, Xu, Hao

arXiv.org Artificial Intelligence, Jan-22-2023

In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words and in mapping them across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.

artificial intelligence, mapping, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.09169

Country:

Europe > United Kingdom > UK North Sea (0.07)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.07)
Asia > India (0.05)
(9 more...)

Genre:

Overview (0.68)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

AI translation firm unveils 'world-first' timeline to singularity

#artificialintelligence, Jan-21-2023, 20:45:43 GMT

An Italian company has unveiled a novel method of measuring AI progress: analyzing improvements in machine translation. Translated, a provider of translation services, used the approach to predict when we will achieve singularity, a vague concept often defined as the point where machines become smarter than humans. The Rome-based business sets this milestone at the moment when AI provides "a perfect translation." According to the new research, this arrives when machine translation (MT) is better than top human translations. Translated's analysis suggests this will happen before the end of the 2020s.

artificial intelligence, natural language, translation, (13 more...)

#artificialintelligence

Genre: Research Report (0.90)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Google Research, 2022 & Beyond: Language, Vision and Generative Models – Google AI Blog

#artificialintelligence, Jan-21-2023, 00:08:23 GMT

I've always been interested in computers because of their ability to help people better understand the world around them. Over the last decade, much of the research done at Google has been in pursuit of a similar vision -- to help people better understand the world around them and get things done. We want to build more capable machines that partner with people to accomplish a huge variety of tasks: analysis and synthesis tasks, like crafting new documents or emails from a few sentences of guidance, or partnering with people to jointly write software. We want to solve complex mathematical or scientific problems, transform modalities, or translate the world's information into any language, and diagnose complex diseases or understand the physical world. We've demonstrated early versions of some of these capabilities in research artifacts, and we've partnered with many teams across Google to ship some of these capabilities in Google products that touch the lives of billions of users. But the most exciting aspects of this journey still lie ahead! With this post, I am kicking off a series in which researchers across Google will highlight some exciting progress we've made in 2022 and present our vision for 2023 and beyond. I will begin with a discussion of language, computer vision, multi-modal models, and generative machine learning models.

large language model, machine learning, natural language, (22 more...)

#artificialintelligence

Genre:

Research Report (0.46)
Overview (0.46)

Industry:

Information Technology > Services (0.54)
Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

Lu, Sin-En, Lu, Bo-Han, Lu, Chao-Yi, Tsai, Richard Tzong-Han

arXiv.org Artificial Intelligence, Jan-21-2023

In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects. In Southeast Asian countries such as Singapore, Indonesia, and Malaysia, Hokkien-Mandarin is the most widespread code-mixed language pair among Chinese immigrants, and it is also common in Taiwan. However, dialects such as Hokkien often suffer from scarce resources and lack an official writing system, limiting the development of dialect CM research. In this paper, we propose a method to construct a Hokkien-Mandarin CM dataset to mitigate the limitation, overcome the morphological issue under the Sino-Tibetan language family, and offer an efficient Hokkien word segmentation method through a linguistics-based toolkit. Furthermore, we use our proposed dataset and employ transfer learning to train the XLM (cross-lingual language model) for translation tasks. To fit the code-mixing scenario, we adapt XLM slightly. We found that by using linguistic knowledge, rules, and language tags, the model produces good results on CM data translation while maintaining monolingual translation quality.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.08937

Country:

Asia > Singapore (0.24)
Asia > Malaysia (0.24)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(13 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)
