AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice with free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

Cross-Lingual Knowledge Transfer for Clinical Phenotyping

Papaioannou, Jens-Michalis, Grundmann, Paul, van Aken, Betty, Samaras, Athanasios, Kyparissidis, Ilias, Giannakoulas, George, Gers, Felix, Löser, Alexander

arXiv.org Artificial Intelligence | Aug-3-2022

Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language and have a small amount of in-domain data available. We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains such as cardiology, oncology and the ICU. Our results reveal two strategies that outperform the state-of-the-art: Translation-based methods in combination with domain-specific encoders and cross-lingual encoders plus adapters. We find that these strategies perform especially well for classifying rare phenotypes and we advise on which method to prefer in which situation. Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.

dataset, knowledge transfer, translation, (13 more...)

arXiv.org Artificial Intelligence

2208.01912

Country:

Europe > Germany (0.14)
Europe > Greece > Central Macedonia > Thessaloniki (0.05)
North America > United States > Massachusetts (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Raithel, Lisa, Thomas, Philippe, Roller, Roland, Sapina, Oliver, Möller, Sebastian, Zweigenbaum, Pierre

arXiv.org Artificial Intelligence | Aug-3-2022

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This and a high topic imbalance make it a very challenging dataset, since often, the same symptom can have several causes and is not always related to a medication intake. We aim to encourage further multi-lingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multi-lingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.
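The positive-class F1-score mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `positive_class_f1` and the toy labels are hypothetical, and the score is returned on a 0 to 1 scale, whereas the reported 37.52 is presumably a percentage. The example shows why accuracy alone is misleading on an imbalanced corpus.

```python
def positive_class_f1(y_true, y_pred, positive=1):
    """F1 for a single class, computed from raw label lists (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels: 10 documents, only 2 of which mention an ADR.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = positive_class_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}  positive-class F1={f1:.2f}")
```

On this toy data the classifier is right on 8 of 10 documents, yet it finds only one of the two positive documents and raises one false alarm, so the positive-class F1 is much lower than the accuracy.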

computational linguistic, dataset, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2208.02031

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Germany > Berlin (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Health Care Providers & Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Silo NLP's Participation at WAT2022

Parida, Shantipriya, Panda, Subhadarshi, Grönroos, Stig-Arne, Granroth-Wilding, Mark, Koistinen, Mika

arXiv.org Artificial Intelligence | Aug-2-2022

This paper provides the system description of "Silo NLP's" submission to the Workshop on Asian Translation (WAT2022). We have participated in the Indic Multimodal tasks (English->Hindi, English->Malayalam, and English->Bengali Multimodal Translation). For text-only translation, we trained Transformers from scratch and fine-tuned mBART-50 models. For multimodal translation, we used the same mBART architecture and extracted object tags from the images to use as visual features concatenated with the text sequence. Our submission tops many tasks including English->Hindi multimodal translation (evaluation test), English->Malayalam text-only and multimodal translation (evaluation test), English->Bengali multimodal translation (challenge test), and English->Bengali text-only translation (evaluation test).
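The idea of using extracted object tags as visual features concatenated with the text sequence can be sketched as plain string preprocessing. This is only an illustration under stated assumptions: the function name `build_multimodal_source`, the `##` separator, the lowercasing, the deduplication, and the tag limit are all hypothetical choices, not details taken from the system description.

```python
def build_multimodal_source(text, object_tags, separator="##", max_tags=10):
    """Append deduplicated object tags to the source sentence (hypothetical format)."""
    seen = set()
    unique_tags = []
    for tag in object_tags:
        tag = tag.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            unique_tags.append(tag)
    unique_tags = unique_tags[:max_tags]
    if not unique_tags:
        # No usable tags: fall back to the text-only input.
        return text
    return f"{text} {separator} {' '.join(unique_tags)}"


# Tags as an object detector might return them (made-up example).
source = build_multimodal_source(
    "A man playing tennis", ["person", "racket", "Person", "ball"]
)
print(source)
```

The resulting string would then be tokenized and fed to the translation model like any other source sentence, which is what lets a text-only architecture be reused for the multimodal task.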

machine learning, natural language, translation, (15 more...)

arXiv.org Artificial Intelligence

2208.01296

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Tennis (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Silva, Marília Costa Rosendo, Siqueira, Felipe Alves, Tarrega, João Pedro Mantovani, Beinotti, João Vitor Pataca, Nunes, Augusto Sousa, Gardini, Miguel de Mattos, da Silva, Vinícius Adolfo Pereira, da Silva, Nádia Félix Felipe, de Carvalho, André Carlos Ponce de Leon Ferreira

arXiv.org Artificial Intelligence | Aug-2-2022

Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.

algorithm, computational linguistic, reproducibility and distortion issue, (11 more...)

arXiv.org Artificial Intelligence

2208.01712

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(36 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.47)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Sockeye 3: Fast Neural Machine Translation with PyTorch

Hieber, Felix, Denkowski, Michael, Domhan, Tobias, Barros, Barbara Darques, Ye, Celina Dong, Niu, Xing, Hoang, Cuong, Tran, Ke, Hsu, Benjamin, Nadejde, Maria, Lakew, Surafel, Mathur, Prashant, Currey, Anna, Federico, Marcello

arXiv.org Artificial Intelligence | Aug-2-2022

Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to production. When running comparable models, Sockeye 3 is up to 126% faster than other PyTorch implementations on GPUs and up to 292% faster on CPUs. Sockeye 3 is open source software released under the Apache 2.0 license.

computational linguistic, proceedings, translation, (11 more...)

arXiv.org Artificial Intelligence

2207.05851

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Germany > Berlin (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On the Pitfalls of Analyzing Individual Neurons in Language Models

Antverg, Omer, Belinkov, Yonatan

arXiv.org Artificial Intelligence | Aug-1-2022

While many studies have shown that linguistic information is encoded in hidden word representations, few have studied individual neurons to show how and in which neurons it is encoded. Among these, the common approach is to use an external probe to rank neurons according to their relevance to some linguistic attribute, and to evaluate the obtained ranking using the same probe that produced it. We show two pitfalls in this methodology: 1. It confounds distinct factors: probe quality and ranking quality. We separate them and draw conclusions on each. 2. It focuses on encoded information, rather than information that is used by the model. We show that these are not the same. We compare two recent ranking methods and a simple one we introduce, and evaluate them with regard to both of these aspects. Many studies attempt to interpret language models by predicting different linguistic properties from word representations, an approach called probing classifiers (Adi et al., 2017; Conneau et al., 2018, inter alia). A growing body of work focuses on individual neurons within the representation, attempting to show in which neurons some information is encoded, and whether it is localized (concentrated in a small set of neurons) or dispersed. Such knowledge may allow us to control the model's output (Bau et al., 2019), to reduce the number of parameters in the model (Voita et al., 2019; Sajjad et al., 2020), and to gain general scientific knowledge of the model. The common methodology is to train a probe to predict some linguistic attribute from a representation, and to use it, in different ways, to rank the neurons of the representation according to their importance for the attribute in question.
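One way a trained probe could be used to rank neurons is sketched below. It assumes a linear probe whose weight matrix is already available and scores each neuron by the summed absolute weight across attribute labels. The function name `rank_neurons_by_probe_weights` and the weight values are hypothetical, and this is not necessarily any of the ranking methods the paper compares.

```python
def rank_neurons_by_probe_weights(weights):
    """Rank neuron indices by summed absolute probe weight, most important first.

    `weights` is a list of rows, one per attribute label, each holding one
    weight per neuron (an assumed layout for a linear probe).
    """
    num_neurons = len(weights[0])
    importance = [
        sum(abs(row[j]) for row in weights) for j in range(num_neurons)
    ]
    return sorted(range(num_neurons), key=lambda j: importance[j], reverse=True)


# Made-up weights of a probe with 2 labels over a 4-neuron representation.
probe_weights = [
    [0.1, -2.0, 0.3, 0.0],
    [-0.2, 1.5, 0.9, 0.1],
]

ranking = rank_neurons_by_probe_weights(probe_weights)
print(ranking)
```

Because the ranking here comes entirely from the probe, evaluating it with that same probe would mix up how good the probe is with how good the ranking is, so a separate evaluation would be needed to tell the two apart.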

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2110.07483

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Israel (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Medicine

#artificialintelligence | Jul-30-2022, 16:23:14 GMT

Why are patients not finding their doctors online?: Wendy Sue Swanson at TEDxNijmegen 2013.

wikia, world university, worlduniversity, (13 more...)

#artificialintelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.16)
North America > United States > California > Santa Clara County > Stanford (0.15)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
(10 more...)

Genre: Research Report (0.30)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)
Education > Educational Setting > Higher Education (0.47)
Government > Regional Government > North America Government > United States Government (0.31)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

New AI Model Translates 200 Languages, Making Technology Accessible to More People -- I-COM

#artificialintelligence | Jul-29-2022, 08:40:24 GMT

Language is our lifeline to the world. But because high-quality translation tools don't exist for hundreds of languages, billions of people today can't access digital content or participate fully in conversations and communities online in their preferred or native languages. This is particularly an issue for hundreds of millions of people who speak the many languages of Africa and Asia. To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages. Today, we're announcing an important breakthrough in NLLB: We've built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish.

i-com, new ai model translate 200, nllb

#artificialintelligence

Country:

Asia (0.31)
Africa (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Bilingual Terminology Extraction from Comparable E-Commerce Corpora

Jia, Hao, Gu, Shuqin, Zhang, Yuqi, Duan, Xiangyu

arXiv.org Artificial Intelligence | Jul-29-2022

Bilingual terminologies are important machine translation resources in the field of e-commerce, and they are usually either manually translated or automatically extracted from parallel data. Human translation is costly, and e-commerce parallel corpora are very scarce. However, comparable data in different languages in the same commodity field is abundant. In this paper, we propose a novel framework for extracting e-commerce bilingual terminologies from comparable data. Benefiting from cross-lingual pre-training in e-commerce, our framework can make full use of the deep semantic relationship between source-side terminology and target-side sentence to extract corresponding target terminology. Experimental results on various language pairs show that our approaches achieve significantly better performance than various strong baselines.

target sentence, terminology, translation, (13 more...)

arXiv.org Artificial Intelligence

2104.07398

Country: Africa > Middle East > Morocco > Marrakesh-Safi Region > Marrakesh (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Benchmarking Azerbaijani Neural Machine Translation

Chen, Chih-Chen, Chen, William

arXiv.org Artificial Intelligence | Jul-29-2022

Little research has been done on Neural Machine Translation (NMT) for Azerbaijani. In this paper, we benchmark the performance of Azerbaijani-English NMT systems on a range of techniques and datasets. We evaluate which segmentation techniques work best on Azerbaijani translation and benchmark the performance of Azerbaijani NMT models across several domains of text. Our results show that while Unigram segmentation improves NMT performance and Azerbaijani translation models scale better with dataset quality than quantity, cross-domain generalization remains a challenge.

computational linguistic, proceedings, translation, (13 more...)

arXiv.org Artificial Intelligence

2207.14473

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Iran (0.04)
Asia > Azerbaijan (0.04)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
