AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

Piergentili, Andrea, Savoldi, Beatrice, Fucci, Dennis, Negri, Matteo, Bentivogli, Luisa

arXiv.org Artificial IntelligenceOct-8-2023

Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing head-on on gender-neutral translation from English to Italian. We start from the essentials: proposing a dedicated benchmark and exploring automated evaluation methods. First, we introduce GeNTE, a natural, bilingual test set for gender-neutral translation, whose creation was informed by a survey on the perception and use of neutral language. Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation.

benchmarking gender-neutral machine translation, gente corpus, hi folk, (1 more...)

arXiv.org Artificial Intelligence

2310.05294

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

sign.mt: Real-Time Multilingual Sign Language Translation Application

Moryossef, Amit

arXiv.org Artificial IntelligenceOct-8-2023

This demo paper presents sign.mt, an open-source application pioneering real-time multilingual bi-directional translation between spoken and signed languages. Harnessing state-of-the-art open-source models, this tool aims to address the communication divide between the hearing and the deaf, facilitating seamless translation in both spoken-to-signed and signed-to-spoken translation directions. Promising reliable and unrestricted communication, sign.mt offers offline functionality, crucial in areas with limited internet connectivity. It further enhances user engagement by offering customizable photo-realistic sign language avatars, thereby encouraging a more personalized and authentic user experience. Licensed under CC BY-NC-SA 4.0, sign.mt signifies an important stride towards open, inclusive communication. The app can be used, and modified for personal and academic uses, and even supports a translation API, fostering integration into a wider range of applications. However, it is by no means a finished product. We invite the NLP community to contribute towards the evolution of sign.mt. Whether it be the integration of more refined models, the development of innovative pipelines, or user experience improvements, your contributions can propel this project to new heights. Available at https://sign.mt, it stands as a testament to what we can achieve together, as we strive to make communication accessible to all.

application, pipeline, translation, (9 more...)

arXiv.org Artificial Intelligence

2310.05064

Country:

North America > Dominican Republic (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)

Genre: Research Report (0.51)

Industry: Education > Curriculum > Subject-Specific Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.54)

Add feedback

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

Chen, Chih-Chen, Chen, William, Zevallos, Rodolfo, Ortega, John E.

arXiv.org Artificial IntelligenceOct-8-2023

The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in terms of pre-training and downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American Language. We benchmark the efficacy of large SSL models on Quechua, along with 6 other indigenous languages such as Guarani and Bribri, on low-resource ASR. Our results show surprisingly strong performance by state-of-the-art SSL models, showing the potential generalizability of large-scale models to real-world data.

computational linguistic, indigenous language, quechua, (13 more...)

arXiv.org Artificial Intelligence

2310.03639

Country:

South America > Brazil (0.05)
North America > Canada > Ontario > Toronto (0.05)
South America > Bolivia (0.05)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)

Add feedback

A New Dataset for End-to-End Sign Language Translation: The Greek Elementary School Dataset

Voskou, Andreas, Panousis, Konstantinos P., Partaourides, Harris, Tolias, Kyriakos, Chatzis, Sotirios

arXiv.org Artificial IntelligenceOct-7-2023

Automatic Sign Language Translation (SLT) is a research avenue of great societal impact. End-to-End SLT facilitates the interaction of Hard-of-Hearing (HoH) with hearing people, thus improving their social life and opportunities for participation in social life. However, research within this frame of reference is still in its infancy, and current resources are particularly limited. Existing SLT methods are either of low translation ability or are trained and evaluated on datasets of restricted vocabulary and questionable real-world value. A characteristic example is Phoenix2014T benchmark dataset, which only covers weather forecasts in German Sign Language. To address this shortage of resources, we introduce a newly constructed collection of 29653 Greek Sign Language video-translation pairs which is based on the official syllabus of Greek Elementary School. Our dataset covers a wide range of subjects. We use this novel dataset to train recent state-of-the-art Transformer-based methods widely used in SLT research. Our results demonstrate the potential of our introduced dataset to advance SLT research by offering a favourable balance between usability and real-world value.

dataset, end-to-end sign language translation, greek elementary school dataset, (1 more...)

arXiv.org Artificial Intelligence

2310.04753

Genre: Research Report > New Finding (0.53)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Setting > K-12 Education (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Add feedback

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Chen, Junkun, Xue, Jian, Wang, Peidong, Pan, Jing, Li, Jinyu

arXiv.org Artificial IntelligenceOct-6-2023

Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in the flickering of partial results. In this paper, we propose a novel revision-controllable method designed to address this issue. Our method introduces an allowed revision window within the beam search pruning process to screen out candidate translations likely to cause extensive revisions, leading to a substantial reduction in flickering and, crucially, providing the capability to completely eliminate flickering. The experiments demonstrate the proposed method can significantly improve the decoding stability without compromising substantially on the translation quality.

stability, translation, translation quality, (14 more...)

arXiv.org Artificial Intelligence

2310.04399

Country:

North America > United States (0.04)
Europe > Belgium (0.04)

Genre: Research Report (0.64)

Industry:

Media > Film (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On convex decision regions in deep network representations

Tětková, Lenka, Brüsch, Thea, Scheidt, Teresa Karen, Mager, Fabian Martin, Aagaard, Rasmus Ørtoft, Foldager, Jonathan, Alstrøm, Tommy Sonne, Hansen, Lars Kai

arXiv.org Artificial IntelligenceOct-6-2023

Current work on human-machine alignment aims at understanding machine-learned latent spaces and their correspondence to human representations. G{\"a}rdenfors' conceptual spaces is a prominent framework for understanding human representations. Convexity of object regions in conceptual spaces is argued to promote generalizability, few-shot learning, and interpersonal alignment. Based on these insights, we investigate the notion of convexity of concept regions in machine-learned latent spaces. We develop a set of tools for measuring convexity in sampled data and evaluate emergent convexity in layered representations of state-of-the-art deep networks. We show that convexity is robust to basic re-parametrization and, hence, meaningful as a quality of machine-learned latent spaces. We find that approximate convexity is pervasive in neural representations in multiple application domains, including models of images, audio, human activity, text, and medical images. Generally, we observe that fine-tuning increases the convexity of label regions. We find evidence that pretraining convexity of class label regions predicts subsequent fine-tuning performance.

convexity, decision region, representation, (15 more...)

arXiv.org Artificial Intelligence

2305.17154

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.66)

Add feedback

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Langedijk, Anna, Mohebbi, Hosein, Sarti, Gabriele, Zuidema, Willem, Jumelet, Jaap

arXiv.org Artificial IntelligenceOct-5-2023

In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models.

computational linguistic, decoderlen, encoder layer, (13 more...)

arXiv.org Artificial Intelligence

2310.03686

Country:

Asia > Middle East > Republic of Türkiye (0.95)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry: Government > Regional Government > Asia Government > Middle East Government > Republic of Türkiye Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.66)

Add feedback

Jury: A Comprehensive Evaluation Toolkit

Cavusoglu, Devrim, Sert, Ulas, Sen, Secil, Altinuc, Sinan

arXiv.org Artificial IntelligenceOct-3-2023

Evaluation plays a critical role in deep learning as a fundamental block of any prediction-based system. However, the vast number of Natural Language Processing (NLP) tasks and the development of various metrics have led to challenges in evaluating different systems with different metrics. To address these challenges, we introduce jury, a toolkit that provides a unified evaluation framework with standardized structures for performing evaluation across different tasks and metrics. The objective of jury is to standardize and improve metric evaluation for all systems and aid the community in overcoming the challenges in evaluation. Since its open-source release, jury has reached a wide audience and is available at https://github.com/obss/jury.

computation, evaluation, metric, (12 more...)

arXiv.org Artificial Intelligence

2310.0204

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

Riley, Parker, Dozat, Timothy, Botha, Jan A., Garcia, Xavier, Garrette, Dan, Riesa, Jason, Firat, Orhan, Constant, Noah

arXiv.org Artificial IntelligenceOct-3-2023

We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models. Our dataset and evaluation code are publicly available: https://bit.ly/frmt-task

computational linguistic, exemplar, translation, (15 more...)

arXiv.org Artificial Intelligence

2210.00193

Country:

Asia > Taiwan (0.05)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(13 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Defending Against Authorship Identification Attacks

Wang, Haining

arXiv.org Artificial IntelligenceOct-2-2023

Authorship identification has proven unsettlingly effective in inferring the identity of the author of an unsigned document, even when sensitive personal information has been carefully omitted. In the digital era, individuals leave a lasting digital footprint through their written content, whether it is posted on social media, stored on their employer's computers, or located elsewhere. When individuals need to communicate publicly yet wish to remain anonymous, there is little available to protect them from unwanted authorship identification. This unprecedented threat to privacy is evident in scenarios such as whistle-blowing. Proposed defenses against authorship identification attacks primarily aim to obfuscate one's writing style, thereby making it unlinkable to their pre-existing writing, while concurrently preserving the original meaning and grammatical integrity. The presented work offers a comprehensive review of the advancements in this research area spanning over the past two decades and beyond. It emphasizes the methodological frameworks of modification and generation-based strategies devised to evade authorship identification attacks, highlighting joint efforts from the differential privacy community. Limitations of current research are discussed, with a spotlight on open challenges and potential research avenues.

computational linguistic, differential privacy, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2310.01568

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
(32 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law > Civil Rights & Constitutional Law (0.92)
Media (0.92)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Add feedback