deepl
EXCLUSIVE: DeepL to Release Interpretation Software for Japan
BERLIN - German technology firm DeepL, known for its artificial intelligence-powered translation software, plans to release a Japanese-language version of its real-time interpretation software by the end of this year, a senior company official has said. "The age of machine interpretation has arrived," said Leonardo Doin, head of engineering and research for real-time voice translation service DeepL Voice, in a recent interview. "You can just wear an earpiece and ... you can just hear it (foreign-language speech) in your language anytime," Doin said. The interpretation software will integrate DeepL's speech recognition and machine translation technologies, and speech synthesis technology that mimics the tones of the speakers' voices. It will be able to handle multiple languages and speakers, he said, with the software's use in online meetings of multinational companies in mind. DeepL plans to roll out the software on smartphones as well.
- Asia > Nepal (0.16)
- Pacific Ocean > North Pacific Ocean > Sea of Japan (0.05)
- North America > United States (0.05)
- Consumer Products & Services > Travel (0.69)
- Government (0.67)
- Leisure & Entertainment (0.50)
Gender Bias in English-to-Greek Machine Translation
Gkovedarou, Eleni, Daems, Joke, De Bruyne, Luna
As the demand for inclusive language increases, concern has grown over the susceptibility of machine translation (MT) systems to reinforce gender stereotypes. This study investigates gender bias in two commercial MT systems, Google Translate and DeepL, focusing on the understudied English-to-Greek language pair. We address three aspects of gender bias: i) male bias, ii) occupational stereotyping, and iii) errors in anti-stereotypical translations. Additionally, we explore the potential of prompted GPT-4o as a bias mitigation tool that provides both gender-explicit and gender-neutral alternatives when necessary. To achieve this, we introduce GendEL, a manually crafted bilingual dataset of 240 gender-ambiguous and unambiguous sentences that feature stereotypical occupational nouns and adjectives. We find persistent gender bias in translations by both MT systems; while they perform well in cases where gender is explicitly defined, with DeepL outperforming both Google Translate and GPT-4o in feminine gender-unambiguous sentences, they are far from producing gender-inclusive or neutral translations when the gender is unspecified. GPT-4o shows promise, generating appropriate gendered and neutral alternatives for most ambiguous cases, though residual biases remain evident.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
English Please: Evaluating Machine Translation for Multilingual Bug Reports
Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and ChatGPT using data from the Visual Studio Code GitHub repository, specifically focusing on reports labeled with the english-please tag. To thoroughly assess the accuracy and effectiveness of each system, we employ multiple machine translation metrics, including BLEU, BERTScore, COMET, METEOR, and ROUGE. Our findings indicate that DeepL consistently outperforms the other systems across most automatic metrics, demonstrating strong lexical and semantic alignment. AWS Translate performs competitively, particularly in METEOR, while ChatGPT lags in key metrics. This study underscores the importance of domain adaptation for translating technical texts and offers guidance for integrating automated translation into bug-triaging workflows. Moreover, our results establish a foundation for future research to refine machine translation solutions for specialized engineering contexts. The code and dataset for this paper are available at GitHub: https://github.com/av9ash/gitbugs/tree/main/multilingual.
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
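As an illustration of the kind of automatic metric named in the abstract above, the following is a minimal sketch of a sentence-level BLEU score, assuming whitespace tokenization, a single reference, and no smoothing. It is not the evaluation code from the paper's repository. The function names `ngrams` and `sentence_bleu` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumptions: whitespace tokenization, uniform n-gram weights,
    and a score of 0.0 whenever any n-gram precision is zero.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the editor crashes on startup"
    print(sentence_bleu("the editor crashes on startup", reference))
    print(sentence_bleu("the editor crashes on launch", reference))
```

Corpus-level evaluations normally aggregate n-gram counts over all segments and apply smoothing, so scores from this sketch are not comparable to published figures.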
Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages
Filip, Tomáš, Pavlíček, Martin, Sosík, Petr
Aspect-based sentiment analysis (ABSA) is a standard NLP task with numerous approaches and benchmarks, where large language models (LLMs) represent the current state of the art. We focus on ABSA subtasks based on Twitter/X data in underrepresented languages. On such narrow tasks, small tuned language models can often outperform universal large ones, providing available and cheap solutions. We fine-tune several LLMs (BERT, BERTweet, Llama2, Llama3, Mistral) for classification of sentiment towards Russia and Ukraine in the context of the ongoing military conflict. The training/testing dataset was obtained from the academic API from Twitter/X during 2023, narrowed to the languages of the V4 countries (Czech Republic, Slovakia, Poland, Hungary). We then measure the models' performance under a variety of settings including translations, sentiment targets, in-context learning and more, using GPT4 as a reference model. We document several interesting phenomena demonstrating, among others, that some models are far more amenable to fine-tuning on multilingual Twitter tasks than others, and that they can reach the SOTA level with a very small training set. Finally, we identify combinations of settings providing the best results.
- Government > Military (0.66)
- Information Technology > Security & Privacy (0.46)
When Does Translation Require Context? A Data-driven, Multilingual Exploration
Fernandes, Patrick, Yin, Kayo, Liu, Emmy, Martins, André F. T., Neubig, Graham
Although proper handling of discourse significantly contributes to the quality of machine translation (MT), these improvements are not adequately measured in common translation quality metrics. Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation, though not in a fully systematic way. In this paper, we develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena in any given dataset. The choice of phenomena is inspired by a novel methodology to systematically identify translations requiring context. We confirm the difficulty of previously studied phenomena while uncovering others that were previously unaddressed. We find that common context-aware MT models make only marginal improvements over context-agnostic models, which suggests these models do not handle these ambiguities effectively. We release code and data for 14 language pairs to encourage the MT community to focus on accurately capturing discourse phenomena.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios
Chichirau, Malina, van Noord, Rik, Toral, Antonio
We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German-English) can still perform well on English translations that come from different source languages, even when the machine translations were produced by other systems than the one it was trained on. Additionally, we demonstrate that incorporating the source text in the input of a multilingual classifier improves (i) its accuracy and (ii) its robustness on cross-system evaluation, compared to a monolingual classifier. Furthermore, we find that using training data from multiple source languages (German, Russian, and Chinese) tends to improve the accuracy of both monolingual and multilingual classifiers. Finally, we show that bilingual classifiers and classifiers trained on multiple source languages benefit from being trained on longer text sequences, rather than on sentences.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.07)
- Asia (0.04)
DeepL targets AI translation for enterprises with fresh $100 million
Seeking to target enterprise customers with AI language translation, Cologne, Germany-based DeepL announced a new funding raise that public reports estimate at well over $100 million. Language translation is an increasingly critical function for enterprises working across geographies and different demographics. Basic language translation capabilities have been available for decades -- for example, services such as Google Translate. But the challenge has been enabling more advanced translation for business use cases that capture not just the literal meaning but the right tone and context.
DeepL, the AI-based language translator, raises over $100M at a $1B+ valuation • TechCrunch
Artificial intelligence startups, and (thanks to GPT and OpenAI) specifically those helping humans communicate with each other, are commanding a lot of interest from investors, and today the latest of these is announcing a big round of funding. DeepL, a startup that provides instant translation-as-a-service both to businesses and to individuals -- competing with Google, Bing and other online tools -- has confirmed a fundraise at a €1 billion valuation (just over $1 billion at today's rates). Cologne, Germany-based DeepL is not disclosing the full amount that it's raised -- it doesn't want to focus on this aspect, CEO and founder Jaroslaw Kutylowski said in an interview -- but as we were working on this story we heard a range of figures. At one end, an investor that was pitched on the funding told TechCrunch that DeepL was aiming to raise $125 million. At the other end, a report with a rumor about the funding from back in November said the amount was around $100 million.
Landscape Analysis: Neural Machine Translation
The Big 3 in neural machine translation (NMT) are Google, Microsoft, and Amazon. Within this group, Google leads in language coverage, supporting 109 languages compared with Microsoft's 73 and Amazon's 55. Overall, Google is flush with talent, data, and resources, and it leverages those assets to maintain its dominant position. That said, Google Translate is a tool that businesses like Native can license in order to leverage best-in-class technology. In this sense, Google is currently a key partner and will only become a competitor when Native builds out its own neural translation engine.