AITopics

2501.08523

Country:

Europe (0.93)
North America > United States (0.46)
Asia > China (0.46)
Asia > Japan (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Jan-14-2025

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Shan, Weiqiao, Zhang, Yuhao, Han, Yuchen, Li, Bei, Zhao, Xiaofeng, Li, Yuang, Zhang, Min, Yang, Hao, Xiao, Tong, Zhu, Jingbo

Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations. However, our study reveals that while SSL features expedite model convergence, they conflict with traditional spectral features like FBanks in terms of update directions. In response, we propose a novel generalized feature fusion framework grounded in conditional computation, featuring a gradient-sensitive gating network and a multi-stage dropout strategy. This framework mitigates feature conflicts and bolsters model robustness to multi-view input features. By integrating SSL and spectral features, our approach accelerates convergence and maintains performance on par with spectral models across multiple speech translation tasks on the MuST-C dataset.

artificial intelligence, machine learning, natural language, (15 more...)

2501.08057

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Bădoiu, Vlad-Andrei, Dumitru, Mihai-Valentin, Gherghescu, Alexandru M., Agache, Alexandru, Raiciu, Costin

LLMic: Romanian Foundation Language Model

arXiv.org Artificial Intelligence, Jan-13-2025

Recent advances in Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, with commercial models leading the way. While open models usually operate at a smaller scale, they maintain competitiveness through specialization and fine-tuning. However, a significant challenge persists: open models often underperform in low-resource languages due to limited representation in the training corpus. In this paper, we present LLMic, a bilingual foundation language model designed specifically for the Romanian language. We document the complete process of pretraining a foundation model for a low-resource language, including corpus construction, architecture selection, and hyper-parameter optimization. Our evaluation demonstrates that LLMic can be specialized for tasks in the target language, achieving results comparable to other much larger open models. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks. This opens the path for efficient large-scale processing for the Romanian language community, using the much smaller LLMic model.

arxiv preprint arxiv, dataset, llmic, (10 more...)

2501.07721

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
North America > United States > Virginia (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Borchert, Philipp, Vulić, Ivan, Moens, Marie-Francine, De Weerdt, Jochen

Language Fusion for Parameter-Efficient Cross-lingual Transfer

arXiv.org Artificial Intelligence, Jan-12-2025

Limited availability of multilingual text corpora for training language models often leads to poor performance on downstream tasks due to undertrained representation spaces for languages other than English. This 'under-representation' has motivated recent cross-lingual transfer methods to leverage the English representation space by, e.g., mixing English and 'non-English' tokens at the input level or extending model parameters to accommodate new languages. However, these approaches often come at the cost of increased computational complexity. We propose Fusion for Language Representations (FLARE) in adapters, a novel method that enhances representation quality and downstream performance for languages other than English while maintaining parameter efficiency. FLARE integrates source and target language representations within low-rank (LoRA) adapters using lightweight linear transformations, maintaining parameter efficiency while improving transfer performance. A series of experiments across representative cross-lingual natural language understanding tasks, including natural language inference, question-answering and sentiment analysis, demonstrate FLARE's effectiveness. FLARE achieves performance improvements of 4.9% for Llama 3.1 and 2.2% for Gemma 2 compared to standard LoRA fine-tuning on question-answering tasks, as measured by the exact match metric.
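The exact match metric named in the abstract can be illustrated with a short sketch. This assumes SQuAD-style normalization (lowercasing, punctuation removal, English article stripping), which may differ from the normalization used in the paper; the function names `normalize`, `exact_match`, and `exact_match_score` are illustrative, not taken from the authors' code.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization (an assumption, not the paper's stated recipe)."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Remove English articles; other languages would need their own rules.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the prediction equals any reference after normalization."""
    pred = normalize(prediction)
    return float(any(pred == normalize(ref) for ref in references))


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this definition, a reported gain of a few points means that several more answers per hundred questions match a reference string exactly after normalization.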

large language model, machine learning, natural language, (19 more...)

2501.06892

Country:

Europe (1.00)
North America > United States (0.67)
Asia > Middle East (0.46)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

arXiv.org Artificial Intelligence, Jan-11-2025

Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST)

Liu, Jiarui, Ouzzani, Iman, Li, Wenkai, Zhang, Lechen, Ou, Tianyue, Bouamor, Houda, Jin, Zhijing, Diab, Mona

The field of machine translation has achieved significant advancements, yet domain-specific terminology translation, particularly in AI, remains challenging. We introduce GIST, a large-scale multilingual AI terminology dataset containing 5K terms extracted from top AI conference papers spanning 2000 to 2023. The terms are translated into Arabic, Chinese, French, Japanese, and Russian using a hybrid framework that combines LLMs for extraction with human expertise for translation. The dataset's quality is benchmarked against existing resources, demonstrating superior translation accuracy through crowdsourced evaluation. GIST is integrated into translation workflows using post-translation refinement methods that require no retraining, where LLM prompting consistently improves BLEU and COMET scores. A web demonstration on the ACL Anthology platform highlights its practical application, showcasing improved accessibility for non-English speakers. This work aims to address critical gaps in AI terminology resources and fosters global inclusivity and collaboration in AI research.

large language model, machine learning, precision mean 0, (17 more...)

2412.18367

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.92)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

AFRIDOC-MT: Document-level MT Corpus for African Languages

Alabi, Jesujoba O., Azime, Israel Abebe, Zhang, Miaoran, España-Bonet, Cristina, Bawden, Rachel, Zhu, Dawei, Adelani, David Ifeoluwa, Odoje, Clement Oyeleke, Akinade, Idris, Maab, Iffat, David, Davis, Muhammad, Shamsuddeen Hassan, Putini, Neo, Ademuyiwa, David O., Caines, Andrew, Klakow, Dietrich

This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) for translations between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.

large language model, machine learning, translation, (20 more...)

2501.06374

Country:

North America > United States (1.00)
Asia (1.00)
Africa (1.00)
Europe > United Kingdom > England (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Addressing speaker gender bias in large scale speech translation systems

Bansal, Shubham, Joshi, Vikas, Chadha, Harveen, Mehta, Rupeshkumar, Li, Jinyu

This study addresses the issue of speaker gender bias in Speech Translation (ST) systems, which can lead to offensive and inaccurate translations. The masculine bias often found in large-scale ST systems is typically perpetuated through training data derived from Machine Translation (MT) systems. Our approach involves two key steps. First, we employ Large Language Models (LLMs) to rectify translations based on the speaker's gender in a cost-effective manner. Second, we fine-tune the ST model with the corrected data, enabling the model to generate gender-specific translations directly from audio cues, without the need for explicit gender input. Additionally, we propose a three-mode fine-tuned model for scenarios where the speaker's gender is either predefined or should not be inferred from speech cues. We demonstrate a 70% improvement in translations for female speakers compared to our baseline and other large-scale ST systems, such as Seamless M4T and Canary, on the MuST-SHE test set.

machine learning, natural language, translation, (17 more...)

2501.05989

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Nuutinen, Emil, Rastas, Iiro, Ginter, Filip

Finnish SQuAD: A Simple Approach to Machine Translation of Span Annotations

We apply a simple method to machine translate datasets with span-level annotation using the DeepL MT service and its ability to translate formatted documents. Using this method, we produce a Finnish version of the SQuAD2.0 question answering dataset and train QA retriever models on this new dataset. We evaluate the quality of the dataset and more generally the MT method through direct evaluation, indirect comparison to other similar datasets, a backtranslation experiment, as well as through the performance of downstream trained QA models. In all these evaluations, we find that the method of transfer is not only simple to use but produces consistently better translated data. Given its good performance on the SQuAD dataset, it is likely the method can be used to translate other similar span-annotated datasets for other tasks and languages as well. All code and data are available under an open license: data at HuggingFace TurkuNLP/squad_v2_fi, code on GitHub TurkuNLP/squad2-fi, and model at HuggingFace TurkuNLP/bert-base-finnish-cased-squad2.
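The general idea of carrying a span annotation through translation via formatting can be sketched as follows. This is only an illustration under stated assumptions: the `<ans>` marker tags are hypothetical, `fake_translate` is a stand-in lookup table rather than a call to DeepL, and the abstract does not specify the exact document format the authors used.

```python
import re

# Hypothetical marker tags; the actual formatting used in the paper is not given here.
OPEN, CLOSE = "<ans>", "</ans>"


def mark_span(context: str, start: int, end: int) -> str:
    """Wrap the annotated character span [start, end) in marker tags."""
    return context[:start] + OPEN + context[start:end] + CLOSE + context[end:]


def recover_span(marked: str):
    """Strip the marker tags and return (plain_text, (start, end)) or (text, None)."""
    pattern = re.escape(OPEN) + r"(.*?)" + re.escape(CLOSE)
    match = re.search(pattern, marked, flags=re.DOTALL)
    if match is None:
        # The markers were lost in translation; the example would be discarded.
        return marked, None
    answer = match.group(1)
    plain = marked[:match.start()] + answer + marked[match.end():]
    return plain, (match.start(), match.start() + len(answer))


def fake_translate(text: str) -> str:
    """Stand-in for an MT service that preserves markup (assumption for the demo)."""
    table = {
        "The capital of Finland is <ans>Helsinki</ans>.":
            "Suomen pääkaupunki on <ans>Helsinki</ans>.",
    }
    return table[text]


context = "The capital of Finland is Helsinki."
start = context.index("Helsinki")
marked = mark_span(context, start, start + len("Helsinki"))
translated, span = recover_span(fake_translate(marked))
```

After these steps, `translated[span[0]:span[1]]` gives the answer text in the translated context, so the span offsets never have to be re-aligned by hand.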

large language model, machine learning, natural language, (18 more...)

2501.05963

Country: Europe > Finland (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Khandaker, Md. Arafat Alam, Raha, Ziyan Shirin, Paul, Bidyarthi, Muhammad, Tashreef

Bridging Dialects: Translating Standard Bangla to Regional Variants Using Neural Models

The Bangla language includes many regional dialects, adding to its cultural richness. The translation of the Bangla language into regional dialects presents a challenge due to significant variations in vocabulary, pronunciation, and sentence structure across regions like Chittagong, Sylhet, Barishal, Noakhali, and Mymensingh. These dialects, though vital to local identities, lack representation in technological applications. This study addresses this gap by translating standard Bangla into these dialects using neural machine translation (NMT) models, including BanglaT5, mT5, and mBART50. The work is motivated by the need to preserve linguistic diversity and improve communication among dialect speakers. The models were fine-tuned using the "Vashantor" dataset, containing 32,500 sentences across various dialects, and evaluated through Character Error Rate (CER) and Word Error Rate (WER) metrics. BanglaT5 demonstrated superior performance with a CER of 12.3% and WER of 15.7%, highlighting its effectiveness in capturing dialectal nuances. The outcomes of this research contribute to the development of inclusive language technologies that support regional dialects and promote linguistic diversity.
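The CER and WER figures cited in the abstract are edit-distance ratios. A minimal sketch of how such scores are commonly computed, assuming whitespace tokenization for words and raw characters for CER; the function names are illustrative and this is not the authors' evaluation code.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates are normalized by the reference length, so a hypothesis much longer than its reference can score above 1.0.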

machine learning, natural language, translation, (15 more...)

2501.05749

Country: Asia > Bangladesh (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jan-9-2025

Towards Automatic Evaluation for Image Transcreation

Khanuja, Simran, Iyer, Vivek, He, Claire, Neubig, Graham

Beyond conventional paradigms of translating speech and text, recently, there has been interest in automated transcreation of images to facilitate localization of visual content across different cultures. Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. Drawing on theories from translation studies and real-world transcreation practices, we identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence and visual similarity, and design our metrics to evaluate systems along these axes. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity. Meta-evaluation across 7 countries shows our metrics agree strongly with human ratings, with average segment-level correlations ranging from 0.55-0.87. Finally, through a discussion of the merits and demerits of each metric, we offer a robust framework for automated image transcreation evaluation, grounded in both theoretical foundations and practical application. Our code can be found here: https://github.com/simran-khanuja/automatic-eval-transcreation

cultural relevance, similarity, transcreation, (15 more...)

2412.13717

Country:

Asia > Japan (0.05)
Asia > India (0.05)
Africa > Nigeria (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)