AITopics

2509.1157

Country:

North America > United States (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Education (0.92)
Media > News (0.46)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-16-2025

Text2Sign Diffusion: A Generative Approach for Gloss-Free Sign Language Production

Feng, Liqian, Wang, Lintao, Hu, Kun, Kong, Dehui, Wang, Zhiyong

Sign language production (SLP) aims to translate spoken language sentences into a sequence of pose frames in a sign language, bridging the communication gap and promoting digital inclusion for deaf and hard-of-hearing communities. Existing methods typically rely on gloss, a symbolic representation of sign language words or phrases that serves as an intermediate step in SLP. This limits the flexibility and generalization of SLP, as gloss annotations are often unavailable and language-specific. Therefore, we present a novel diffusion-based generative approach - Text2Sign Diffusion (Text2SignDiff) for gloss-free SLP. Specifically, a gloss-free latent diffusion model is proposed to generate sign language sequences from noisy latent sign codes and spoken text jointly, reducing the potential error accumulation through a non-autoregressive iterative denoising process. We also design a cross-modal signing aligner that learns a shared latent space to bridge visual and textual content in sign and spoken languages. This alignment supports the conditioned diffusion-based process, enabling more accurate and contextually relevant sign language generation without gloss. Extensive experiments on the commonly used PHOENIX14T and How2Sign datasets demonstrate the effectiveness of our method, achieving the state-of-the-art performance.

artificial intelligence, machine learning, natural language, (16 more...)

2509.10845

Country: Oceania > Australia (0.14)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

arXiv.org Artificial IntelligenceSep-16-2025

Program Skeletons for Automated Program Translation

Wang, Bo, Li, Tianyu, Li, Ruishi, Mathur, Umang, Saxena, Prateek

Translating software between programming languages is a challenging task, for which automated techniques have been elusive and hard to scale up to larger programs. A key difficulty in cross-language translation is that one has to re-express the intended behavior of the source program into idiomatic constructs of a different target language. This task needs abstracting away from the source language-specific details, while keeping the overall functionality the same. In this work, we propose a novel and systematic approach for making such translation amenable to automation based on a framework we call program skeletons. A program skeleton retains the high-level structure of the source program by abstracting away and effectively summarizing lower-level concrete code fragments, which can be mechanically translated to the target programming language. A skeleton, by design, permits many different ways of filling in the concrete implementation for fragments, which can work in conjunction with existing data-driven code synthesizers. Most importantly, skeletons can conceptually enable sound decomposition, i.e., if each individual fragment is correctly translated, taken together with the mechanically translated skeleton, the final translated program is deemed to be correct as a whole. We present a prototype system called Skel embodying the idea of skeleton-based translation from Python to JavaScript. Our results show promising scalability compared to prior works. For 9 real-world Python programs, some with more than about 1k lines of code, 95% of their code fragments can be automatically translated, while about 5% require manual effort. All the final translations are correct with respect to whole-program test suites.

large language model, logic & formal reasoning, programming language, (18 more...)

doi: 10.1145/3729287

2504.07483

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.66)
(3 more...)

MultimodalHugs: Enabling Sign Language Processing in Hugging Face

Sant, Gerard, Jiang, Zifan, Escolano, Carlos, Moryossef, Amit, Müller, Mathias, Sennrich, Rico, Ebling, Sarah

In recent years, sign language processing (SLP) has gained importance in the general field of Natural Language Processing. However, compared to research on spoken languages, SLP research is hindered by complex ad-hoc code, inadvertently leading to low reproducibility and unfair comparisons. Existing tools that are built for fast and reproducible experimentation, such as Hugging Face, are not flexible enough to seamlessly integrate sign language experiments. This view is confirmed by a survey we conducted among SLP researchers. To address these challenges, we introduce MultimodalHugs, a framework built on top of Hugging Face that enables more diverse data modalities and tasks, while inheriting the well-known advantages of the Hugging Face ecosystem. Even though sign languages are our primary focus, MultimodalHugs adds a layer of abstraction that makes it more widely applicable to other use cases that do not fit one of the standard templates of Hugging Face. We provide quantitative experiments to illustrate how MultimodalHugs can accommodate diverse modalities such as pose estimation data for sign languages, or pixel data for text characters.

computational linguistic, machine learning, natural language, (17 more...)

2509.09729

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report (0.82)
Workflow (0.68)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Hattori, Seiji, Matsuzaki, Takuya, Fujiwara, Makoto

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure

This paper proposes a natural language translation method for machine-verifiable formal proofs that leverages the informalization (verbalization of formal language proof steps) and summarization capabilities of LLMs. For evaluation, it was applied to formal proof data created in accordance with natural language proofs taken from an undergraduate-level textbook, and the quality of the generated natural language proofs was analyzed in comparison with the original natural language proofs. Furthermore, we will demonstrate that this method can output highly readable and accurate natural language proofs by applying it to existing formal proof library of the Lean proof assistant.

formal proof, large language model, logic & formal reasoning, (20 more...)

2509.09726

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Kreutzer, Julia, Briakou, Eleftheria, Agrawal, Sweta, Fadaee, Marzieh, Tom, Kocmi

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Generation capabilities and language coverage of multilingual large language models (mLLMs) are advancing rapidly. However, evaluation practices for generative abilities of mLLMs are still lacking comprehensiveness, scientific rigor, and consistent adoption across research labs, which undermines their potential to meaningfully guide mLLM development. We draw parallels with machine translation (MT) evaluation, a field that faced similar challenges and has, over decades, developed transparent reporting standards and reliable evaluations for multilingual generative models. Through targeted experiments across key stages of the generative evaluation pipeline, we demonstrate how best practices from MT evaluation can deepen the understanding of quality differences between models. Additionally, we identify essential components for robust meta-evaluation of mLLMs, ensuring the evaluation methods themselves are rigorously assessed. We distill these insights into a checklist of actionable recommendations for mLLM research and development.

computational linguistic, large language model, machine learning, (15 more...)

2504.11829

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jung, JungHo, Lee, Junhyun

Optimal Multi-Task Learning at Regularization Horizon for Speech Translation Task

End-to-end speech-to-text translation typically suffers from the scarcity of paired speech-text data. One way to overcome this shortcoming is to utilize the bitext data from the Machine Translation (MT) task and perform Multi-Task Learning (MTL). In this paper, we formulate MTL from a regularization perspective and explore how sequences can be regularized within and across modalities. By thoroughly investigating the effect of consistency regularization (different modality) and R-drop (same modality), we show how they respectively contribute to the total regularization. We also demonstrate that the coefficient of MT loss serves as another source of regularization in the MTL setting. With these three sources of regularization, we introduce the optimal regularization contour in the high-dimensional space, called the regularization horizon. Experiments show that tuning the hyperparameters within the regularization horizon achieves near state-of-the-art performance on the MuST-C dataset.

artificial intelligence, machine learning, natural language, (18 more...)

2509.09701

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceSep-12-2025

COCO-Urdu: A Large-Scale Urdu Image-Caption Dataset with Multimodal Quality Estimation

Hassan, Umair

Urdu, spoken by over 250 million people, remains critically under-served in multimodal and vision-language research. The absence of large-scale, high-quality datasets has limited the development of Urdu-capable systems and reinforced biases in multilingual vision-language models trained primarily on high-resource languages. To address this gap, we present COCO-Urdu, a large-scale image-caption dataset derived from MS COCO, containing 59,000 images and 319,000 Urdu captions selected through stratified sampling to preserve the original distribution. Captions were translated using SeamlessM4T v2 and validated with a hybrid multimodal quality estimation framework that integrates COMET-Kiwi for translation quality, CLIP-based similarity for visual grounding, and BERTScore with back-translation for semantic consistency; low-scoring captions were iteratively refined using open-source large language models. We further benchmark COCO-Urdu on BLEU, SacreBLEU, and chrF, reporting consistently strong results. To the best of our knowledge, COCO-Urdu is the largest publicly available Urdu captioning dataset. By releasing both the dataset and the quality estimation pipeline, we aim to reduce language bias in multimodal research and establish a foundation for inclusive vision-language systems.

artificial intelligence, caption, natural language, (15 more...)

2509.09014

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Poláková, Lucie, Popel, Martin, Kloudová, Věra, Novák, Michal, Anisimova, Mariia, Balhar, Jiří

Mitigating Language Barriers in Education: Developing Multilingual Digital Learning Materials with Machine Translation

arXiv.org Artificial IntelligenceSep-12-2025

The EdUKate project combines digital education, linguistics, translation studies, and machine translation to develop multilingual learning materials for Czech primary and secondary schools. Launched through collaboration between a major Czech academic institution and the country's largest educational publisher, the project is aimed at translating up to 9,000 multimodal interactive exercises from Czech into Ukrainian, English, and German for an educational web portal. It emphasizes the development and evaluation of a direct Czech-Ukrainian machine translation system tailored to the educational domain, with special attention to processing formatted content such as XML and PDF and handling technical and scientific terminology. We present findings from an initial survey of Czech teachers regarding the needs of non-Czech-speaking students and describe the system's evaluation and implementation on the web portal. All resulting applications are freely available to students, educators, and researchers.

artificial intelligence, natural language, translation, (17 more...)

doi: 10.21125/edulearn.2025.2268

2509.09473

Country: Europe > Czechia (0.15)

Genre: Instructional Material > Course Syllabus & Notes (0.61)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Setting > K-12 Education > Secondary School (0.35)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Teklehaymanot, Hailay Kidu, Fazlija, Dren, Nejdl, Wolfgang

MoVoC: Morphology-Aware Subword Construction for Geez Script Languages

arXiv.org Artificial IntelligenceSep-11-2025

Subword-based tokenization methods often fail to preserve morphological boundaries, a limitation especially pronounced in low-resource, morphologically complex languages such as those written in the Geez script. To address this, we present MoVoC (Morpheme-aware Subword Vocabulary Construction) and train MoVoC-Tok, a tokenizer that integrates supervised morphological analysis into the subword vocabulary. This hybrid segmentation approach combines morpheme-based and Byte Pair Encoding (BPE) tokens to preserve morphological integrity while maintaining lexical meaning. To tackle resource scarcity, we curate and release manually annotated morpheme data for four Geez script languages and a morpheme-aware vocabulary for two of them. While the proposed tokenization method does not lead to significant gains in automatic translation quality, we observe consistent improvements in intrinsic metrics, MorphoScore, and Boundary Precision, highlighting the value of morphology-aware segmentation in enhancing linguistic fidelity and token efficiency. Our morpheme-annotated datasets and tokenizer will be publicly available to support further research in low-resource, morphologically rich languages. Our code and data are available on GitHub: https://github.com/hailaykidu/MoVoC

computational linguistic, machine learning, natural language, (16 more...)

2509.08812

Country:

Europe (1.00)
North America > United States (0.93)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)