AITopics

This research presents a computational social science approach to preserving Telugu Chandassu, the metrical poetry tradition representing centuries of collective cultural intelligence. We develop the first comprehensive digital framework for analyzing Telugu prosodic patterns, bridging traditional community knowledge with modern computational methods. Our social computing approach involves collaborative dataset creation of 4,651 annotated padyams, expert-validated linguistic patterns, and culturally-informed algorithmic design. The framework includes AksharamTokenizer for prosody-aware tokenization, LaghuvuGuruvu Generator for classifying light and heavy syllables, and PadyaBhedam Checker for automated pattern recognition. Our algorithm achieves 91.73% accuracy on the proposed Chandassu Score, with evaluation metrics reflecting traditional literary standards. This work demonstrates how computational social science can preserve endangered cultural knowledge systems while enabling new forms of collective intelligence around literary heritage. The methodology offers insights for community-centered approaches to cultural preservation, supporting broader initiatives in digital humanities and socially-aware computing systems.

machine learning, natural language, pattern recognition, (19 more...)

2510.01233

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Generating Difficult-to-Translate Texts

Zouhar, Vilém, Xu, Wenda, Riley, Parker, Juraska, Juraj, Finkelstein, Mara, Freitag, Markus, Deutsch, Daniel

Machine translation benchmarks sourced from the real world are quickly obsoleted, due to most examples being easy for state-of-the-art translation models. This limits the benchmark's ability to distinguish which model is better or to reveal models' weaknesses. Current methods for creating difficult test cases, such as subsampling or from-scratch synthesis, either fall short of identifying difficult examples or suffer from a lack of diversity and naturalness. Inspired by the iterative process of human experts probing for model failures, we propose MT-breaker, a method where a large language model iteratively refines a source text to increase its translation difficulty. The LLM iteratively queries a target machine translation model to guide its generation of difficult examples. Our approach generates examples that are more challenging for the target MT model while preserving the diversity of natural texts. While the examples are tailored to a particular machine translation model during the generation, the difficulty also transfers to other models and languages.
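The iterative refinement described in this abstract can be pictured as a simple search loop. The sketch below is only an illustration under stated assumptions, not the authors' MT-breaker implementation: `rewrite`, `translate`, and `quality` are hypothetical callables supplied by the caller (an LLM rewriter, the target MT model, and a quality estimator where lower means a worse translation), and the greedy keep-the-hardest rule is an assumed simplification.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def refine_for_difficulty(
    source: str,
    rewrite: Callable[[str, History], str],   # hypothetical LLM rewriter
    translate: Callable[[str], str],          # hypothetical target MT model
    quality: Callable[[str, str], float],     # hypothetical quality estimate
    steps: int = 5,
) -> Tuple[str, float]:
    """Greedily search for a source text that the MT model translates poorly.

    Returns the candidate with the lowest estimated translation quality
    seen so far, together with that quality score.
    """
    best_text = source
    best_score = quality(source, translate(source))
    history: History = [(best_text, best_score)]

    for _ in range(steps):
        # The rewriter sees the current hardest text and all past attempts.
        candidate = rewrite(best_text, history)
        score = quality(candidate, translate(candidate))
        history.append((candidate, score))
        # Keep the candidate only if it is harder (lower quality) than before.
        if score < best_score:
            best_text, best_score = candidate, score

    return best_text, best_score
```

With toy stand-ins (a rewriter that appends a word, an identity "translator", and a score that falls as the text grows), the loop returns the longest candidate it produced; real use would replace all three callables with actual models.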

large language model, mt-breaker, natural language, (13 more...)

2509.26592

Country: North America > United States > Colorado (0.28)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Belay, Tadesse Destaw, Hussen, Kedir Yassin, Imam, Sukairaj Hafiz, Ahmad, Ibrahim Said, Inuwa-Dutse, Isa, Haile, Abrham Belete, Sidorov, Grigori, Ameer, Iqra, Abdulmumin, Idris, Gwadabe, Tajuddeen, Marivate, Vukosi, Yimam, Seid Muhie, Muhammad, Shamsuddeen Hassan

The Rise of AfricaNLP: Contributions, Contributors, and Community Impact (2005-2025)

Natural Language Processing (NLP) is undergoing constant transformation, as Large Language Models (LLMs) are driving daily breakthroughs in research and practice. In this regard, tracking the progress of NLP research and automatically analyzing the contributions of research papers provides key insights into the nature of the field and the researchers. This study explores the progress of African NLP (AfricaNLP) by asking (and answering) basic research questions such as: i) How has the nature of NLP evolved over the last two decades?, ii) What are the contributions of AfricaNLP papers?, and iii) Which individuals and organizations (authors, affiliated institutions, and funding bodies) have been involved in the development of AfricaNLP? We quantitatively examine the contributions of AfricaNLP research using 1.9K NLP paper abstracts, 4.9K author contributors, and 7.8K human-annotated contribution sentences (AfricaNLPContributions) along with benchmark results. Our dataset and continuously existing NLP progress tracking website provide a powerful lens for tracing AfricaNLP research trends and hold potential for generating data-driven literature surveys.

large language model, machine learning, natural language, (19 more...)

2509.25477

Country:

North America > United States (1.00)
North America > Canada (1.00)
Africa > Nigeria (0.93)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.88)

Industry:

Government > Regional Government (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mishra, Debangan, Rastogi, Arihant, Negi, Agyeya, Goel, Shashwat, Kumaraguru, Ponnurangam

What if I ask in alia lingua? Measuring Functional Similarity Across Languages

How similar are model outputs across languages? In this work, we study this question using a recently proposed model similarity metric κ_p applied to 20 languages and 47 subjects in GlobalMMLU. Our analysis reveals that a model's responses become increasingly consistent across languages as its size and capability grow. Interestingly, models exhibit greater cross-lingual consistency within themselves than agreement with other models prompted in the same language. These results highlight not only the value of κ_p as a practical tool for evaluating multilingual reliability, but also its potential to guide the development of more consistent multilingual systems.
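To make the idea of chance-adjusted agreement concrete, here is a minimal sketch using classical Cohen's kappa on a model's multiple-choice answers in two languages. This is a stand-in chosen for illustration: the abstract does not define κ_p, and the code below does not implement it. The function name and the answer lists are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(answers_a: Sequence[str], answers_b: Sequence[str]) -> float:
    """Cohen's kappa between two aligned lists of categorical answers.

    Here answers_a and answers_b are assumed to be one model's choices on
    the same questions asked in two different languages.
    """
    if len(answers_a) != len(answers_b) or len(answers_a) == 0:
        raise ValueError("answer lists must be non-empty and of equal length")

    n = len(answers_a)
    # Fraction of questions where both runs picked the same option.
    observed = sum(x == y for x, y in zip(answers_a, answers_b)) / n

    # Agreement expected by chance from each run's own label frequencies.
    freq_a, freq_b = Counter(answers_a), Counter(answers_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if expected == 1.0:
        # Both runs always give the same single label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means the two runs always agree, and a value near 0 means they agree no more often than their answer frequencies would predict by chance.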

artificial intelligence, machine learning, natural language, (17 more...)

2509.04032

Genre: Research Report > New Finding (0.68)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.34)

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Ki, Dayeon, Duh, Kevin, Carpuat, Marine

As people increasingly use AI systems in work and daily life, feedback mechanisms that help them use AI responsibly are urgently needed, particularly in settings where users are not equipped to assess the quality of AI predictions. We study a realistic Machine Translation (MT) scenario where monolingual users decide whether to share an MT output, first without and then with quality feedback. We compare four types of quality feedback: explicit feedback that directly gives users an assessment of translation quality using (1) error highlights and (2) LLM explanations, and implicit feedback that helps users compare MT inputs and outputs through (3) backtranslation and (4) question-answer (QA) tables. We find that all feedback types, except error highlights, significantly improve both decision accuracy and appropriate reliance. Notably, implicit feedback, especially QA tables, yields significantly greater gains than explicit feedback in terms of decision accuracy, appropriate reliance, and user perceptions, receiving the highest ratings for helpfulness and trust, and the lowest for mental burden.

artificial intelligence, natural language, participant, (15 more...)

2505.24683

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Neural Information Processing Systems

Oct-2-2025, 23:26:59 GMT

6fb52e71b837628ac16539c1ff911667-Paper.pdf

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
North America > Canada > Quebec (0.14)

Industry:

Leisure & Entertainment > Games > Computer Games (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
(2 more...)