ASR Under Noise: Exploring Robustness for Sundanese and Javanese
Pranida, Salsabila Zahirah, Airlangga, Muhammad Cendekia, Genadi, Rifo Ahmad, Shehata, Shady
We investigate the robustness of Whisper-based automatic speech recognition (ASR) models for two major Indonesian regional languages: Javanese and Sundanese. While recent work has demonstrated strong ASR performance under clean conditions, these models' effectiveness in noisy environments remains unclear. To address this, we experiment with multiple training strategies, including synthetic noise augmentation and SpecAugment, and evaluate performance across a range of signal-to-noise ratios (SNRs). Our results show that noise-aware training substantially improves robustness, particularly for larger Whisper models. A detailed error analysis further reveals language-specific challenges, highlighting avenues for future improvements.
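The noise-aware training described here hinges on mixing noise into clean speech at a controlled SNR. Below is a minimal sketch of that mixing step, assuming NumPy waveforms; the function name and the example noise source are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a noise signal into a speech signal at a target SNR (in dB)."""
    # Loop or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-10

    # Scale noise so that 10*log10(speech_power / noise_power) == snr_db.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + noise

# Example: corrupt an utterance at 5 dB SNR before feeding it to the model.
# noisy = mix_at_snr(clean_waveform, babble_noise, snr_db=5.0)
```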
LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
Aji, Alham Fikri, Cohn, Trevor
As one of the world's most populous countries, with 700 languages spoken, Indonesia lags behind in NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three of them. We evaluate a diverse set of multilingual and region-focused LLMs and find the benchmark challenging. We note a visible discrepancy between performance in Indonesian and the other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to a general multilingual model. Lastly, we show that a change in register affects model performance, especially for registers not commonly found in social media, such as the high-politeness 'Krama' register of Javanese.
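The per-language discrepancy reported here is essentially a gap between each language's average task score and the Indonesian score. A minimal sketch of that aggregation follows; the `results` entries and language codes are hypothetical placeholders, not LoraxBench numbers.

```python
from collections import defaultdict

results = {
    # (language, task) -> accuracy (hypothetical values)
    ("ind", "reading_comprehension"): 0.78,
    ("jav", "reading_comprehension"): 0.61,
    ("ind", "causal_reasoning"): 0.70,
    ("jav", "causal_reasoning"): 0.55,
}

per_lang = defaultdict(list)
for (lang, _task), score in results.items():
    per_lang[lang].append(score)

avg = {lang: sum(s) / len(s) for lang, s in per_lang.items()}
for lang, score in sorted(avg.items()):
    gap = avg["ind"] - score  # discrepancy relative to Indonesian
    print(f"{lang}: avg={score:.3f}  gap_to_ind={gap:+.3f}")
```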
Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings
In this paper, we investigate the transferability of pre-trained language models to low-resource Indonesian local languages through the task of sentiment analysis. We evaluate both zero-shot performance and adapter-based transfer on ten local languages using models of different types: a monolingual Indonesian BERT, multilingual models such as mBERT and XLM-R, and a modular adapter-based approach called MAD-X. To better understand model behavior, we group the target languages into three categories: seen (included during pre-training), partially seen (not included but linguistically related to seen languages), and unseen (absent and unrelated in pre-training data). Our results reveal clear performance disparities across these groups: multilingual models perform best on seen languages, moderately on partially seen ones, and poorly on unseen languages. We find that MAD-X significantly improves performance, especially for seen and partially seen languages, without requiring labeled data in the target language. Additionally, we conduct a further analysis on tokenization and show that while subword fragmentation and vocabulary overlap with Indonesian correlate weakly with prediction quality, they do not fully explain the observed performance. Instead, the most consistent predictor of transfer success is the model's prior exposure to the language, either directly or through a related language.
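The tokenization analysis mentioned here rests on two simple measurements. A minimal sketch of both follows, assuming a Hugging Face tokenizer; the model name and sample texts are illustrative assumptions, not the paper's exact setup.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def fragmentation(text: str) -> float:
    """Average number of subword tokens per whitespace-delimited word."""
    words = text.split()
    pieces = tok.tokenize(text)
    return len(pieces) / max(len(words), 1)

def vocab_overlap(target_text: str, indonesian_text: str) -> float:
    """Fraction of the target language's subword types shared with Indonesian."""
    target_vocab = set(tok.tokenize(target_text))
    id_vocab = set(tok.tokenize(indonesian_text))
    return len(target_vocab & id_vocab) / max(len(target_vocab), 1)

print(fragmentation("conto kalimah dina basa daerah"))
```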
Extracting and Emulsifying Cultural Explanation to Improve Multilingual Capability of LLMs
Large Language Models (LLMs) have achieved remarkable success, but their English-centric training data limits performance in non-English languages, highlighting the need for enhancements in their multilingual capabilities. While some work on multilingual prompting methods handles non-English queries by utilizing English translations or restructuring them to more closely align with LLM reasoning patterns, these works often overlook the importance of cultural context, limiting their effectiveness. To address this limitation, we propose EMCEI, a simple yet effective approach that improves LLMs' multilingual capabilities by incorporating cultural context for more accurate and appropriate responses. Specifically, EMCEI follows a two-step process that first extracts relevant cultural context from the LLM's parametric knowledge via prompting. Then, EMCEI employs an LLM-as-Judge mechanism to select the most appropriate response by balancing cultural relevance and reasoning ability. Experiments on diverse multilingual benchmarks show that EMCEI outperforms existing baselines, demonstrating its effectiveness in handling multilingual queries with LLMs.
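The two-step process described above can be sketched as a small prompting pipeline. Everything below is an illustrative reconstruction, not EMCEI's actual implementation: the prompts, the `generate` helper, and the candidate-selection logic are all assumptions.

```python
def generate(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM API of choice here")

def answer_with_cultural_context(query: str, language: str) -> str:
    # Step 1: elicit relevant cultural context from the model's own
    # parametric knowledge via prompting.
    context = generate(
        f"List cultural background knowledge relevant to answering this "
        f"{language} query:\n{query}"
    )
    # Produce candidate answers with and without the extracted context.
    candidates = [
        generate(f"Answer the query.\nQuery: {query}"),
        generate(f"Cultural context:\n{context}\n\nAnswer the query.\nQuery: {query}"),
    ]
    # Step 2: an LLM-as-Judge picks the response that best balances
    # cultural relevance and reasoning quality.
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = generate(
        f"Query: {query}\n\nCandidate answers:\n{numbered}\n\n"
        f"Reply with only the index of the most culturally appropriate "
        f"and best-reasoned answer."
    )
    return candidates[int(verdict.strip().strip("[]"))]
```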
Do Language Models Understand Honorific Systems in Javanese?
Farhansyah, Mohammad Rifqi, Darmawan, Iwan, Kusumawardhana, Adryan, Winata, Genta Indra, Aji, Alham Fikri, Wijaya, Derry Tanti
The Javanese language features a complex system of honorifics that vary according to the social status of the speaker, listener, and referent. Despite its cultural and linguistic significance, there has been limited progress in developing a comprehensive corpus to capture these variations for natural language processing (NLP) tasks. In this paper, we present Unggah-Ungguh, a carefully curated dataset designed to encapsulate the nuances of Unggah-Ungguh Basa, the Javanese speech etiquette framework that dictates the choice of words and phrases based on social hierarchy and context. Using Unggah-Ungguh, we assess the ability of language models (LMs) to process various levels of Javanese honorifics through classification and machine translation tasks. To further evaluate cross-lingual LMs, we conduct machine translation experiments between Javanese (at specific honorific levels) and Indonesian. Additionally, we explore whether LMs can generate contextually appropriate Javanese honorifics in conversation tasks, where the honorific usage should align with the social role and contextual cues. Our findings indicate that current LMs struggle with most honorific levels, exhibiting a bias toward certain honorific tiers.
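The honorific-level classification task mentioned above could be probed zero-shot along these lines. This is a minimal sketch with a simplified three-level label set (the standard ngoko/madya/krama speech levels); the prompt wording and the `generate` helper are illustrative assumptions, not the paper's protocol.

```python
LEVELS = ["ngoko", "madya", "krama"]  # low -> high politeness

def generate(prompt: str) -> str:
    raise NotImplementedError("wrap your LM API of choice here")

def classify_honorific(sentence: str) -> str:
    prompt = (
        "Which Javanese speech level does this sentence use? "
        f"Answer with one of {LEVELS}.\n\nSentence: {sentence}"
    )
    answer = generate(prompt).lower()
    # Fall back to the lowest level if the model's answer is unparseable.
    return next((lvl for lvl in LEVELS if lvl in answer), LEVELS[0])
```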
Cross-lingual Transfer Learning for Javanese Dependency Parsing
Ghiffari, Fadli Aulawi Al, Alfina, Ika, Azizah, Kurniawati
While structure learning achieves remarkable performance in high-resource languages, the situation differs for under-represented languages due to the scarcity of annotated data. This study focuses on assessing the efficacy of transfer learning in enhancing dependency parsing for Javanese, a language spoken by 80 million individuals but characterized by limited representation in natural language processing. We utilized the Universal Dependencies dataset consisting of dependency treebanks from more than 100 languages, including Javanese. We propose two learning strategies to train the model: transfer learning (TL) and hierarchical transfer learning (HTL). While TL only uses a source language to pre-train the model, the HTL method uses a source language and an intermediate language in the learning process. The results show that our best model uses the HTL method, which improves performance with an increase of 10% for both UAS and LAS evaluations compared to the baseline model.
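The UAS and LAS scores reported above have simple definitions: UAS counts tokens whose predicted head is correct, while LAS additionally requires the correct dependency label. A minimal sketch, with an illustrative per-token data structure:

```python
def uas_las(gold, pred):
    """gold, pred: lists of (head_index, dep_label) pairs, one per token."""
    assert len(gold) == len(pred)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))  # head AND label match
    n = len(gold)
    return uas_hits / n, las_hits / n

# Example: gold and predicted parses for a 3-token sentence.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]  # wrong label on token 3
print(uas_las(gold, pred))  # UAS = 1.00, LAS ~= 0.67
```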
XLS-R Deep Learning Model for Multilingual ASR on Low-Resource Languages: Indonesian, Javanese, and Sundanese
Arisaputra, Panji, Handoyo, Alif Tri, Zahra, Amalia
ASR is a technological innovation that automatically converts spoken language into written text, with the goal of minimizing the Word Error Rate (WER) when transcribing oral input. ASR's core capability is to act as an optimal connector for information exchange between human-to-human and human-to-machine entities [1]. It has become increasingly important in various domains, including air traffic control, biometric security, games, closed captioning for YouTube, voice message transcription, and home automation. ASR's implementation in digital media resources is not a new phenomenon, but its complexity has increased [2]. This study is motivated by the rapid development of information and communication technology in Indonesia. In Figure 1, data from the Central Statistics Agency (Badan Pusat Statistik, BPS) [3] show that in 2021, 62.10% of Indonesians accessed the internet (82.07% at the household level) and mobile phone ownership reached 65.87%. Meanwhile, less mobile technologies such as computers and landline phones are being abandoned, with usage at only 18.24% and 1.36%, respectively. The conclusion is that Indonesians are shifting from traditional technology to more mobile and agile devices like smartphones, which require the right modalities for effective and efficient operation.
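Since WER is the central metric here, a minimal sketch of its computation may help: WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the reference length.

```python
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("aku arep mangan", "aku arep dahar"))  # 1 substitution / 3 words
```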
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Winata, Genta Indra, Aji, Alham Fikri, Cahyawijaya, Samuel, Mahendra, Rahmad, Koto, Fajri, Romadhony, Ade, Kurniawan, Kemal, Moeljadi, David, Prasojo, Radityo Eko, Fung, Pascale, Baldwin, Timothy, Lau, Jey Han, Sennrich, Rico, Ruder, Sebastian
Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.
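For readers who want to try the resource, a minimal sketch of loading the sentiment portion follows. The Hugging Face dataset ID, config name, and column names below are assumptions; adjust them to the actual NusaX release.

```python
from datasets import load_dataset

# Javanese sentiment split (dataset ID assumed, not verified here).
ds = load_dataset("indonlp/NusaX-senti", "jav")

for example in ds["test"].select(range(3)):
    print(example["text"], "->", example["label"])
```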
Learning an artificial language for knowledge-sharing in multilingual translation
In their recent paper, Learning an artificial language for knowledge-sharing in multilingual translation, Danni Liu and Jan Niehues investigate multilingual neural machine translation models. Here, they tell us more about the main contributions of their research. Neural machine translation (NMT) is the backbone of many automatic translation platforms nowadays. Multilingual NMT models are appealing for two reasons: a single model can serve many translation directions, and it allows knowledge to be shared between languages. The second characteristic is especially useful in low-resource conditions, where training data (translated sentence pairs) are limited. To enable knowledge-sharing between languages, and to improve translation quality on low-resource translation directions, a precondition is the ability to capture common features between languages.
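One simple way to probe whether a multilingual encoder "captures common features between languages" is to compare pooled encoder states of parallel sentences. The sketch below uses a generic multilingual encoder as a stand-in; the model name and pooling choice are illustrative assumptions, not the method from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"  # any multilingual encoder works for this probe
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(sentence: str) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)  # mean-pool over tokens

a = embed("The weather is nice today.")
b = embed("Cuaca hari ini cerah.")  # Indonesian paraphrase
# Higher cosine similarity suggests more language-neutral representations.
print(torch.cosine_similarity(a, b, dim=0).item())
```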