North Sumatra
Indonesia sues six companies over environmental harm in flood zones
Indonesia's government has filed multiple lawsuits seeking more than $200m in damages against six firms, after deadly floods wreaked havoc across Sumatra, killing more than 1,000 people last year, although environmentalists criticised the moves as inadequate. Environmentalists, experts and the government pointed the finger at deforestation for its role in last year's disaster that washed torrents of mud and wooden logs into villages across the northwestern part of the island. The sum represents both fines for damage and the proposed monetary value of recovery efforts. The suits were filed to courts on Thursday in Jakarta and North Sumatra's Medan, the ministry added. "We firmly uphold the principle of polluter pays," Environment Minister Hanif Faisol Nurofiq said in a statement.
- North America > United States (0.52)
- North America > Central America (0.41)
- North America > Canada (0.41)
- (11 more...)
- Government (1.00)
- Law > Environmental Law (0.93)
- Law > Litigation (0.57)
Cross-Modal Temporal Fusion for Financial Market Forecasting
Pei, Yunhua, Cartlidge, John, Mandal, Anandadeep, Gold, Daniel, Marcilio, Enrique, Mazzon, Riccardo
Accurate forecasting in financial markets requires integrating diverse data sources, from historical prices to macroeconomic indicators and financial news. However, existing models often fail to align these modalities effectively, limiting their practical use. In this paper, we introduce a transformer-based deep learning framework, Cross-Modal Temporal Fusion (CMTF), that fuses structured and unstructured financial data for improved market prediction. The model incorporates a tensor interpretation module for feature selection and an auto-training pipeline for efficient hyperparameter tuning. Experimental results using FTSE 100 stock data demonstrate that CMTF achieves superior performance in price direction classification compared to classical and deep learning baselines. These findings suggest that our framework is an effective and scalable solution for real-world cross-modal financial forecasting tasks.
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- (4 more...)
LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
Aji, Alham Fikri, Cohn, Trevor
As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and found that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to the general multilingual model. Lastly, we show that a change in register affects model performance, especially with registers not commonly found in social media, such as high-level politeness `Krama' Javanese.
- Asia > Indonesia > Bali (0.04)
- Asia > Indonesia > Sulawesi > Gorontalo > Gorontalo (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (27 more...)
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
Adilazuarda, Muhammad Farid, Wijanarko, Musa Izzanardi, Susanto, Lucky, Nur'aini, Khumaisa, Wijaya, Derry, Aji, Alham Fikri
Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through rigorous steps. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. Although unsupported by Unicode, the Lampung script is included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia's local scripts, with many achieving near-zero performance.
- Asia > Indonesia > Bali (0.05)
- Asia > Southeast Asia (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (31 more...)
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
Yari, Amir Hossein, Koto, Fajri
Despite the impressive performance of multilingual large language models (mLLMs) in various natural language processing tasks, their ability to understand procedural texts, particularly those with culture-specific content, remains largely unexplored. Texts describing cultural procedures, including rituals, traditional craftsmanship, and social etiquette, require an inherent understanding of cultural context, presenting a significant challenge for mLLMs. In this work, we introduce CAPTex, a benchmark designed to evaluate mLLMs' ability to process and reason about culturally diverse procedural texts across multiple languages using various methodologies to assess their performance. Our findings indicate that (1) mLLMs face difficulties with culturally contextualized procedural texts, showing notable performance declines in low-resource languages, (2) model performance fluctuates across cultural domains, with some areas presenting greater difficulties, and (3) language models exhibit better performance on multiple-choice tasks within conversational frameworks compared to direct questioning. These results underscore the current limitations of mLLMs in handling culturally nuanced procedural texts and highlight the need for culturally aware benchmarks like CAPTex to enhance their adaptability and comprehension across diverse linguistic and cultural landscapes.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Iran (0.04)
- Asia > Japan (0.04)
- (17 more...)
- Leisure & Entertainment (1.00)
- Education (1.00)
IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces
Koto, Fajri, Mahendra, Rahmad, Aisyah, Nurul, Baldwin, Timothy
Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies on language models have predominantly centered on English cultures, potentially resulting in an Anglocentric bias. In this paper, we introduce IndoCulture, aimed at understanding the influence of geographical factors on language model reasoning ability, with a specific emphasis on the diverse cultures found within eleven Indonesian provinces. In contrast to prior works that relied on templates (Yin et al., 2022) and online scrapping (Fung et al., 2024), we created IndoCulture by asking local people to manually develop the context and plausible options based on predefined topics. Evaluations of 23 language models reveal several insights: (1) even the best open-source model struggles with an accuracy of 53.2%, (2) models often provide more accurate predictions for specific provinces, such as Bali and West Java, and (3) the inclusion of location contexts enhances performance, especially in larger models like GPT-4, emphasizing the significance of geographical context in commonsense reasoning.
- Health & Medicine (0.68)
- Education (0.47)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wongso, Wilson, Setiawan, David Samuel, Limcorn, Steven, Joyoadikusumo, Ananto
Indonesia's linguistic landscape is remarkably diverse, encompassing over 700 languages and dialects, making it one of the world's most linguistically rich nations. This diversity, coupled with the widespread practice of code-switching and the presence of low-resource regional languages, presents unique challenges for modern pre-trained language models. In response to these challenges, we developed NusaBERT, building upon IndoBERT by incorporating vocabulary expansion and leveraging a diverse multilingual corpus that includes regional languages and dialects. Through rigorous evaluation across a range of benchmarks, NusaBERT demonstrates state-of-the-art performance in tasks involving multiple languages of Indonesia, paving the way for future natural language understanding research for under-represented languages.
- Asia > Indonesia > Sulawesi > Gorontalo > Gorontalo (0.04)
- Asia > Indonesia > Sulawesi > South Sulawesi (0.04)
- North America > United States (0.04)
- (19 more...)
Spatial and Temporal Hierarchy for Autonomous Navigation using Active Inference in Minigrid Environment
de Tinguy, Daria, van de Maele, Toon, Verbelen, Tim, Dhoedt, Bart
Robust evidence suggests that humans explore their environment using a combination of topological landmarks and coarse-grained path integration. This approach relies on identifiable environmental features (topological landmarks) in tandem with estimations of distance and direction (coarse-grained path integration) to construct cognitive maps of the surroundings. This cognitive map is believed to exhibit a hierarchical structure, allowing efficient planning when solving complex navigation tasks. Inspired by human behaviour, this paper presents a scalable hierarchical active inference model for autonomous navigation, exploration, and goal-oriented behaviour. The model uses visual observation and motion perception to combine curiosity-driven exploration with goal-oriented behaviour. Motion is planned using different levels of reasoning, i.e., from context to place to motion. This allows for efficient navigation in new spaces and rapid progress toward a target. By incorporating these human navigational strategies and their hierarchical representation of the environment, this model proposes a new solution for autonomous navigation and exploration. The approach is validated through simulations in a mini-grid environment.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- (3 more...)
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, Purwarianti, Ayu
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
- North America > United States > Texas > Dallas County > Dallas (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Timor-Leste (0.14)
- (64 more...)
- Law (0.67)
- Government (0.67)
- Information Technology > Services (0.67)
- (3 more...)
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Winata, Genta Indra, Aji, Alham Fikri, Cahyawijaya, Samuel, Mahendra, Rahmad, Koto, Fajri, Romadhony, Ade, Kurniawan, Kemal, Moeljadi, David, Prasojo, Radityo Eko, Fung, Pascale, Baldwin, Timothy, Lau, Jey Han, Sennrich, Rico, Ruder, Sebastian
Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (30 more...)