Borneo
Indonesia sues six companies over environmental harm in flood zones
Indonesia's government has filed multiple lawsuits seeking more than $200m in damages against six firms after deadly floods wreaked havoc across Sumatra last year, killing more than 1,000 people. Environmentalists criticised the moves as inadequate. Environmentalists, experts and the government pointed the finger at deforestation for its role in last year's disaster, which washed torrents of mud and wooden logs into villages across the northwestern part of the island. The sum represents both fines for damage and the proposed monetary value of recovery efforts. The suits were filed in courts in Jakarta and North Sumatra's Medan on Thursday, the ministry added. "We firmly uphold the principle of polluter pays," Environment Minister Hanif Faisol Nurofiq said in a statement.
- Asia > Indonesia > Sumatra > North Sumatra (0.27)
- Asia > Indonesia > Java > Jakarta > Jakarta (0.25)
- North America > United States (0.16)
- Government (1.00)
- Law > Environmental Law (0.93)
- Law > Litigation (0.57)
Culture Cartography: Mapping the Landscape of Cultural Knowledge
Ziems, Caleb, Held, William, Yu, Jane, Goldberg, Amir, Grusky, David, Yang, Diyi
To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process towards more challenging questions that meet the researcher's goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model towards salient topics through direct edits. We implement this methodology as a tool called CultureExplorer. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search. Fine-tuning on this data boosts the accuracy of Llama-3.1-8B by up to 19.2% on related culture benchmarks.
- Africa > Nigeria (0.94)
- North America (0.93)
- Asia > Indonesia > Borneo > Kalimantan (0.93)
- Asia > Indonesia > Java (0.93)
What Do Indonesians Really Need from Language Technology? A Nationwide Survey
Kautsar, Muhammad Dehan Al, Susanto, Lucky, Wijaya, Derry, Koto, Fajri
There is an emerging effort to develop NLP for Indonesia's 700+ local languages, but progress remains costly due to the need for direct engagement with native speakers. However, it is unclear what these language communities truly need from language technology. To address this, we conduct a nationwide survey to assess the actual needs of native speakers in Indonesia. Our findings indicate that addressing language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advancements in language technology, concerns around privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.
- Europe (1.00)
- Asia > Indonesia > Sulawesi (1.00)
- Asia > Indonesia > Borneo > Kalimantan (0.68)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Education > Educational Setting (0.68)
LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
Aji, Alham Fikri, Cohn, Trevor
As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and find that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to the general multilingual model. Lastly, we show that a change in register affects model performance, especially with registers not commonly found in social media, such as high-level politeness 'Krama' Javanese.
- North America (1.00)
- Europe (1.00)
- Asia > Indonesia > Sumatra (0.46)
Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models
Zeng, Bo, Lyu, Chenyang, Liu, Sinuo, Zeng, Mingyan, Wu, Minghao, Ni, Xuanfan, Shi, Tianqi, Zhao, Yu, Liu, Yefeng, Zhu, Chenyu, Li, Ruizhe, Geng, Jiahui, Li, Qing, Tong, Yu, Wang, Longyue, Luo, Weihua, Zhang, Kaifu
Instruction-following capability has become a major ability to be evaluated for Large Language Models (LLMs). However, existing datasets, such as IFEval, are either predominantly monolingual and centered on English or simply machine translated to other languages, limiting their applicability in multilingual contexts. In this paper, we present a carefully curated extension of IFEval to a localized multilingual version named Marco-Bench-MIF, covering 30 languages with varying levels of localization. Our benchmark addresses linguistic constraints (e.g., modifying capitalization requirements for Chinese) and cultural references (e.g., substituting region-specific company names in prompts) via a hybrid pipeline combining translation with verification. Through comprehensive evaluation of 20+ LLMs on Marco-Bench-MIF, we find that: (1) there is a 25-35% accuracy gap between high- and low-resource languages, (2) model scale largely impacts performance, by 45-60%, yet script-specific challenges persist, and (3) machine-translated data underestimates accuracy by 7-22% versus localized data. Our analysis identifies challenges in multilingual instruction following, including keyword consistency preservation and compositional constraint adherence across languages. Marco-Bench-MIF is available at https://github.com/AIDC-AI/Marco-Bench-MIF.
- Asia > Indonesia > Borneo > Kalimantan > East Kalimantan > Nusantara (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Indonesia > Riau Islands (0.04)
Elevating Semantic Exploration: A Novel Approach Utilizing Distributed Repositories
Centralized and distributed systems are two main approaches to organizing ICT infrastructure, each with its pros and cons. Centralized systems concentrate resources in one location, making management easier but creating single points of failure. Distributed systems, on the other hand, spread resources across multiple nodes, offering better scalability and fault tolerance, but requiring more complex management. The choice between them depends on factors like application needs, scalability, and data sensitivity. Centralized systems suit applications with limited scalability and centralized control, while distributed systems excel in large-scale environments requiring high availability and performance. This paper explores a distributed document repository system developed for the Italian Ministry of Justice, using edge repositories to analyze textual data and metadata, enhancing semantic exploration capabilities.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > Indonesia > Borneo > Kalimantan > East Kalimantan > Nusantara (0.04)
- Asia > China (0.04)
- Research Report > Promising Solution (0.40)
- Overview > Innovation (0.40)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation
Ibrahim, Muhammad Amien, Faisal, Winarto, Tora Sangputra Yopie, Sulistiya, Zefanya Delvin
Detecting gender-based hate speech in Indonesian social media remains challenging due to limited labeled datasets. While binary hate speech classification has advanced, a more granular category like gender-targeted hate speech is understudied because of class imbalance issues. This paper addresses this gap by comparing three data augmentation techniques for Indonesian gender-based hate speech detection. We evaluate backtranslation, single-class prompt generation (using only hate speech examples), and our proposed dual-class prompt generation (using both hate speech and non-hate speech examples). Experiments show all augmentation methods improve classification performance, with our dual-class approach achieving the best results (88.5% accuracy, 88.1% F1-score using Random Forest). Semantic similarity analysis reveals that dual-class prompt generation produces the most novel content, and t-SNE visualizations confirm that these samples occupy distinct feature space regions while maintaining class characteristics. Our findings suggest that incorporating examples from both classes helps language models generate more diverse yet representative samples, effectively addressing limited data challenges in specialized hate speech detection.
- Asia > Indonesia > Borneo > Kalimantan > East Kalimantan > Nusantara (0.05)
- Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
- North America > United States > Hawaii (0.04)
Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia
Martinez, Rolando Gonzales, Cooray, Mariza
This study leverages spatial machine learning (SML) to enhance the accuracy of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional PMT methodologies are prone to exclusion and inclusion errors due to their inability to account for spatial dependencies and regional heterogeneity. By integrating spatial contiguity matrices, SML models mitigate these limitations, facilitating a more precise identification and comparison of geographical poverty clusters. Utilizing household survey data from the Social Welfare Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021, this study examines spatial patterns in income distribution and delineates poverty clusters at both provincial and district levels. Empirical findings indicate that the proposed SML approach reduces exclusion errors from 28% to 20% compared to standard machine learning models, underscoring the critical role of spatial analysis in refining machine learning-based poverty targeting. These results highlight the potential of SML to inform the design of more equitable and effective social protection policies, particularly in geographically diverse contexts. Future research can explore the applicability of spatiotemporal models and assess the generalizability of SML approaches across varying socio-economic settings.
- North America > United States (0.05)
- Asia > Indonesia > Nusa Tenggara Islands (0.05)
- Asia > Indonesia > Sumatra > Bengkulu > Bengkulu (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line
Waseem, Muhammad, Bhatta, Kshitij, Li, Chen, Chang, Qing
The manufacturing industry is undergoing a transformative shift, driven by cutting-edge technologies like 5G, AI, and cloud computing. Despite these advancements, effective system control, which is crucial for optimizing production efficiency, remains a complex challenge due to the intricate, knowledge-dependent nature of manufacturing processes and the reliance on domain-specific expertise. Conventional control methods often demand heavy customization and considerable computational resources, and they lack transparency in decision-making. In this work, we investigate the feasibility of using Large Language Models (LLMs), particularly GPT-4, as a straightforward, adaptable solution for controlling manufacturing systems, specifically mobile robot scheduling. We introduce an LLM-based control framework to assign mobile robots to different machines in robot-assisted serial production lines, evaluating its performance in terms of system throughput. Our proposed framework outperforms traditional scheduling approaches such as First-Come-First-Served (FCFS), Shortest Processing Time (SPT), and Longest Processing Time (LPT). While it achieves performance that is on par with state-of-the-art methods like Multi-Agent Reinforcement Learning (MARL), it offers a distinct advantage by delivering comparable throughput without the need for extensive retraining. These results suggest that the proposed LLM-based solution is well-suited for scenarios where technical expertise, computational resources, and financial investment are limited, while decision transparency and system scalability are critical concerns.
- North America > United States > Virginia (0.05)
- Asia > Indonesia > Borneo > Kalimantan > East Kalimantan > Nusantara (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Twenty Years of Personality Computing: Threats, Challenges and Future Directions
Celli, Fabio, Kartelj, Aleksandar, Đorđević, Miljan, Suhartono, Derwin, Filipović, Vladimir, Milutinović, Veljko, Spathoulas, Georgios, Vinciarelli, Alessandro, Kosinski, Michal, Lepri, Bruno
Personality Computing is a field at the intersection of Personality Psychology and Computer Science. Since its start in 2005, research in the field has used computational methods to understand and predict human personality traits. The expansion of the field has been very rapid and, by analyzing digital footprints (text, images, social media, etc.), it has helped to develop systems that recognize and even replicate human personality. While Personality Computing offers promising applications in talent recruiting, marketing and healthcare, its ethical implications are significant. Concerns include data privacy, algorithmic bias, and the potential for manipulation by personality-aware Artificial Intelligence. This paper provides an overview of the field, explores key methodologies, discusses the challenges and threats, and outlines potential future directions for responsible development and deployment of Personality Computing technologies.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Serbia > Central Serbia > Belgrade (0.05)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Media (1.00)
- Law Enforcement & Public Safety (1.00)
- Information Technology > Services (1.00)