CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

David Romero

Neural Information Processing Systems

Visual Question Answering (VQA) is an important task in multimodal AI, often used to test the ability of vision-language models to understand and reason about knowledge present in both visual and textual data. However, most current VQA models use datasets that focus primarily on English and a few major world languages, with images that are typically Western-centric.


Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia

arXiv.org Machine Learning

This study leverages spatial machine learning (SML) to enhance the accuracy of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional PMT methodologies are prone to exclusion and inclusion errors due to their inability to account for spatial dependencies and regional heterogeneity. By integrating spatial contiguity matrices, SML models mitigate these limitations, facilitating a more precise identification and comparison of geographical poverty clusters. Utilizing household survey data from the Social Welfare Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021, this study examines spatial patterns in income distribution and delineates poverty clusters at both provincial and district levels. Empirical findings indicate that the proposed SML approach reduces exclusion errors from 28% to 20% compared to standard machine learning models, underscoring the critical role of spatial analysis in refining machine learning-based poverty targeting. These results highlight the potential of SML to inform the design of more equitable and effective social protection policies, particularly in geographically diverse contexts. Future research can explore the applicability of spatiotemporal models and assess the generalizability of SML approaches across varying socio-economic settings.
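
The mechanical core of this approach, augmenting household covariates with spatially lagged versions of themselves through a contiguity matrix, is easy to sketch. The code below is a minimal illustration on synthetic data, using a k-nearest-neighbour contiguity matrix and a random forest; the DTKS microdata and the paper's exact model specification are not public, so every variable here is a stand-in.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.default_rng(0)
    n = 500                                    # hypothetical households/villages
    X = rng.normal(size=(n, 5))                # assumed PMT proxy variables
    coords = rng.uniform(size=(n, 2))          # synthetic locations

    # Binary contiguity from k nearest neighbours, row-normalised into W,
    # so that W @ X averages each unit's neighbours
    W = kneighbors_graph(coords, n_neighbors=8, mode="connectivity").toarray()
    W = W / W.sum(axis=1, keepdims=True)
    X_lag = W @ X                              # spatially lagged covariates

    # Toy welfare proxy with a neighbourhood effect baked in
    y = X[:, 0] - 0.5 * X_lag[:, 1] + rng.normal(scale=0.3, size=n)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(np.hstack([X, X_lag]), y)        # PMT features plus spatial lags

Row-normalising W turns each lagged column into a neighbourhood average, one standard way of encoding the spatial dependence that conventional PMT regressions ignore.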


NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts

arXiv.org Artificial Intelligence

Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through a rigorous multi-step process. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. The Lampung script, although not supported by Unicode, is also included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia's local scripts, with many achieving near-zero performance.
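
For the OCR tasks in a benchmark like this, "near-zero performance" is typically read off a metric such as character error rate (CER). A minimal, dependency-free CER implementation follows; the Balinese-script string in the usage line is an illustration, not an item from the benchmark.

    def cer(reference: str, hypothesis: str) -> float:
        """Character error rate: Levenshtein distance over reference length."""
        m, n = len(reference), len(hypothesis)
        d = list(range(n + 1))                 # one rolling DP row
        for i in range(1, m + 1):
            prev, d[0] = d[0], i
            for j in range(1, n + 1):
                cur = d[j]
                d[j] = min(d[j] + 1,           # deletion
                           d[j - 1] + 1,       # insertion
                           prev + (reference[i - 1] != hypothesis[j - 1]))
                prev = cur
        return d[n] / max(m, 1)

    # A CER near 1.0 is the "near-zero performance" regime reported above.
    print(cer("ᬅᬓ᭄ᬱᬭ", "???"))                 # Balinese reference vs. garbled output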


Synthetic Data Generation for Culturally Nuanced Commonsense Reasoning in Low-Resource Languages

arXiv.org Artificial Intelligence

Quantifying reasoning capability in low-resource languages remains a challenge in NLP due to data scarcity and limited access to annotators. While LLM-assisted dataset construction has proven useful for medium- and high-resource languages, its effectiveness in low-resource languages, particularly for commonsense reasoning, is still unclear. In this paper, we compare three dataset creation strategies: (1) LLM-assisted dataset generation, (2) machine translation, and (3) human-written data by native speakers, to build a culturally nuanced story comprehension dataset. We focus on Javanese and Sundanese, two major local languages in Indonesia, and evaluate the effectiveness of open-weight and closed-weight LLMs in assisting dataset creation through extensive manual validation. To assess the utility of synthetic data, we fine-tune language models on classification and generation tasks using this data and evaluate performance on a human-written test set. Our findings indicate that LLM-assisted data creation outperforms machine translation.
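
A hedged sketch of strategy (1) is below: the prompt template, the generate() wrapper, and the JSON field names are illustrative assumptions, not the paper's actual pipeline. The point is the shape of the loop: generate, parse, filter, then hand off for validation.

    import json

    PROMPT = (
        "Write a short story in {lang} grounded in everyday {culture} life, "
        "then one commonsense question about the story and its answer. "
        "Return JSON with keys: story, question, answer."
    )

    def generate(prompt: str) -> str:
        # Plug in an open- or closed-weight LLM here.
        raise NotImplementedError

    def build_candidates(lang: str, culture: str, n: int) -> list[dict]:
        items = []
        for _ in range(n):
            raw = generate(PROMPT.format(lang=lang, culture=culture))
            try:
                item = json.loads(raw)
            except json.JSONDecodeError:
                continue                       # drop malformed generations
            if {"story", "question", "answer"} <= item.keys():
                items.append(item)             # candidate, pending human validation
        return items

Filtering malformed output is cheap; the decisive step, per the paper's setup, remains manual validation by native speakers.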


LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering

arXiv.org Artificial Intelligence

Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications, such as medicine, law, and education. The high cost of building MCQA datasets makes few-shot learning pivotal in this domain. While Large Language Models (LLMs) can enable few-shot learning, their direct application in real-world scenarios is often hindered by their high computational cost. To address this challenge, we propose a simple yet effective approach that uses LLMs for data generation and scoring. Our approach utilizes LLMs to create MCQA data containing questions and choices, and to assign probability scores to the generated choices. We then use the generated data and LLM-assigned scores to finetune a smaller and more efficient encoder-only model, DeBERTa-v3-base, using a distillation loss. Extensive experiments on the Massive Multitask Language Understanding (MMLU) benchmark demonstrate that our method improves accuracy from 28.9% to 39.3%, a gain of more than 10 percentage points over a baseline finetuned directly on 5-shot examples. This shows the effectiveness of LLM-driven data generation and knowledge distillation for few-shot MCQA.
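
The distillation step can be sketched as a standard soft-target objective: the student (DeBERTa-v3-base in the paper) is trained against the LLM-assigned choice probabilities alongside the hard label. The blending weight, temperature, and exact loss form below are assumptions; a minimal PyTorch version:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_probs, labels, alpha=0.5, T=2.0):
        # Soft part: KL between temperature-scaled student and teacher scores
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            teacher_probs,                     # LLM-assigned choice probabilities
            reduction="batchmean",
        ) * (T * T)
        # Hard part: ordinary cross-entropy on the correct choice
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Toy batch: 4 questions with 4 answer choices each
    logits = torch.randn(4, 4, requires_grad=True)
    teacher = F.softmax(torch.randn(4, 4), dim=-1)
    labels = teacher.argmax(dim=-1)
    distillation_loss(logits, teacher, labels).backward()

If the paper temperature-scales the teacher distribution as well, that scaling belongs inside the soft term; the teacher scores are used as-is here for simplicity.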


The Femininomenon of Inequality: A Data-Driven Analysis and Cluster Profiling in Indonesia

arXiv.org Artificial Intelligence

This study addresses the persistent challenges of Workplace Gender Equality (WGE) in Indonesia, examining regional disparities in gender empowerment and inequality through the Gender Empowerment Index (IDG) and Gender Inequality Index (IKG). Despite Indonesia's economic growth and incremental progress in gender equality, as indicated by improvements in the IDG and IKG scores from 2018 to 2023, substantial regional differences remain. Utilizing k-means clustering, the study identifies two distinct clusters of regions with contrasting gender profiles. Cluster 0 includes regions like DKI Jakarta and Central Java, characterized by higher gender empowerment and lower inequality, while Cluster 1 comprises areas such as Papua and North Maluku, where gender disparities are more pronounced. The analysis reveals that local socio-economic conditions and governance frameworks play a critical role in shaping regional gender dynamics. Correlation analyses further demonstrate that higher empowerment is generally associated with lower inequality and greater female representation in professional roles. These findings underscore the importance of targeted, region-specific interventions to promote WGE, addressing both structural and cultural barriers. The insights provided by this study aim to guide policymakers in developing tailored strategies to foster gender equality and enhance women's participation in the workforce across Indonesia's diverse regions.
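
The clustering itself is ordinary k-means with k = 2 over the two indices. The sketch below uses invented IDG/IKG values purely to show the mechanics; real figures come from the underlying official statistics.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical rows of [IDG, IKG]; the numbers are placeholders.
    regions = ["DKI Jakarta", "Central Java", "Papua", "North Maluku"]
    X = np.array([[76.0, 0.37], [75.0, 0.41], [62.0, 0.55], [64.0, 0.52]])

    Xs = StandardScaler().fit_transform(X)     # the two indices use different scales
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)
    for name, c in zip(regions, labels):
        print(f"{name}: cluster {c}")

Standardizing first matters because IDG is reported on a roughly 0-100 scale while IKG lies in 0-1; without it, the k-means distance would be dominated by IDG alone.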


Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models

arXiv.org Artificial Intelligence

Phishing attacks remain a persistent threat to online security, demanding robust detection methods. This study investigates the use of machine learning to identify phishing URLs, emphasizing the crucial role of feature selection and model interpretability for improved performance. Employing Recursive Feature Elimination, the research pinpointed key features such as "length_url," "time_domain_activation," and "Page_rank" as strong indicators of phishing attempts. The study evaluated various algorithms, including CatBoost, XGBoost, and Explainable Boosting Machine, assessing their robustness and scalability. XGBoost emerged as highly efficient in terms of runtime, making it well-suited for large datasets. CatBoost, on the other hand, demonstrated resilience by maintaining high accuracy even with reduced features. To enhance transparency and trustworthiness, Explainable AI techniques, such as SHAP, were employed to provide insights into feature importance. The study's findings highlight that effective feature selection and model interpretability can significantly bolster phishing detection systems, paving the way for more efficient and adaptable defenses against evolving cyber threats.
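
That pipeline, RFE for selection, a gradient-boosted classifier, and SHAP for post-hoc explanation, composes directly from standard libraries. The sketch below runs on synthetic data; the last two feature names and all hyperparameters are placeholders, and XGBoost stands in for whichever of the three models is being inspected.

    import numpy as np
    import shap
    from sklearn.feature_selection import RFE
    from xgboost import XGBClassifier

    rng = np.random.default_rng(0)
    features = ["length_url", "time_domain_activation", "Page_rank",
                "qty_dot_url", "domain_age"]        # last two are invented fillers
    X = rng.normal(size=(1000, len(features)))      # synthetic stand-in URL features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy phishing label

    # Recursive Feature Elimination down to the strongest predictors
    selector = RFE(XGBClassifier(n_estimators=100), n_features_to_select=3).fit(X, y)
    kept = [f for f, k in zip(features, selector.support_) if k]

    # Refit on the kept features and explain with SHAP
    model = XGBClassifier(n_estimators=100).fit(X[:, selector.support_], y)
    shap_values = shap.TreeExplainer(model).shap_values(X[:, selector.support_])
    print(kept, np.abs(shap_values).mean(axis=0))   # mean |SHAP| = global importance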


Time and Frequency Synergy for Source-Free Time-Series Domain Adaptations

arXiv.org Artificial Intelligence

The issue of source-free time-series domain adaptation has received scant research attention. Moreover, existing approaches rely solely on time-domain features, ignoring frequency components that provide complementary information. This paper proposes Time Frequency Domain Adaptation (TFDA), a method to cope with source-free time-series domain adaptation problems. TFDA is developed with a dual-branch network structure that fully utilizes both time and frequency features in delivering final predictions. It induces pseudo-labels based on a neighborhood concept, where the predictions of a group of samples are aggregated to generate reliable pseudo-labels. Contrastive learning is carried out in both the time and frequency domains with pseudo-label information and a negative-pair exclusion strategy to keep the neighborhood assumption valid. In addition, a time-frequency consistency technique is proposed using a self-distillation strategy, while an uncertainty-reduction strategy is implemented to alleviate uncertainties caused by domain shift. Finally, a curriculum learning strategy is integrated to combat noisy pseudo-labels. Our experiments demonstrate the advantage of our approach over prior art by noticeable margins on benchmark problems.
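
Two of the building blocks are simple to isolate: a frequency branch fed by each series' magnitude spectrum, and pseudo-labels aggregated over a sample's neighborhood. Everything below (series length, k, plain averaging) is an illustrative reading of the description, not the authors' code.

    import numpy as np

    def frequency_features(x: np.ndarray) -> np.ndarray:
        # Magnitude spectrum as the frequency-branch input (assumed design)
        return np.abs(np.fft.rfft(x, axis=-1))

    def neighborhood_pseudo_labels(feats, probs, k=5):
        # Average each sample's k nearest neighbours' predictions; agreeing,
        # confident neighbourhoods yield the reliable pseudo-labels
        d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
        idx = np.argsort(d, axis=1)[:, 1:k + 1]       # drop self (distance 0)
        agg = probs[idx].mean(axis=1)
        return agg.argmax(axis=1), agg.max(axis=1)    # pseudo-label, confidence

    x = np.random.randn(32, 128)                      # 32 unlabeled target series
    p = np.random.dirichlet(np.ones(4), size=32)      # stand-in source-model outputs
    labels, conf = neighborhood_pseudo_labels(frequency_features(x), p)

The confidence returned alongside each pseudo-label is a natural hook for the curriculum strategy: train on high-confidence samples first and admit noisier ones later.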


Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia

arXiv.org Artificial Intelligence

One of the key issues contributing to inefficiency in Puskesmas is the time-consuming nature of doctor-patient interactions. Doctors need to conduct thorough consultations, which include diagnosing the patient's condition, providing treatment advice, and transcribing detailed notes into medical records. In regions with diverse linguistic backgrounds, doctors often have to ask clarifying questions, further prolonging the process. While diagnosing is essential, transcription and summarization can often be automated using AI to improve time efficiency, help doctors enhance care quality, and enable early diagnosis and intervention. This paper proposes a solution using a localized large language model (LLM) to transcribe, translate, and summarize doctor-patient conversations. We utilize the Whisper model for transcription and GPT-3 to summarize the transcripts into the ePuskesmas medical record format. The system is implemented as an add-on to an existing web browser extension, allowing doctors to fill out patient forms while talking. By leveraging this solution for real-time transcription, translation, and summarization, doctors can improve the turnaround time for patient care while producing records that are more detailed and insightful for future visits. This innovation addresses challenges such as overcrowded facilities and the administrative burden on healthcare providers in Indonesia. We believe this solution will help doctors save time, provide better care, and produce more accurate medical records, representing a significant step toward modernizing healthcare and ensuring patients receive timely, high-quality care, even in resource-constrained settings.
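
The transcription half maps directly onto the open-source Whisper package; the summarization half is left as a stub here because the exact GPT endpoint, prompt wording, and ePuskesmas field list are not given in this summary, so they appear below only as assumptions.

    import whisper                              # openai-whisper package

    asr = whisper.load_model("small")           # assumed model size
    result = asr.transcribe("consultation.wav", language="id")
    transcript = result["text"]

    PROMPT = (
        "Summarize this doctor-patient conversation into ePuskesmas fields "
        "(complaint, diagnosis, treatment) as JSON:\n\n" + transcript
    )

    def summarize(prompt: str) -> str:
        # Placeholder for the paper's GPT-3 call; wire up whichever
        # completion endpoint your deployment provides.
        raise NotImplementedError

Forcing the summary into a fixed field set is what makes the output droppable into a form-based record system like ePuskesmas.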


Machine Learning Models for Dengue Forecasting in Singapore

arXiv.org Artificial Intelligence

With dengue emerging beyond traditionally endemic regions, its global disease burden is forecast to be among the fastest growing. With limited direct treatment or vaccination currently available, prevention through vector control is widely believed to be the most effective way of managing outbreaks. This study examines traditional state-space models (moving average, autoregressive, ARIMA, SARIMA), supervised learning techniques (XGBoost, SVM, KNN), and deep networks (LSTM, CNN, ConvLSTM) for forecasting weekly dengue cases in Singapore. Meteorological data and search-engine trends were included as features for the ML techniques. Forecasts using CNNs yielded the lowest RMSE on weekly cases in 2019.
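
For the supervised learners, the essential move is reframing the weekly series as sliding-window regression with the meteorological and search-trend covariates appended to each window. A sketch on synthetic data, with an assumed eight-week lag window and a chronological train/test split:

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from xgboost import XGBRegressor

    rng = np.random.default_rng(0)
    weeks = 400
    cases = np.abs(rng.normal(50, 20, size=weeks))  # synthetic weekly case counts
    rain = rng.normal(size=weeks)                   # stand-in meteorology
    trends = rng.normal(size=weeks)                 # stand-in search trends

    LAGS = 8                                        # assumed window length
    X = np.array([np.r_[cases[t - LAGS:t], rain[t], trends[t]]
                  for t in range(LAGS, weeks)])
    y = cases[LAGS:]

    split = int(0.8 * len(y))                       # chronological, no shuffling
    model = XGBRegressor(n_estimators=300).fit(X[:split], y[:split])
    rmse = mean_squared_error(y[split:], model.predict(X[split:])) ** 0.5
    print(f"test RMSE: {rmse:.1f}")

Keeping the split chronological matters: shuffled cross-validation would leak future outbreak information into training and flatter every model's RMSE.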