AITopics | Visayas

Visayas

Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems, Feb-18-2026, 04:02:11 GMT

Considering that different geographic regions generally have their own native sign languages, it is valuable to establish corresponding sign language translation (SLT) datasets to support related communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale dataset for SLT.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(22 more...)

Industry:

Education > Curriculum > Subject-Specific Education (0.96)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

HiligayNER: A Baseline Named Entity Recognition Model for Hiligaynon

Teves, James Ald, Cal, Ray Daniel, Villaluz, Josh Magdiel, Malolos, Jean, Magtira, Mico, Rodriguez, Ramon, Abisado, Mideth, Imperial, Joseph Marvin

arXiv.org Artificial Intelligence, Oct-14-2025

The language of Hiligaynon, spoken predominantly by the people of Panay Island, Negros Occidental, and Soccsksargen in the Philippines, remains underrepresented in language processing research due to the absence of annotated corpora and baseline models. This study introduces HiligayNER, the first publicly available baseline model for the task of Named Entity Recognition (NER) in Hiligaynon. The dataset used to build HiligayNER contains over 8,000 annotated sentences collected from publicly available news articles, social media posts, and literary texts. Two Transformer-based models, mBERT and XLM-RoBERTa, were fine-tuned on this collected corpus to build versions of HiligayNER. Evaluation results show strong performance, with both models achieving over 80% in precision, recall, and F1-score across entity types. Furthermore, cross-lingual evaluation with Cebuano and Tagalog demonstrates promising transferability, suggesting the broader applicability of HiligayNER for multilingual NLP in low-resource settings. This work aims to contribute to language technology development for underrepresented Philippine languages, specifically for Hiligaynon, and support future research in regional language processing.
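The precision, recall, and F1-score figures mentioned in the abstract can be illustrated with a minimal sketch. It assumes entity-level scoring, where a prediction counts only if its span and type both match the gold annotation exactly; the abstract does not state which scoring scheme the authors used. The `Span` format, the function name, and the example annotations are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical span format: (start_token, end_token, entity_type).
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction is correct only on an exact span and type match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative annotations only (not data from the HiligayNER corpus).
gold_spans = {(0, 1, "PER"), (4, 5, "LOC"), (7, 7, "ORG")}
predicted_spans = {(0, 1, "PER"), (4, 5, "ORG"), (7, 7, "ORG")}

precision, recall, f1 = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three predictions match the gold spans exactly, so all three scores equal 2/3; the span at tokens 4 to 5 is penalised because its type is wrong even though its boundaries are right.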

computational linguistic, information retrieval, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.10776

Country:

North America > United States > Minnesota (0.28)
Asia > Philippines > Visayas > Negros Island Region > Province of Negros Occidental (0.24)
Asia > Philippines > Mindanao > Soccsksargen (0.24)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

feb34ce77fc8b94c85d12e608b23ce67-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Oct-9-2025, 12:52:44 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(22 more...)

Industry:

Health & Medicine (0.69)
Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators

Gu, Feng, Li, Zongxia, Colon, Carlos Rafael, Evans, Benjamin, Mondal, Ishani, Boyd-Graber, Jordan Lee

arXiv.org Artificial Intelligence, Mar-9-2025

Event annotation is important for identifying market changes, monitoring breaking news, and understanding sociological trends. Although expert annotators set the gold standards, human coding is expensive and inefficient. Unlike information extraction experiments that focus on single contexts, we evaluate a holistic workflow that removes irrelevant documents, merges documents about the same event, and annotates the events. Although LLM-based automated annotations are better than traditional TF-IDF-based methods or Event Set Curation, LLMs are still not reliable annotators compared to human experts. However, adding LLMs to assist experts for Event Set Curation can reduce the time and mental effort required for Variable Annotation. When LLMs extract event variables to assist expert annotators, the experts agree more with the extracted variables than with fully automated LLM annotation.

agreement, annotation, annotator, (13 more...)

arXiv.org Artificial Intelligence

2503.06778

Country:

North America > United States > New York > New York County > New York City (0.15)
Asia > Middle East > Jordan (0.05)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Terrorism (0.48)
Media > News (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Semantic Decomposition and Selective Context Filtering -- Text Processing Techniques for Context-Aware NLP-Based Systems

Villardar, Karl John

arXiv.org Artificial Intelligence, Feb-19-2025

In this paper, we present two techniques for use in context-aware systems: Semantic Decomposition, which sequentially decomposes input prompts into a structured and hierarchical information schema that systems can parse and process easily, and Selective Context Filtering, which enables systems to systematically filter out specific irrelevant sections of the contextual information fed through a system's NLP-based pipeline. We explore how context-aware systems and applications can use these two techniques to implement dynamic LLM-to-system interfaces, improve an LLM's ability to generate more contextually cohesive user-facing responses, and optimize complex automated workflows and pipelines.

arxiv, information, language model, (14 more...)

arXiv.org Artificial Intelligence

2502.14048

Country:

Asia > South Korea (0.04)
Asia > Philippines > Visayas > Central Visayas > Province of Cebu > City of Cebu (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma

AlDahoul, Nouar, Tan, Myles Joshua Toledo, Tera, Raghava Reddy, Karim, Hezerul Abdul, Lim, Chee How, Mishra, Manish Kumar, Zaki, Yasir

arXiv.org Artificial Intelligence, Dec-14-2024

License plate recognition (LPR) involves automated systems that use cameras and computer vision to read vehicle license plates. Plates collected through LPR can then be compared against databases to identify stolen vehicles, uninsured drivers, crime suspects, and more. The LPR system plays a significant role in saving time for institutions such as the police force. In the past, LPR relied heavily on Optical Character Recognition (OCR), which has been widely explored to recognize characters in images. Collected plate images usually suffer from various limitations, including noise, blurring, weather conditions, and close characters, which make recognition complex. Existing LPR methods still require significant improvement, especially for distorted images. To fill this gap, we propose utilizing visual language models (VLMs) such as OpenAI GPT4o, Google Gemini 1.5, Google PaliGemma (Pathways Language and Image model + Gemma model), Meta Llama 3.2, Anthropic Claude 3.5 Sonnet, LLaVA, NVIDIA VILA, and moondream2 to recognize such unclear plates with close characters. This paper evaluates the VLMs' capability to address the aforementioned problems. Additionally, we introduce "VehiclePaliGemma", a fine-tuned open-source PaliGemma VLM designed to recognize plates under challenging conditions. We compared our proposed VehiclePaliGemma with state-of-the-art methods and other VLMs using a dataset of Malaysian license plates collected under complex conditions. The results indicate that VehiclePaliGemma achieved superior performance with an accuracy of 87.6%. Moreover, it is able to predict the car's plate at a speed of 7 frames per second using an A100-80GB GPU. Finally, we explored the multitasking capability of the VehiclePaliGemma model to accurately identify plates containing multiple cars of various models and colors, with plates positioned and oriented in different directions.

large language model, machine learning, recognition, (20 more...)

arXiv.org Artificial Intelligence

2412.14197

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Malaysia (0.04)
North America > United States > New York (0.04)
Asia > Philippines > Visayas > Negros Island Region > Province of Negros Occidental > City of Bacolod (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

Xing, Junjie, He, Yeye, Zhou, Mengyu, Dong, Haoyu, Han, Shi, Zhang, Dongmei, Chaudhuri, Surajit

arXiv.org Artificial Intelligence, Oct-15-2024

In this work, we propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks. Our insight is that for each table task, there often exist two dual versions of the same task, one generative and one classification in nature. Leveraging their duality, we propose a Generator-Validator paradigm to iteratively generate-then-validate training data from language models, in order to fine-tune stronger Table-Specialist models that can specialize in a given task without requiring manually labeled data. Our extensive evaluations suggest that Table-Specialist has (1) strong performance on diverse table tasks over vanilla language models: for example, Table-Specialist fine-tuned on GPT-3.5 not only outperforms vanilla GPT-3.5, but can often match or surpass GPT-4 level quality; (2) lower cost to deploy, because when Table-Specialist fine-tuned on GPT-3.5 achieves GPT-4 level quality, it becomes possible to deploy smaller models with lower latency and inference cost at comparable quality; and (3) better generalizability when evaluated across multiple benchmarks, since Table-Specialist is fine-tuned on a broad range of training data systematically generated from diverse real tables. Our code and data will be available at https://github.com/microsoft/Table-LLM-Specialist.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.12164

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Russia (0.14)
Asia > Russia (0.14)
(73 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Media (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Enhancing coastal water body segmentation with Landsat Irish Coastal Segmentation (LICS) dataset

O'Sullivan, Conor, Kashyap, Ambrish, Coveney, Seamus, Monteys, Xavier, Dev, Soumyabrata

arXiv.org Artificial Intelligence, Sep-5-2024

Ireland's coastline, a critical and dynamic resource, is facing challenges such as erosion, sedimentation, and human activities. Monitoring these changes is a complex task we approach using a combination of satellite imagery and deep learning methods. However, limited research exists in this area, particularly for Ireland. This paper presents the Landsat Irish Coastal Segmentation (LICS) dataset, which aims to facilitate the development of deep learning methods for coastal water body segmentation while addressing modelling challenges specific to Irish meteorology and coastal types. The dataset is used to evaluate various automated approaches for segmentation, with U-NET achieving the highest accuracy of 95.0% among deep learning methods. Nevertheless, the Normalised Difference Water Index (NDWI) benchmark outperformed U-NET with an average accuracy of 97.2%. The study suggests that deep learning approaches can be further improved with more accurate training data and by considering alternative measurements of erosion. The LICS dataset and code are freely available to support reproducible research and further advancements in coastal monitoring efforts.
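The NDWI benchmark mentioned in the abstract can be sketched as follows. This uses the standard definition NDWI = (Green - NIR) / (Green + NIR) and labels a pixel as water when the index exceeds a threshold. The threshold of 0.0, the function names, and the reflectance values are assumptions for illustration; the abstract does not say which bands or threshold the authors used.

```python
from typing import List

Band = List[List[float]]  # Hypothetical layout: rows of per-pixel reflectance values.


def ndwi(green: float, nir: float) -> float:
    """Normalised Difference Water Index for one pixel; returns 0.0 when both bands are zero."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def water_mask(green_band: Band, nir_band: Band, threshold: float = 0.0) -> List[List[bool]]:
    """Label each pixel as water (True) when its NDWI exceeds the assumed threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Illustrative reflectances only: water reflects little near-infrared, land reflects more.
green = [[0.30, 0.10], [0.25, 0.08]]
nir = [[0.05, 0.40], [0.05, 0.35]]

mask = water_mask(green, nir)
print(mask)
```

Here the left column has higher green than near-infrared reflectance, giving a positive index and a water label, while the right column is labelled as land.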

coastline, pixel, segmentation, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.rsase.2024.101276

2409.15311

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Pacific Ocean > North Pacific Ocean > East China Sea > Yellow Sea (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.48)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Machine Learning Framework for High-Resolution Air Temperature Downscaling Using LiDAR-Derived Urban Morphological Features

Chajaei, Fatemeh, Bagheri, Hossein

arXiv.org Artificial Intelligence, Aug-31-2024

Climate models lack the necessary resolution for urban climate studies, requiring computationally intensive processes to estimate high-resolution air temperatures. In contrast, data-driven approaches offer faster and more accurate air temperature downscaling. This study presents a data-driven framework for downscaling air temperature using publicly available outputs from urban climate models, specifically datasets generated by UrbClim. The proposed framework utilized morphological features extracted from LiDAR data. To extract urban morphological features, a three-dimensional building model was first created using LiDAR data and deep learning models. These features were then integrated with meteorological parameters such as wind and humidity to downscale air temperature using machine learning algorithms. The results demonstrated that the developed framework effectively extracted urban morphological features from LiDAR data. Deep learning algorithms played a crucial role in generating three-dimensional models for extracting the aforementioned features. The evaluation of air temperature downscaling results using various machine learning models also indicated that the LightGBM model had the best performance, with an RMSE of 0.352°K and an MAE of 0.215°K. Furthermore, the examination of final air temperature maps derived from downscaling showed that the developed framework successfully estimated air temperatures at higher resolutions, enabling the identification of local air temperature patterns at street level. The corresponding source codes are available on GitHub: https://github.com/FatemehCh97/Air-Temperature-Downscaling.
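The two error metrics reported in the abstract, RMSE and MAE, can be computed as in the sketch below. The function names and temperature values are hypothetical and chosen only to show the arithmetic; they are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute differences."""
    absolute_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Illustrative air temperatures in kelvin (not values from the paper).
observed_temps = [290.0, 291.0, 292.0, 293.0]
predicted_temps = [290.5, 290.5, 292.0, 294.0]

rmse_value = rmse(observed_temps, predicted_temps)
mae_value = mae(observed_temps, predicted_temps)
print(f"RMSE={rmse_value:.3f} K, MAE={mae_value:.3f} K")
```

Because RMSE squares each error before averaging, the single 1 K miss weighs more heavily in it than in MAE, which is why the RMSE is the larger of the two figures here.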

air temperature, building footprint, resolution, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.uclim.2024.102102

2409.0212

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.07)
Asia > Singapore (0.04)
(23 more...)

Genre: Research Report > New Finding (0.69)

Industry: Energy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students

Švábenský, Valdemar, Verger, Mélina, Rodrigo, Maria Mercedes T., Monterozo, Clarence James G., Baker, Ryan S., Saavedra, Miguel Zenon Nicanor Lerias, Lallé, Sébastien, Shimada, Atsushi

arXiv.org Artificial Intelligence, Jul-15-2024

Algorithmic bias is a major issue in machine learning models in educational contexts. However, it has not yet been studied thoroughly in Asian learning contexts, and only limited work has considered algorithmic bias based on regional (sub-national) background. As a step towards addressing this gap, this paper examines the population of 5,986 students at a large university in the Philippines, investigating algorithmic bias based on students' regional background. The university used the Canvas learning management system (LMS) in its online courses across a broad range of domains. Over a period of three semesters, we collected 48.7 million log records of the students' activity in Canvas. We used these logs to train binary classification models that predict student grades from the LMS activity. The best-performing model reached an AUC of 0.75 and a weighted F1-score of 0.79. Subsequently, we examined the data for bias based on students' region. Evaluation using three metrics (AUC, weighted F1-score, and MADD) showed consistent results across all demographic groups. Thus, no unfairness was observed against a particular student group in the grade predictions.

educational data mining, philippines, student, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.12729936

2405.09821

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.05)
Asia > India > Karnataka > Bengaluru (0.05)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.48)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
