Philippines
Supplementary Materials
We provide the supplements of "Contextual Gaussian Process Bandits with Neural Networks" here. Specifically, we discuss alternative acquisition functions that can be incorporated with the neural network-accompanied Gaussian process (NN-AGP) model in Section 6. In Section 7, we discuss the bandit algorithm with NN-AGP, where the neural network approximation error is considered. In Section 8, we provide the detailed proof of theorems. We provide the experimental details and include additional numerical experiments in Section 9. Last we discuss the limitations of NN-AGP and propose the potential approaches to addressing the limitations for future work, including sparse NN-AGP for alleviating computational burdens and transfer learning with NN-AGP to address cold-start issue; see Section 10. In the main text, we employ the upper confidence bound function as the acquisition function in the contextual Bayesian optimization approach. Here, we provide two alternative choices: Thompson sampling (TS) and knowledge gradient (KG). We describe the two procedures of the contextual GP bandit problems with NN-AGP, where the acquisition function is replaced by TS or KG. It chooses the action that maximizes the expected reward with respect to a random belief that is drawn for a posterior distribution.
USAM-Net: A U-Net-based Network for Improved Stereo Correspondence and Scene Depth Estimation using Features from a Pre-trained Image Segmentation network
Dayo, Joseph Emmanuel DL, Naval, Prospero C. Jr
The increasing demand for high-accuracy depth estimation in autonomous driving and augmented reality applications necessitates advanced neural architectures capable of effectively leveraging multiple data modalities. In this context, we introduce the Unified Segmentation Attention Mechanism Network (USAM-Net), a novel convolutional neural network that integrates stereo image inputs with semantic segmentation maps and attention to enhance depth estimation performance. USAM-Net employs a dual-pathway architecture, which combines a pre-trained segmentation model (SAM) and a depth estimation model. The segmentation pathway preprocesses the stereo images to generate semantic masks, which are then concatenated with the stereo images as inputs to the depth estimation pathway. This integration allows the model to focus on important features such as object boundaries and surface textures which are crucial for accurate depth perception. Empirical evaluation on the DrivingStereo dataset demonstrates that USAM-Net achieves superior performance metrics, including a Global Difference (GD) of 3.61\% and an End-Point Error (EPE) of 0.88, outperforming traditional models such as CFNet, SegStereo, and iResNet. These results underscore the effectiveness of integrating segmentation information into stereo depth estimation tasks, highlighting the potential of USAM-Net in applications demanding high-precision depth data.
Reinforcement Learning Environment with LLM-Controlled Adversary in D&D 5th Edition Combat
Dayo, Joseph Emmanuel DL, Ogbinar, Michel Onasis S., Naval, Prospero C. Jr
The objective of this study is to design and implement a reinforcement learning (RL) environment using D\&D 5E combat scenarios to challenge smaller RL agents through interaction with a robust adversarial agent controlled by advanced Large Language Models (LLMs) like GPT-4o and LLaMA 3 8B. This research employs Deep Q-Networks (DQN) for the smaller agents, creating a testbed for strategic AI development that also serves as an educational tool by simulating dynamic and unpredictable combat scenarios. We successfully integrated sophisticated language models into the RL framework, enhancing strategic decision-making processes. Our results indicate that while RL agents generally outperform LLM-controlled adversaries in standard metrics, the strategic depth provided by LLMs significantly enhances the overall AI capabilities in this complex, rule-based setting. The novelty of our approach and its implications for mastering intricate environments and developing adaptive strategies are discussed, alongside potential innovations in AI-driven interactive simulations. This paper aims to demonstrate how integrating LLMs can create more robust and adaptable AI systems, providing valuable insights for further research and educational applications.
An Enhancement of Jiang, Z., et al.s Compression-Based Classification Algorithm Applied to News Article Categorization
Benavides, Sean Lester C., Masapol, Cid Antonio F., Morano, Jonathan C., Cortez, Dan Michael A.
This study enhances Jiang et al.'s compression-based classification algorithm by addressing its limitations in detecting semantic similarities between text documents. The proposed improvements focus on unigram extraction and optimized concatenation, eliminating reliance on entire document compression. By compressing extracted unigrams, the algorithm mitigates sliding window limitations inherent to gzip, improving compression efficiency and similarity detection. The optimized concatenation strategy replaces direct concatenation with the union of unigrams, reducing redundancy and enhancing the accuracy of Normalized Compression Distance (NCD) calculations. Experimental results across datasets of varying sizes and complexities demonstrate an average accuracy improvement of 5.73%, with gains of up to 11% on datasets containing longer documents. Notably, these improvements are more pronounced in datasets with high-label diversity and complex text structures. The methodology achieves these results while maintaining computational efficiency, making it suitable for resource-constrained environments. This study provides a robust, scalable solution for text classification, emphasizing lightweight preprocessing techniques to achieve efficient compression, which in turn enables more accurate classification.
Semantic Decomposition and Selective Context Filtering -- Text Processing Techniques for Context-Aware NLP-Based Systems
In this paper, we present two techniques for use in context-aware systems: Semantic Decomposition, which sequentially decomposes input prompts into a structured and hierarchal information schema in which systems can parse and process easily, and Selective Context Filtering, which enables systems to systematically filter out specific irrelevant sections of contextual information that is fed through a system's NLP-based pipeline. We will explore how context-aware systems and applications can utilize these two techniques in order to implement dynamic LLM-to-system interfaces, improve an LLM's ability to generate more contextually cohesive user-facing responses, and optimize complex automated workflows and pipelines.
Supplementary Materials
We provide the supplements of "Contextual Gaussian Process Bandits with Neural Networks" here. Specifically, we discuss alternative acquisition functions that can be incorporated with the neural network-accompanied Gaussian process (NN-AGP) model in Section 6. In Section 7, we discuss the bandit algorithm with NN-AGP, where the neural network approximation error is considered. In Section 8, we provide the detailed proof of theorems. We provide the experimental details and include additional numerical experiments in Section 9. Last we discuss the limitations of NN-AGP and propose the potential approaches to addressing the limitations for future work, including sparse NN-AGP for alleviating computational burdens and transfer learning with NN-AGP to address cold-start issue; see Section 10. In the main text, we employ the upper confidence bound function as the acquisition function in the contextual Bayesian optimization approach. Here, we provide two alternative choices: Thompson sampling (TS) and knowledge gradient (KG). We describe the two procedures of the contextual GP bandit problems with NN-AGP, where the acquisition function is replaced by TS or KG. It chooses the action that maximizes the expected reward with respect to a random belief that is drawn for a posterior distribution.
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma
AlDahoul, Nouar, Tan, Myles Joshua Toledo, Tera, Raghava Reddy, Karim, Hezerul Abdul, Lim, Chee How, Mishra, Manish Kumar, Zaki, Yasir
License plate recognition (LPR) involves automated systems that utilize cameras and computer vision to read vehicle license plates. Such plates collected through LPR can then be compared against databases to identify stolen vehicles, uninsured drivers, crime suspects, and more. The LPR system plays a significant role in saving time for institutions such as the police force. In the past, LPR relied heavily on Optical Character Recognition (OCR), which has been widely explored to recognize characters in images. Usually, collected plate images suffer from various limitations, including noise, blurring, weather conditions, and close characters, making the recognition complex. Existing LPR methods still require significant improvement, especially for distorted images. To fill this gap, we propose utilizing visual language models (VLMs) such as OpenAI GPT4o, Google Gemini 1.5, Google PaliGemma (Pathways Language and Image model + Gemma model), Meta Llama 3.2, Anthropic Claude 3.5 Sonnet, LLaVA, NVIDIA VILA, and moondream2 to recognize such unclear plates with close characters. This paper evaluates the VLM's capability to address the aforementioned problems. Additionally, we introduce ``VehiclePaliGemma'', a fine-tuned Open-sourced PaliGemma VLM designed to recognize plates under challenging conditions. We compared our proposed VehiclePaliGemma with state-of-the-art methods and other VLMs using a dataset of Malaysian license plates collected under complex conditions. The results indicate that VehiclePaliGemma achieved superior performance with an accuracy of 87.6\%. Moreover, it is able to predict the car's plate at a speed of 7 frames per second using A100-80GB GPU. Finally, we explored the multitasking capability of VehiclePaliGemma model to accurately identify plates containing multiple cars of various models and colors, with plates positioned and oriented in different directions.
An Enhancement of CNN Algorithm for Rice Leaf Disease Image Classification in Mobile Applications
Rodrigo, Kayne Uriel K., Marcial, Jerriane Hillary Heart S., Brillo, Samuel C., Mata, Khatalyn E., Morano, Jonathan C.
This study focuses on enhancing rice leaf disease image classification algorithms, which have traditionally relied on Convolutional Neural Network (CNN) models. We employed transfer learning with MobileViTV2_050 using ImageNet-1k weights, a lightweight model that integrates CNN's local feature extraction with Vision Transformers' global context learning through a separable self-attention mechanism. Our approach resulted in a significant 15.66% improvement in classification accuracy for MobileViTV2_050-A, our first enhanced model trained on the baseline dataset, achieving 93.14%. Furthermore, MobileViTV2_050-B, our second enhanced model trained on a broader rice leaf dataset, demonstrated a 22.12% improvement, reaching 99.6% test accuracy. Additionally, MobileViTV2-A attained an F1-score of 93% across four rice labels and a Receiver Operating Characteristic (ROC) curve ranging from 87% to 97%. In terms of resource consumption, our enhanced models reduced the total parameters of the baseline CNN model by up to 92.50%, from 14 million to 1.1 million. These results indicate that MobileViTV2_050 not only improves computational efficiency through its separable self-attention mechanism but also enhances global context learning. Consequently, it offers a lightweight and robust solution suitable for mobile deployment, advancing the interpretability and practicality of models in precision agriculture.
AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!"
Jiang, Yi-Lin, Hsiung, Chia-Ho, Yeh, Yen-Tung, Chen, Lu-Rong, Chen, Bo-Yu
The rise of "bedroom producers" has democratized music creation, while challenging producers to objectively evaluate their work. To address this, we present AI TrackMate, an LLM-based music chatbot designed to provide constructive feedback on music productions. By combining LLMs' inherent musical knowledge with direct audio track analysis, AI TrackMate offers production-specific insights, distinguishing it from text-only approaches. Our framework integrates a Music Analysis Module, an LLM-Readable Music Report, and Music Production-Oriented Feedback Instruction, creating a plug-and-play, training-free system compatible with various LLMs and adaptable to future advancements. We demonstrate AI TrackMate's capabilities through an interactive web interface and present findings from a pilot study with a music producer. By bridging AI capabilities with the needs of independent producers, AI TrackMate offers on-demand analytical feedback, potentially supporting the creative process and skill development in music production. This system addresses the growing demand for objective self-assessment tools in the evolving landscape of independent music production.
Drone footage shows huge fire engulfing Manila shanty town
Huge flames can be seen engulfing a closely-built shanty community in the port area of Manila, in drone footage released by the city's disaster management office. Hundreds of residents have been left without homes, after around 1,000 houses were destroyed in the blaze on Sunday, Manila Fire District said. Dozens of fire engines and fire boats were deployed to tackle the blaze, and the Philippine Air Force sent two helicopters which scooped up up water from Manila Bay to help extinguish the fire. Emergency services have reported no casualties so far, and are yet to identify the cause. The incident comes months after 11 people died in residential fire in the Chinatown district of the Philippine capital.