Goto

Collaborating Authors

 Performance Analysis


Improving Functional Reliability of Near-Field Monitoring for Emergency Braking in Autonomous Vehicles

arXiv.org Artificial Intelligence

-- Autonomous vehicles require reliable hazard detection. However, primary sensor systems may miss near-field obstacles, resulting in safety risks. Although a dedicated fast-reacting near-field monitoring system can mitigate this, it typically suffers from false positives. T o mitigate these, in this paper, we introduce three monitoring strategies based on dynamic spatial properties, relevant object sizes, and motion-aware prediction. In experiments in a validated simulation, we compare the initial monitoring strategy against the proposed improvements. The results demonstrate that the proposed strategies can significantly improve the reliability of near-field monitoring systems. Advanced Driver Assistance Systems (ADASs) are rapidly advancing toward improved automation and safety in transportation; however, ensuring robust and reliable safety of self-driving vehicles remains a critical challenge. While Autonomous Emergency Braking Systems (AEBSs) have demonstrated potential in reducing collision risks, these systems often rely on the same sensor setup as the high-level driving system, which can miss hazards outside the sensors' Field-of-View (FOV). Especially in complex urban environments, such blind spots underscore the need for dedicated near-field monitoring systems as an additional safety layer.


Smart Eyes for Silent Threats: VLMs and In-Context Learning for THz Imaging

arXiv.org Artificial Intelligence

T erahertz (THz) imaging enables non-invasive analysis for applications such as security screening and material classification, but effective image classification remains challenging due to limited annotations, low resolution, and visual ambiguity. W e introduce In-Context Learning (ICL) with Vision-Language Models (VLMs) as a flexible, interpretable alternative that requires no fine-tuning. Using a modality-aligned prompting framework, we adapt two open-weight VLMs to the THz domain and evaluate them under zero-shot and one-shot settings. Our results show that ICL improves classification and interpretability in low-data regimes. This is the first application of ICL-enhanced VLMs to THz imaging, offering a promising direction for resource-constrained scientific domains.


An Investigation of Test-time Adaptation for Audio Classification under Background Noise

arXiv.org Artificial Intelligence

Domain shift is a prominent problem in Deep Learning, causing a model pre-trained on a source dataset to suffer significant performance degradation on test datasets. This research aims to address the issue of audio classification under domain shift caused by background noise using Test-Time Adaptation (TTA), a technique that adapts a pre-trained model during testing using only unlabelled test data before making predictions. We adopt two common TTA methods, TTT and TENT, and a state-of-the-art method CoNMix, and investigate their respective performance on two popular audio classification datasets, AudioMNIST (AM) and SpeechCommands V1 (SC), against different types of background noise and noise severity levels. The experimental results reveal that our proposed modified version of CoN-Mix produced the highest classification accuracy under domain shift (5.31% error rate under 10 dB exercise bike background noise and 12.75% error rate under 3 dB running tap background noise for AM) compared to TTT and TENT. The literature search provided no evidence of similar works, thereby motivating the work reported here as the first study to leverage TTA techniques for audio classification under domain shift.


Robots for Kiwifruit Harvesting and Pollination

arXiv.org Artificial Intelligence

This research was a part of a project that developed mobile robots that performed targeted pollen spraying and automated harvesting in pergola structured kiwifruit orchards. Multiple kiwifruit detachment mechanisms were designed and field testing of one of the concepts showed that the mechanism could reliably pick kiwifruit. Furthermore, this kiwifruit detachment mechanism was able to reach over 80 percent of fruit in the cluttered kiwifruit canopy, whereas the previous state of the art mechanism was only able to reach less than 70 percent of the fruit. Artificial pollination was performed by detecting flowers and then spraying pollen in solution onto the detected flowers from a line of sprayers on a boom, while driving at up to 1.4 ms-1. In addition, the height of the canopy was measured and the spray boom was moved up and down to keep the boom close enough to the flowers for the spray to reach the flowers, while minimising collisions with the canopy. Mobile robot navigation was performed using a 2D lidar in apple orchards and vineyards. Lidar navigation in kiwifruit orchards was more challenging because the pergola structure only provides a small amount of data for the direction of rows, compared to the amount of data from the overhead canopy, the undulating ground and other objects in the orchards. Multiple methods are presented here for extracting structure defining features from 3D lidar data in kiwifruit orchards. In addition, a 3D lidar navigation system -- which performed row following, row end detection and row end turns -- was tested for over 30 km of autonomous driving in kiwifruit orchards. Computer vision algorithms for row detection and row following were also tested. The computer vision algorithm worked as well as the 3D lidar row following method in testing.


Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle

arXiv.org Artificial Intelligence

Personal use of this material is permitted. Abstract --The safety validation of automatic emergency braking system (AEBS) requires accurately distinguishing between false positive (FP) and true positive (TP) system activations. While simulations allow straightforward differentiation by comparing scenarios with and without interventions, analyzing activations from open-loop resimulations -- such as those from field operational testing (FOT) -- is more complex. This complexity arises from scenario parameter uncertainty and the influence of driver interventions in the recorded data. Human labeling is frequently used to address these challenges, relying on subjective assessments of intervention necessity or situational criticality, potentially introducing biases and limitations. This work proposes a rule-based classification approach leveraging the Prediction Divergence Principle (PDP) to address those issues. Applied to a simplified AEBS, the proposed method reveals key strengths, limitations, and system requirements for effective implementation. The findings suggest that combining this approach with human labeling may enhance the transparency and consistency of classification, thereby improving the overall validation process. While the rule set for classification derived in this work adopts a conservative approach, the paper outlines future directions for refinement and broader applicability. Finally, this work highlights the potential of such methods to complement existing practices, paving the way for more reliable and reproducible AEBS validation frameworks.


Know Or Not: a library for evaluating out-of-knowledge base robustness

arXiv.org Artificial Intelligence

While the capabilities of large language models (LLMs) have progressed significantly, their use in high-stakes applications have been limited due to risks of hallucination. One key approach in reducing hallucination is retrieval-augmented generation (RAG), but even in such setups, LLMs may still hallucinate when presented with questions outside of the knowledge base. Such behavior is unacceptable in high-stake applications where LLMs are expected to abstain from answering queries it does not have sufficient context on. In this work, we present a novel methodology for systematically evaluating out-of-knowledge base (OOKB) robustness of LLMs (whether LLMs know or do not know) in the RAG setting, without the need for manual annotation of gold standard answers. We implement our methodology in knowornot, an open-source library that enables users to develop their own customized evaluation data and pipelines for OOKB robustness. knowornot comprises four main features. Firstly, it provides a unified, high-level API that streamlines the process of setting up and running robustness benchmarks. Secondly, its modular architecture emphasizes extensibility and flexibility, allowing users to easily integrate their own LLM clients and RAG settings. Thirdly, its rigorous data modeling design ensures experiment reproducibility, reliability and traceability. Lastly, it implements a comprehensive suite of tools for users to customize their pipelines. We demonstrate the utility of knowornot by developing a challenging benchmark, PolicyBench, which spans four Question-Answer (QA) chatbots on government policies, and analyze its OOKB robustness. The source code of knowornot is available https://github.com/govtech-responsibleai/KnowOrNot.


Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection

arXiv.org Artificial Intelligence

We present a novel evaluation paradigm for AI text detectors that prioritizes real-world and equitable assessment. Current approaches predominantly report conventional metrics like AUROC, overlooking that even modest false positive rates constitute a critical impediment to practical deployment of detection systems. Furthermore, real-world deployment necessitates predetermined threshold configuration, making detector stability (i.e. the maintenance of consistent performance across diverse domains and adversarial scenarios), a critical factor. These aspects have been largely ignored in previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability factors into a unified evaluation metric designed for practical assessment. Furthermore, we develop a post-hoc, model-agnostic humanification framework that modifies AI text to more closely resemble human authorship, incorporating a controllable hardness parameter. This hardness-aware approach effectively challenges current SOTA zero-shot detection methods in maintaining both reliability and stability. (Data and code: https://github.com/navid-aub/SHIELD-Benchmark)


PromptArmor: Simple yet Effective Prompt Injection Defenses

arXiv.org Artificial Intelligence

Despite their potential, recent research has demonstrated that LLM agents are vulnerable to prompt injection attacks, where malicious prompts are injected into the agent's input, causing it to perform an attacker-specified task rather than the intended task provided by the user. In this paper, we present PromptArmor, a simple yet effective defense against prompt injection attacks. Specifically, PromptArmor prompts an off-the-shelf LLM to detect and remove potential injected prompts from the input before the agent processes it. Our results show that PromptArmor can accurately identify and remove injected prompts. For example, using GPT-4o, GPT-4.1, or o4-mini, PromptArmor achieves both a false positive rate and a false negative rate below 1% on the AgentDojo benchmark. Moreover, after removing injected prompts with PromptArmor, the attack success rate drops to below 1%. We also demonstrate PromptArmor's effectiveness against adaptive attacks and explore different strategies for prompting an LLM. We recommend that PromptArmor be adopted as a standard baseline for evaluating new defenses against prompt injection attacks.


Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications

arXiv.org Artificial Intelligence

The design of medical systems for remote, resource-limited environments faces persistent challenges due to poor interoperability, lack of offline support, and dependency on costly infrastructure. Many existing digital health solutions neglect these constraints, limiting their effectiveness for frontline health workers in underserved regions. This paper presents a portable, edge-enabled Electronic Health Record platform optimized for offline-first operation, secure patient data management, and modular diagnostic integration. Running on small-form factor embedded devices, it provides AES-256 encrypted local storage with optional cloud synchronization for interoperability. As a use case, we integrated a non-invasive anemia screening module leveraging fingernail pallor analysis. Trained on 250 patient cases (27\% anemia prevalence) with KDE-balanced data, the Random Forest model achieved a test RMSE of 1.969 g/dL and MAE of 1.490 g/dL. A severity-based model reached 79.2\% sensitivity. To optimize performance, a YOLOv8n-based nail bed detector was quantized to INT8, reducing inference latency from 46.96 ms to 21.50 ms while maintaining mAP@0.5 at 0.995. The system emphasizes low-cost deployment, modularity, and data privacy compliance (HIPAA/GDPR), addressing critical barriers to digital health adoption in disconnected settings. Our work demonstrates a scalable approach to enhance portable health information systems and support frontline healthcare in underserved regions.


Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness

arXiv.org Artificial Intelligence

Abstract--This paper proposes a communication-efficient, event-tri ggered inference framework for cooperative edge AI systems comprising multiple user devices and edge servers. Buildin g upon dual-threshold early-exit strategies for rare-even t detection, the proposed approach extends classical single-device infere nce to a distributed, multi-device setting while incorpora ting proportional fairness constraints across users. A joint optimization fr amework is formulated to maximize classification utility un der communication, energy, and fairness constraints. T o solve the resulting pr oblem efficiently, we exploit the monotonicity of the utilit y function with respect to the confidence thresholds and apply alternating optimiza tion with Benders decomposition. Experimental results sho w that the proposed framework significantly enhances system-wide per formance and fairness in resource allocation compared to si ngle-device baselines.