Fookes, Clinton
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling
Banerjee, Chayan, Nguyen, Kien, Fookes, Clinton
--Mining process optimization, particularly truck dispatch scheduling, is a critical factor in enhancing the efficiency of open-pit mining operations. However, the dynamic and stochastic nature of mining environments--characterized by uncertainties such as equipment failures, truck maintenance, and variable haul cycle times--poses significant challenges for traditional optimization methods. While Reinforcement Learning (RL) has demonstrated promise in adaptive decision-making for mining logistics, its practical deployment requires rigorous evaluation in realistic and customizable simulation environments. T o address this challenge, we introduce Mining-Gym, a configurable, open-source benchmarking environment designed for training, testing, and comparing RL algorithms in mining process optimization. Built on Discrete Event Simulation (DES) and seamlessly integrated with the OpenAI Gym interface, Mining-Gym offers a structured testbed that enables the direct application of advanced RL algorithms from Stable Baselines. The framework models key mining-specific uncertainties, such as equipment failures, queue congestion, and stochasticity of mining processes, ensuring a realistic and adaptive learning environment. Additionally, a graphic user interface (GUI) for easy parameter selection for mine-site configuration, comprehensive data logging system, a built-in KPI dashboard and real-time representative visualization of mine-site enables in-depth performance analysis, facilitating standardized, reproducible evaluation across multiple RL strategies and baseline heuristics. INING process optimization aims to enhance efficiency and productivity by improving resource allocation, equipment scheduling, and material handling. However, these operations are highly complex, influenced by dynamic factors such as equipment failures, fluctuating ore quality, and unpredictable environmental conditions. Traditional optimization methods, such as linear programming and heuristics, struggle to adapt in real time, leading to inefficiencies and increased costs.
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Griffiths, Ethan, Haghighat, Maryam, Denman, Simon, Fookes, Clinton, Ramezani, Milad
We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer, for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows to reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. HOTFormerLoc achieves a top-1 average recall improvement of 5.5% - 11.5% on the CS-Wild-Places benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 4.9% on well-established urban and forest datasets. The code and CS-Wild-Places benchmark is available at https://csiro-robotics.github.io/HOTFormerLoc.
Face Deepfakes - A Comprehensive Review
Fernando, Tharindu, Priyasad, Darshana, Sridharan, Sridha, Ross, Arun, Fookes, Clinton
In recent years, remarkable advancements in deep- fake generation technology have led to unprecedented leaps in its realism and capabilities. Despite these advances, we observe a notable lack of structured and deep analysis deepfake technology. The principal aim of this survey is to contribute a thorough theoretical analysis of state-of-the-art face deepfake generation and detection methods. Furthermore, we provide a coherent and systematic evaluation of the implications of deepfakes on face biometric recognition approaches. In addition, we outline key applications of face deepfake technology, elucidating both positive and negative applications of the technology, provide a detailed discussion regarding the gaps in existing research, and propose key research directions for further investigation.
An Adversarial Approach to Register Extreme Resolution Tissue Cleared 3D Brain Images
Naziba, Abdullah, Fookes, Clinton, Perrin, Dimitri
We developed a generative patch based 3D image registration model that can register very high resolution images obtained from a biochemical process name tissue clearing. Tissue clearing process removes lipids and fats from the tissue and make the tissue transparent. When cleared tissues are imaged with Light-sheet fluorescent microscopy, the resulting images give a clear window to the cellular activities and dynamics inside the tissue.Thus the images obtained are very rich with cellular information and hence their resolution is extremely high (eg .2560x2160x676). Analyzing images with such high resolution is a difficult task for any image analysis pipeline.Image registration is a common step in image analysis pipeline when comparison between images are required. Traditional image registration methods fail to register images with such extant. In this paper we addressed this very high resolution image registration issue by proposing a patch-based generative network named InvGAN. Our proposed network can register very high resolution tissue cleared images. The tissue cleared dataset used in this paper are obtained from a tissue clearing protocol named CUBIC. We compared our method both with traditional and deep-learning based registration methods.Two different versions of CUBIC dataset are used, representing two different resolutions 25% and 100% respectively. Experiments on two different resolutions clearly show the impact of resolution on the registration quality. At 25% resolution, our method achieves comparable registration accuracy with very short time (7 minutes approximately). At 100% resolution, most of the traditional registration methods fail except Elastix registration tool.Elastix takes 28 hours to register where proposed InvGAN takes only 10 minutes.
Radar Signal Recognition through Self-Supervised Learning and Domain Adaptation
Huang, Zi, Denman, Simon, Pemasiri, Akila, Fookes, Clinton, Martin, Terrence
Automatic radar signal recognition (RSR) plays a pivotal role in electronic warfare (EW), as accurately classifying radar signals is critical for informing decision-making processes. Recent advances in deep learning have shown significant potential in improving RSR performance in domains with ample annotated data. However, these methods fall short in EW scenarios where annotated RF data are scarce or impractical to obtain. To address these challenges, we introduce a self-supervised learning (SSL) method which utilises masked signal modelling and RF domain adaption to enhance RSR performance in environments with limited RF samples and labels. Specifically, we investigate pre-training masked autoencoders (MAE) on baseband in-phase and quadrature (I/Q) signals from various RF domains and subsequently transfer the learned representation to the radar domain, where annotated data are limited. Empirical results show that our lightweight self-supervised ResNet model with domain adaptation achieves up to a 17.5% improvement in 1-shot classification accuracy when pre-trained on in-domain signals (i.e., radar signals) and up to a 16.31% improvement when pre-trained on out-of-domain signals (i.e., comm signals), compared to its baseline without SSL. We also provide reference results for several MAE designs and pre-training strategies, establishing a new benchmark for few-shot radar signal classification.
Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning
Hewawiththi, Nethmi S., Viduranga, M. Mahesha, Warnasooriya, Vanodhya G., Fernando, Tharindu, Suraweera, Himal A., Sridharan, Sridha, Fookes, Clinton
Unmanned aerial vehicle-assisted disaster recovery missions have been promoted recently due to their reliability and flexibility. Machine learning algorithms running onboard significantly enhance the utility of UAVs by enabling real-time data processing and efficient decision-making, despite being in a resource-constrained environment. However, the limited bandwidth and intermittent connectivity make transmitting the outputs to ground stations challenging. This paper proposes a novel semantic extractor that can be adopted into any machine learning downstream task for identifying the critical data required for decision-making. The semantic extractor can be executed onboard which results in a reduction of data that needs to be transmitted to ground stations. We test the proposed architecture together with the semantic extractor on two publicly available datasets, FloodNet and RescueNet, for two downstream tasks: visual question answering and disaster damage level classification. Our experimental results demonstrate the proposed method maintains high accuracy across different downstream tasks while significantly reducing the volume of transmitted data, highlighting the effectiveness of our semantic extractor in capturing task-specific salient information.
Physics Augmented Tuple Transformer for Autism Severity Level Detection
Ranasingha, Chinthaka, Gammulle, Harshala, Fernando, Tharindu, Sridharan, Sridha, Fookes, Clinton
Early diagnosis of Autism Spectrum Disorder (ASD) is an effective and favorable step towards enhancing the health and well-being of children with ASD. Manual ASD diagnosis testing is labor-intensive, complex, and prone to human error due to several factors contaminating the results. This paper proposes a novel framework that exploits the laws of physics for ASD severity recognition. The proposed physics-informed neural network architecture encodes the behaviour of the subject extracted by observing a part of the skeleton-based motion trajectory in a higher dimensional latent space. Two decoders, namely physics-based and non-physics-based decoder, use this latent embedding and predict the future motion patterns. The physics branch leverages the laws of physics that apply to a skeleton sequence in the prediction process while the non-physics-based branch is optimised to minimise the difference between the predicted and actual motion of the subject. A classifier also leverages the same latent space embeddings to recognise the ASD severity. This dual generative objective explicitly forces the network to compare the actual behaviour of the subject with the general normal behaviour of children that are governed by the laws of physics, aiding the ASD recognition task. The proposed method attains state-of-the-art performance on multiple ASD diagnosis benchmarks. To illustrate the utility of the proposed framework beyond the task ASD diagnosis, we conduct a third experiment using a publicly available benchmark for the task of fall prediction and demonstrate the superiority of our model.
Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation
de Lima, Lucas Carvalho, Griffiths, Ethan, Haghighat, Maryam, Denman, Simon, Fookes, Clinton, Borges, Paulo, Brünig, Michael, Ramezani, Milad
Abstract-- This paper presents a novel approach for robust global localisation and 6DoF pose estimation of ground robots in forest environments by leveraging cross-view factor graph optimisation and deep-learned re-localisation. The proposed method addresses the challenges of aligning aerial and ground data for pose estimation, which is crucial for accurate pointto-point navigation in GPS-denied environments. By integrating information from both perspectives into a factor graph framework, our approach effectively estimates the robot's global position and orientation. Experimental results show that our proposed localisation system can achieve drift-free localisation with bounded positioning errors, ensuring reliable and safe robot navigation under canopies. Reliable geo-localisation in forest environments is crucial for executing various robotics tasks ranging from forest inventory and monitoring to search and rescue missions.
Enhancing Semantic Segmentation with Adaptive Focal Loss: A Novel Approach
Islam, Md Rakibul, Hassan, Riad, Nazib, Abdullah, Nguyen, Kien, Fookes, Clinton, Islam, Md Zahidul
Deep learning has achieved outstanding accuracy in medical image segmentation, particularly for objects like organs or tumors with smooth boundaries or large sizes. Whereas, it encounters significant difficulties with objects that have zigzag boundaries or are small in size, leading to a notable decrease in segmentation effectiveness. In this context, using a loss function that incorporates smoothness and volume information into a model's predictions offers a promising solution to these shortcomings. In this work, we introduce an Adaptive Focal Loss (A-FL) function designed to mitigate class imbalance by down-weighting the loss for easy examples that results in up-weighting the loss for hard examples and giving greater emphasis to challenging examples, such as small and irregularly shaped objects. The proposed A-FL involves dynamically adjusting a focusing parameter based on an object's surface smoothness, size information, and adjusting the class balancing parameter based on the ratio of targeted area to total area in an image. We evaluated the performance of the A-FL using ResNet50-encoded U-Net architecture on the Picai 2022 and BraTS 2018 datasets. On the Picai 2022 dataset, the A-FL achieved an Intersection over Union (IoU) of 0.696 and a Dice Similarity Coefficient (DSC) of 0.769, outperforming the regular Focal Loss (FL) by 5.5% and 5.4% respectively. It also surpassed the best baseline Dice-Focal by 2.0% and 1.2%. On the BraTS 2018 dataset, A-FL achieved an IoU of 0.883 and a DSC of 0.931. The comparative studies show that the proposed A-FL function surpasses conventional methods, including Dice Loss, Focal Loss, and their hybrid variants, in IoU, DSC, Sensitivity, and Specificity metrics. This work highlights A-FL's potential to improve deep learning models for segmenting clinically significant regions in medical images, leading to more precise and reliable diagnostic tools.
Part-based Quantitative Analysis for Heatmaps
Tursun, Osman, Kalkan, Sinan, Denman, Simon, Sridharan, Sridha, Fookes, Clinton
Heatmaps have been instrumental in helping understand deep network decisions, and are a common approach for Explainable AI (XAI). While significant progress has been made in enhancing the informativeness and accessibility of heatmaps, heatmap analysis is typically very subjective and limited to domain experts. As such, developing automatic, scalable, and numerical analysis methods to make heatmap-based XAI more objective, end-user friendly, and cost-effective is vital. In addition, there is a need for comprehensive evaluation metrics to assess heatmap quality at a granular level.