HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Griffiths, Ethan, Haghighat, Maryam, Denman, Simon, Fookes, Clinton, Ramezani, Milad
We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows to reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. HOTFormerLoc achieves a top-1 average recall improvement of 5.5% - 11.5% on the CS-Wild-Places benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 4.9% on well-established urban and forest datasets. The code and CS-Wild-Places benchmark are available at https://csiro-robotics.github.io/HOTFormerLoc.
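The cylindrical octree attention windows rest on a simple idea: partitioning in cylindrical rather than Cartesian coordinates matches the radial density falloff of spinning lidar. A minimal sketch of the underlying coordinate conversion is below; the function name and interface are illustrative, not the paper's code.

```python
import numpy as np

def cartesian_to_cylindrical(points):
    """Convert N x 3 lidar points (x, y, z) to cylindrical (rho, theta, z).

    Partitioning octree windows over (rho, theta, z) instead of (x, y, z)
    aligns cells with the radial density pattern of a spinning lidar.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)   # radial distance from the sensor axis
    theta = np.arctan2(y, x)         # azimuth angle in (-pi, pi]
    return np.stack([rho, theta, z], axis=1)
```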
Radar Signal Recognition through Self-Supervised Learning and Domain Adaptation
Huang, Zi, Denman, Simon, Pemasiri, Akila, Fookes, Clinton, Martin, Terrence
Automatic radar signal recognition (RSR) plays a pivotal role in electronic warfare (EW), as accurately classifying radar signals is critical for informing decision-making processes. Recent advances in deep learning have shown significant potential in improving RSR performance in domains with ample annotated data. However, these methods fall short in EW scenarios where annotated RF data are scarce or impractical to obtain. To address these challenges, we introduce a self-supervised learning (SSL) method which utilises masked signal modelling and RF domain adaptation to enhance RSR performance in environments with limited RF samples and labels. Specifically, we investigate pre-training masked autoencoders (MAE) on baseband in-phase and quadrature (I/Q) signals from various RF domains and subsequently transfer the learned representation to the radar domain, where annotated data are limited. Empirical results show that our lightweight self-supervised ResNet model with domain adaptation achieves up to a 17.5% improvement in 1-shot classification accuracy when pre-trained on in-domain signals (i.e., radar signals) and up to a 16.31% improvement when pre-trained on out-of-domain signals (i.e., communication signals), compared to its baseline without SSL. We also provide reference results for several MAE designs and pre-training strategies, establishing a new benchmark for few-shot radar signal classification.
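Masked signal modelling hides random patches of the I/Q sequence and trains the network to reconstruct them. The sketch below shows only the masking step, assuming zero-filled hidden patches; the function name, patch length, and mask ratio are illustrative defaults, not the paper's configuration.

```python
import numpy as np

def mask_iq_patches(iq, patch_len=16, mask_ratio=0.75, seed=None):
    """Randomly mask contiguous patches of an I/Q signal for MAE-style
    masked signal modelling.

    iq: array of shape (2, T) holding the in-phase and quadrature channels.
    Returns the masked signal and a boolean mask over patches
    (True = patch hidden from the encoder).
    """
    rng = np.random.default_rng(seed)
    n_patches = iq.shape[1] // patch_len
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    masked = iq.copy()
    for p in np.flatnonzero(mask):
        masked[:, p * patch_len:(p + 1) * patch_len] = 0.0  # zero out hidden patches
    return masked, mask
```

During pre-training, the reconstruction loss would then be computed only over the masked patches.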
Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation
de Lima, Lucas Carvalho, Griffiths, Ethan, Haghighat, Maryam, Denman, Simon, Fookes, Clinton, Borges, Paulo, Brünig, Michael, Ramezani, Milad
This paper presents a novel approach for robust global localisation and 6DoF pose estimation of ground robots in forest environments by leveraging cross-view factor graph optimisation and deep-learned re-localisation. The proposed method addresses the challenges of aligning aerial and ground data for pose estimation, which is crucial for accurate point-to-point navigation in GPS-denied environments. By integrating information from both perspectives into a factor graph framework, our approach effectively estimates the robot's global position and orientation. Experimental results show that our proposed localisation system can achieve drift-free localisation with bounded positioning errors, ensuring reliable and safe robot navigation under canopies. Reliable geo-localisation in forest environments is crucial for executing various robotics tasks ranging from forest inventory and monitoring to search and rescue missions.
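A factor graph fuses relative constraints (e.g. ground odometry) with absolute constraints (e.g. aerial re-localisation fixes) by minimising the total residual over all factors. The toy 1D example below illustrates that fusion with linear least squares; it is a didactic stand-in for the paper's full 6DoF graph, with made-up measurement values.

```python
import numpy as np

def fuse_factors():
    """Fuse one odometry factor and two re-localisation priors over two
    1D poses (x0, x1) by linear least squares.

    Factor 1 (odometry, relative):   x1 - x0 = 1.0
    Factor 2 (aerial prior on x0):   x0      = 0.0
    Factor 3 (aerial prior on x1):   x1      = 1.2
    """
    A = np.array([[-1.0, 1.0],   # row per factor, column per pose
                  [ 1.0, 0.0],
                  [ 0.0, 1.0]])
    b = np.array([1.0, 0.0, 1.2])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

The solution compromises between the slightly inconsistent odometry and prior measurements, which is exactly the behaviour that bounds drift in the full system.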
Part-based Quantitative Analysis for Heatmaps
Tursun, Osman, Kalkan, Sinan, Denman, Simon, Sridharan, Sridha, Fookes, Clinton
Heatmaps have been instrumental in helping understand deep network decisions, and are a common approach for Explainable AI (XAI). While significant progress has been made in enhancing the informativeness and accessibility of heatmaps, heatmap analysis is typically very subjective and limited to domain experts. As such, developing automatic, scalable, and numerical analysis methods to make heatmap-based XAI more objective, end-user friendly, and cost-effective is vital. In addition, there is a need for comprehensive evaluation metrics to assess heatmap quality at a granular level.
Multi-stage Learning for Radar Pulse Activity Segmentation
Huang, Zi, Pemasiri, Akila, Denman, Simon, Fookes, Clinton, Martin, Terrence
Radio signal recognition is a crucial function in electronic warfare. Precise identification and localisation of radar pulse activities are required by electronic warfare systems to produce effective countermeasures. Despite the importance of these tasks, deep learning-based radar pulse activity recognition methods have remained largely underexplored. While deep learning for radar modulation recognition has been explored previously, classification tasks are generally limited to short, non-interleaved IQ signals, which limits their applicability in military settings. To address this gap, we introduce an end-to-end multi-stage learning approach to detect and localise pulse activities of interleaved radar signals across an extended time horizon. We propose a simple, yet highly effective multi-stage architecture for incrementally predicting fine-grained segmentation masks that localise radar pulse activities across multiple channels. We demonstrate the performance of our approach against several reference models on a novel radar dataset, while also providing a first-of-its-kind benchmark for radar pulse activity segmentation.
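The multi-stage idea is that each stage takes the previous stage's soft mask and produces a cleaner one. The sketch below mimics this with fixed smoothing stages in place of learned refinement networks, which is an assumption for illustration only; the paper learns each stage.

```python
import numpy as np

def multi_stage_masks(scores, n_stages=3, kernel=5, threshold=0.5):
    """Illustrative multi-stage refinement of a per-sample pulse-activity mask.

    scores: 1D array of raw per-sample activity scores in [0, 1].
    Each stage smooths the previous soft mask and thresholds it, so the
    binary segmentation is incrementally cleaned up stage by stage.
    Returns the list of per-stage binary masks.
    """
    soft = scores.astype(float)
    window = np.ones(kernel) / kernel
    stages = []
    for _ in range(n_stages):
        soft = np.convolve(soft, window, mode="same")  # refine the soft mask
        stages.append(soft > threshold)                # per-stage binary mask
    return stages
```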
Multi-task Learning for Radar Signal Characterisation
Huang, Zi, Pemasiri, Akila, Denman, Simon, Fookes, Clinton, Martin, Terrence
Radio signal recognition is a crucial task in both civilian and military applications, as accurate and timely identification of unknown signals is an essential part of spectrum management and electronic warfare. The majority of research in this field has focused on applying deep learning for modulation classification, leaving the task of signal characterisation as an understudied area. This paper addresses this gap by presenting an approach for tackling radar signal classification and characterisation as a multi-task learning (MTL) problem. We propose the IQ Signal Transformer (IQST) among several reference architectures that allow for simultaneous optimisation of multiple regression and classification tasks. We demonstrate the performance of our proposed MTL model on a synthetic radar dataset, while also providing a first-of-its-kind benchmark for radar signal characterisation.
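Casting characterisation as MTL means the network is trained on a weighted sum of a classification loss and one or more regression losses over signal parameters. A minimal numpy sketch of such a joint objective follows; the weights and the choice of regressed parameters (e.g. pulse width) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def multi_task_loss(class_logits, class_label, reg_pred, reg_target,
                    w_cls=1.0, w_reg=1.0):
    """Joint objective for radar signal characterisation as MTL:
    cross-entropy over the signal class plus mean-squared error over
    regressed continuous parameters."""
    # Softmax cross-entropy over signal classes (numerically stabilised).
    logits = class_logits - class_logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    ce = -log_probs[class_label]
    # MSE over regressed signal parameters (e.g. pulse width, PRI).
    mse = np.mean((np.asarray(reg_pred) - np.asarray(reg_target)) ** 2)
    return w_cls * ce + w_reg * mse
```

In practice each task would have its own prediction head on a shared encoder, and the weights balance how strongly each task shapes the shared representation.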
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Tursun, Osman, Denman, Simon, Sridharan, Sridha, Fookes, Clinton
Heatmaps are widely used to interpret deep neural networks, particularly for computer vision tasks, and heatmap-based explainable AI (XAI) techniques are a well-researched topic. However, most studies concentrate on enhancing the quality of the generated heatmap or discovering alternate heatmap generation techniques, and little effort has been devoted to making heatmap-based XAI automatic, interactive, scalable, and accessible. To address this gap, we propose a framework that includes two modules: (1) context modelling and (2) reasoning. We propose a template-based image captioning approach for context modelling to create text-based contextual information from the heatmap and input data. The reasoning module leverages a large language model to provide explanations in combination with specialised knowledge. Our qualitative experiments demonstrate the effectiveness of our framework and heatmap captioning approach. The code for the proposed template-based heatmap captioning approach will be publicly available.
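Template-based captioning turns a heatmap into text by extracting simple properties (such as where the strongest activation lies) and slotting them into fixed sentence templates. The sketch below is a minimal stand-in: the template string and the coarse position vocabulary are hypothetical, not the paper's templates.

```python
import numpy as np

TEMPLATE = "The model attends most strongly to the {position} of the image."

def caption_heatmap(heatmap):
    """Minimal template-based heatmap captioning: locate the peak
    activation and describe its coarse 3x3 grid position in words."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    vert = ["top", "middle", "bottom"][min(2, 3 * r // heatmap.shape[0])]
    horiz = ["left", "centre", "right"][min(2, 3 * c // heatmap.shape[1])]
    return TEMPLATE.format(position=f"{vert} {horiz}")
```

Text produced this way can then be passed, together with specialised knowledge, to a large language model for the reasoning step.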
Continuous Human Action Recognition for Human-Machine Interaction: A Review
Gammulle, Harshala, Ahmedt-Aristizabal, David, Denman, Simon, Tychsen-Smith, Lachlan, Petersson, Lars, Fookes, Clinton
With advances in data-driven machine learning research, a wide variety of prediction models have been proposed to capture spatio-temporal features for the analysis of video streams. Recognising actions and detecting action transitions within an input video are challenging but necessary tasks for applications that require real-time human-machine interaction. By reviewing a large body of recent related work in the literature, we thoroughly analyse, explain and compare action segmentation methods and provide details on the feature extraction and learning strategies that are used in most state-of-the-art methods. We cover the impact of the performance of object detection and tracking techniques on human action segmentation methodologies. We investigate the application of such models to real-world scenarios and discuss several limitations and key research directions towards improving interpretability, generalisation, optimisation and deployment.
In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image Domains
Cao, Ting, Armin, Mohammad Ali, Denman, Simon, Petersson, Lars, Ahmedt-Aristizabal, David
Medical applications have benefited greatly from the rapid advancement in computer vision. Considering patient monitoring in particular, in-bed human posture estimation offers important health-related metrics with potential value in medical condition assessments. Despite great progress in this domain, it remains challenging due to substantial ambiguity during occlusions, and the lack of large corpora of manually labeled data for model training, particularly with domains such as thermal infrared imaging which are privacy-preserving, and thus of great interest. Motivated by the effectiveness of self-supervised methods in learning features directly from data, we propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features from missing modalities seen during training. This approach is used with HRNet to enable single modality inference for in-bed pose estimation. Through extensive evaluations, we demonstrate that body positions can be effectively recognized from the available modality, achieving results on par with baseline models that are highly dependent on having access to multiple modes at inference time. The proposed framework supports future research towards self-supervised learning that generates a robust model from a single source, which is expected to generalize over many unknown distributions in clinical environments.
A Survey on Graph-Based Deep Learning for Computational Histopathology
Ahmedt-Aristizabal, David, Armin, Mohammad Ali, Denman, Simon, Fookes, Clinton, Petersson, Lars
With the remarkable success of representation learning for prediction problems, we have witnessed a rapid expansion of the use of machine learning and deep learning for the analysis of digital pathology and biopsy image patches. However, learning over patch-wise features using convolutional neural networks limits the ability of the model to capture global contextual information and comprehensively model tissue composition. The phenotypical and topological distribution of constituent histological entities plays a critical role in tissue diagnosis. As such, graph data representations and deep learning have attracted significant attention for encoding tissue representations, and capturing intra- and inter-entity level interactions. In this review, we provide a conceptual grounding for graph analytics in digital pathology, including entity-graph construction and graph architectures, and present their current success for tumor localization and classification, tumor invasion and staging, image retrieval, and survival prediction. We provide an overview of these methods in a systematic manner organized by the graph representation of the input image, scale, and organ on which they operate. We also outline the limitations of existing techniques, and suggest potential future research directions in this domain.