Goto

Collaborating Authors

 Bremen



Imputation Uncertainty in Interpretable Machine Learning Methods

Golchian, Pegah, Wright, Marvin N.

arXiv.org Machine Learning

In real data, missing values occur frequently, which affects the interpretation with interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, while ignoring additional imputation uncertainty and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of the IML methods permutation feature importance, partial dependence plots and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation is close to nominal coverage.


Towards Open-World Human Action Segmentation Using Graph Convolutional Networks

Xing, Hao, Boey, Kai Zhe, Cheng, Gordon

arXiv.org Artificial Intelligence

Human-object interaction segmentation is a fundamental task of daily activity understanding, which plays a crucial role in applications such as assistive robotics, healthcare, and autonomous systems. Most existing learning-based methods excel in closed-world action segmentation, they struggle to generalize to open-world scenarios where novel actions emerge. Collecting exhaustive action categories for training is impractical due to the dynamic diversity of human activities, necessitating models that detect and segment out-of-distribution actions without manual annotation. To address this issue, we formally define the open-world action segmentation problem and propose a structured framework for detecting and segmenting unseen actions. Our framework introduces three key innovations: 1) an Enhanced Pyramid Graph Convolutional Network (EPGCN) with a novel decoder module for robust spatiotemporal feature upsampling. 2) Mixup-based training to synthesize out-of-distribution data, eliminating reliance on manual annotations. 3) A novel Temporal Clustering loss that groups in-distribution actions while distancing out-of-distribution samples. We evaluate our framework on two challenging human-object interaction recognition datasets: Bimanual Actions and 2 Hands and Object (H2O) datasets. Experimental results demonstrate significant improvements over state-of-the-art action segmentation models across multiple open-set evaluation metrics, achieving 16.9% and 34.6% relative gains in open-set segmentation (F1@50) and out-of-distribution detection performances (AUROC), respectively. Additionally, we conduct an in-depth ablation study to assess the impact of each proposed component, identifying the optimal framework configuration for open-world action segmentation.


Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation

Xing, Hao, Boey, Kai Zhe, Wu, Yuankai, Burschka, Darius, Cheng, Gordon

arXiv.org Artificial Intelligence

Abstract-- Accurate temporal segmentation of human actions is critical for intelligent robots in collaborative settin gs, where a precise understanding of sub-activity labels and their tem poral structure is essential. However, the inherent noise in both human pose estimation and object detection often leads to over-segmentation errors, disrupting the coherence of act ion sequences. T o address this, we propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 3 0 fps) motion data (skeleton and object detections) to mitiga te fragmentation. Our framework introduces three key contributions. First, a sinusoidal encoding strategy that maps 3D skeleton coordinates into a continuous sin-cos space to enh ance spatial representation robustness. Second, a temporal gra ph fusion module that aligns multi-modal inputs with differin g resolutions via hierarchical feature aggregation, Third, inspired by the smooth transitions inherent to human actions, we desi gn SmoothLabelMix, a data augmentation technique that mixes i n-put sequences and labels to generate synthetic training exa mples with gradual action transitions, enhancing temporal consi stency in predictions and reducing over-segmentation artifacts. Extensive experiments on the Bimanual Actions Dataset, a public benchmark for human-object interaction understand ing, demonstrate that our approach outperforms state-of-the-a rt methods, especially in action segmentation accuracy, achi eving F1@10: 94.5% and F1@25: 92.8%. I. INTRODUCTION Human action segmentation, the task of temporally decomposing continuous activities into coherent sub-action uni ts, is a cornerstone of intelligent robotic systems operating in collaborative environments.


ShadowWolf -- Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife Images

Dede, Jens, Förster, Anna

arXiv.org Artificial Intelligence

The continuous growth of the global human population is leading to the expansion of human habitats, resulting in decreasing wildlife spaces and increasing human-wildlife interactions. These interactions can range from minor disturbances, such as raccoons in urban waste bins, to more severe consequences, including species extinction. As a result, the monitoring of wildlife is gaining significance in various contexts. Artificial intelligence (AI) offers a solution by automating the recognition of animals in images and videos, thereby reducing the manual effort required for wildlife monitoring. Traditional AI training involves three main stages: image collection, labelling, and model training. However, the variability, for example, in the landscape (e.g., mountains, open fields, forests), weather (e.g., rain, fog, sunshine), lighting (e.g., day, night), and camera-animal distances presents significant challenges to model robustness and adaptability in real-world scenarios. In this work, we propose a unified framework, called ShadowWolf, designed to address these challenges by integrating and optimizing the stages of AI model training and evaluation. The proposed framework enables dynamic model retraining to adjust to changes in environmental conditions and application requirements, thereby reducing labelling efforts and allowing for on-site model adaptation. This adaptive and unified approach enhances the accuracy and efficiency of wildlife monitoring systems, promoting more effective and scalable conservation efforts.


NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation

Gandyra, Max, Santonicola, Alessandro, Beetz, Michael

arXiv.org Artificial Intelligence

Instance segmentation of novel objects instances in RGB images, given some example images for each object, is a well known problem in computer vision. Designing a model general enough to be employed for all kinds of novel objects without (re-) training has proven to be a difficult task. T o handle this, we present a new training-free framework, called: Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). NOCTIS integrates two pre-trained models: Grounded-SAM 2 for object proposals with precise bounding boxes and corresponding segmentation masks; and DINOv2 for robust class and patch embeddings, due to its zero-shot capabilities. Internally, the proposal-object matching is realized by determining an object matching score based on the similarity of the class embeddings and the average maximum similarity of the patch embeddings with a new cyclic thresholding (CT) mechanism that mitigates unstable matches caused by repetitive textures or visually similar patterns. Beyond CT, NOCTIS introduces: (i) an appearance score that is unaffected by object selection bias; (ii) the usage of the average confidence of the proposals' bounding box and mask as a scoring component; and (iii) an RGB-only pipeline that performs even better than RGB-D ones. W e empirically show that NOCTIS, without further training/fine tuning, outperforms the best RGB and RGB-D methods regarding the mean AP score on the seven core datasets of the BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects" task.


Maritime Small Object Detection from UAVs using Deep Learning with Altitude-Aware Dynamic Tiling

Ahmed, Sakib, Pizarro, Oscar

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are crucial in Search and Rescue (SAR) missions due to their ability to monitor vast maritime areas. However, small objects often remain difficult to detect from high altitudes due to low object-to-background pixel ratios. We propose an altitude-aware dynamic tiling method that scales and adaptively subdivides the image into tiles for enhanced small object detection. By integrating altitude-dependent scaling with an adaptive tiling factor, we reduce unnecessary computation while maintaining detection performance. Tested on the SeaDronesSee dataset [1] with YOLOv5 [2] and Slicing Aided Hyper Inference (SAHI) framework [3], our approach improves Mean Average Precision (mAP) for small objects by 38% compared to a baseline and achieves more than double the inference speed compared to static tiling. This approach enables more efficient and accurate UAV-based SAR operations under diverse conditions.


Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

Skorski, Maciej, Landowska, Alina

arXiv.org Artificial Intelligence

How do Large Language Models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work using deterministic ground truth (majority or inclusion rules), we model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluated the best language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from nearly 700 annotators in 100K+ texts spanning social networks, news and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25\% of human annotators, performing much better than average balanced accuracy. Importantly, we find that AI produces far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.


gfnx: Fast and Scalable Library for Generative Flow Networks in JAX

Tiapkin, Daniil, Agarkov, Artem, Morozov, Nikita, Maksimov, Ian, Tsyganov, Askar, Gritsaev, Timofei, Samsonov, Sergey

arXiv.org Artificial Intelligence

In this paper, we present gfnx, a fast and scalable package for training and evaluating Generative Flow Networks (GFlowNets) written in JAX. gfnx provides an extensive set of environments and metrics for benchmarking, accompanied with single-file implementations of core objectives for training GFlowNets. We include synthetic hypergrids, multiple sequence generation environments with various editing regimes and particular reward designs for molecular generation, phylogenetic tree construction, Bayesian structure learning, and sampling from the Ising model energy. Across different tasks, gfnx achieves significant wall-clock speedups compared to Pytorch-based benchmarks (such as torchgfn library) and author implementations. For example, gfnx achieves up to 55 times speedup on CPU-based sequence generation environments, and up to 80 times speedup with the GPU-based Bayesian network structure learning setup. Our package provides a diverse set of benchmarks and aims to standardize empirical evaluation and accelerate research and applications of GFlowNets. The library is available on GitHub (https://github.com/d-tiapkin/gfnx) and on pypi (https://pypi.org/project/gfnx/). Documentation is available on https://gfnx.readthedocs.io.


InEKFormer: A Hybrid State Estimator for Humanoid Robots

Hohmeyer, Lasse, Popescu, Mihaela, Bergonzani, Ivan, Mronga, Dennis, Kirchner, Frank

arXiv.org Artificial Intelligence

Humanoid robots have great potential for a wide range of applications, including industrial and domestic use, healthcare, and search and rescue missions. However, bipedal locomotion in different environments is still a challenge when it comes to performing stable and dynamic movements. This is where state estimation plays a crucial role, providing fast and accurate feedback of the robot's floating base state to the motion controller. Although classical state estimation methods such as Kalman filters are widely used in robotics, they require expert knowledge to fine-tune the noise parameters. Due to recent advances in the field of machine learning, deep learning methods are increasingly used for state estimation tasks. In this work, we propose the InEKFormer, a novel hybrid state estimation method that incorporates an invariant extended Kalman filter (InEKF) and a Transformer network. We compare our method with the InEKF and the KalmanNet approaches on datasets obtained from the humanoid robot RH5. The results indicate the potential of Transformers in humanoid state estimation, but also highlight the need for robust autoregressive training in these high-dimensional problems.