 Pattern Recognition


MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition

arXiv.org Artificial Intelligence

Mehran Shabanpour, Kasra Rad, Sadaf Khademi, and Arash Mohammadi. Abstract: High-Density surface Electromyography (HD-sEMG) has emerged as a pivotal resource for Human-Computer Interaction (HCI), offering direct insights into muscle activities and motion intentions. However, a significant challenge in practical implementations of HD-sEMG-based models is the low accuracy of inter-session and inter-subject classification. Variability between sessions can reach up to 40% due to the inherent temporal variability of HD-sEMG signals. Targeting this challenge, the paper introduces the MoEMba framework, a novel approach leveraging Selective State-Space Models (SSMs) to enhance HD-sEMG-based gesture recognition. Furthermore, wavelet feature modulation is integrated to capture multi-scale temporal and spatial relations, improving signal representation. Experimental results on the CapgMyo HD-sEMG dataset demonstrate that MoEMba achieves a balanced accuracy of 56.9%. The proposed framework's robustness to session-to-session variability and its efficient handling of high-dimensional multivariate time series data highlight its potential for advancing HD-sEMG-powered HCI systems.
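The abstract reports balanced accuracy rather than plain accuracy. As a reminder of what that metric measures, here is a minimal sketch using the standard definition (the mean of per-class recall). The function name and the toy labels are illustrative assumptions; they are not taken from the paper or from the CapgMyo dataset.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Every class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, made-up gesture labels (hypothetical data for illustration only).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case class 0 has recall 3/4 and class 1 has recall 1/2, so the balanced accuracy is their mean, which is lower than the plain accuracy because the smaller class is weighted equally.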


Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition

arXiv.org Artificial Intelligence

Fingerprint recognition remains one of the most reliable biometric technologies due to its high accuracy and uniqueness. Traditional systems rely on contact-based scanners, which are prone to issues such as image degradation from surface contamination and inconsistent user interaction. To address these limitations, contactless fingerprint recognition has emerged as a promising alternative, providing non-intrusive and hygienic authentication. This study evaluates the impact of image enhancement techniques on the performance of pre-trained deep learning models using transfer learning for touchless fingerprint recognition. The IIT-Bombay Touchless and Touch-Based Fingerprint Database, containing data from 200 subjects, was employed to test the performance of deep learning architectures such as VGG-16, VGG-19, Inception-V3, and ResNet-50. Experimental results reveal that transfer learning methods with fingerprint image enhancement (indirect method) significantly outperform those without enhancement (direct method). Specifically, VGG-16 achieved an accuracy of 98% in training and 93% in testing when using the enhanced images, demonstrating superior performance compared to the direct method. This paper provides a detailed comparison of the effectiveness of image enhancement in improving the accuracy of transfer learning models for touchless fingerprint recognition, offering key insights for developing more efficient biometric systems.


Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data

arXiv.org Artificial Intelligence

This study presents a hierarchical mining framework for high-dimensional imbalanced data, leveraging a depth graph model to address the inherent performance limitations of conventional approaches in handling complex, high-dimensional data distributions with imbalanced sample representations. By constructing a structured graph representation of the dataset and integrating graph neural network (GNN) embeddings, the proposed method effectively captures global interdependencies among samples. Furthermore, a hierarchical strategy is employed to enhance the characterization and extraction of minority class feature patterns, thereby facilitating precise and robust imbalanced data mining. Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach, demonstrating substantial improvements over traditional methods in key performance metrics, including pattern discovery count, average support, and minority class coverage. Notably, the method exhibits superior capabilities in minority-class feature extraction and pattern correlation analysis. These findings underscore the potential of depth graph models, in conjunction with hierarchical mining strategies, to significantly enhance the efficiency and accuracy of imbalanced data analysis. This research contributes a novel computational framework for high-dimensional complex data processing and lays the foundation for future extensions to dynamically evolving imbalanced data and multi-modal data applications, thereby expanding the applicability of advanced data mining methodologies to more intricate analytical domains.


MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration

arXiv.org Artificial Intelligence

Spatial transformations that capture population-level morphological statistics are critical for medical image analysis. Commonly used smoothness regularizers for image registration fail to integrate population statistics, leading to anatomically inconsistent transformations. Inverse consistency regularizers promote geometric consistency but lack population morphometrics integration. Regularizers that constrain deformations to low-dimensional manifolds address this. However, they prioritize reconstruction over interpretability and neglect diffeomorphic properties, such as group composition and inverse consistency. We introduce MORPH-LER, a Log-Euclidean regularization framework for population-aware unsupervised image registration. MORPH-LER learns population morphometrics from spatial transformations to guide and regularize registration networks, ensuring anatomically plausible deformations. It features a bottleneck autoencoder that computes the principal logarithm of deformation fields via iterative square-root predictions. It creates a linearized latent space that respects diffeomorphic properties and enforces inverse consistency. By integrating a registration network with a diffeomorphic autoencoder, MORPH-LER produces smooth, meaningful deformation fields. The framework offers two main contributions: (1) a data-driven regularization strategy that incorporates population-level anatomical statistics to enhance transformation validity and (2) a linearized latent space that enables compact and interpretable deformation fields for efficient population morphometrics analysis. We validate MORPH-LER across two families of deep learning-based registration networks, demonstrating its ability to produce anatomically accurate, computationally efficient, and statistically meaningful transformations on the OASIS-1 brain imaging dataset.
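The idea of obtaining a principal logarithm through repeated square roots can be illustrated on a small matrix. The sketch below is not the MORPH-LER autoencoder, which predicts square roots of deformation fields with a neural network. It is a classical numerical analogue (inverse scaling and squaring, with a Denman-Beavers square-root iteration swapped in for the learned predictor), applied to a hypothetical 2x2 linear map. The function names and the iteration counts are assumptions made for illustration.

```python
import numpy as np


def sqrtm_denman_beavers(a, iters=30):
    """Principal square root of a matrix via the Denman-Beavers iteration.

    Assumes `a` has no eigenvalues on the closed negative real axis.
    """
    y = np.asarray(a, dtype=float).copy()
    z = np.eye(y.shape[0])
    for _ in range(iters):
        y_next = 0.5 * (y + np.linalg.inv(z))
        z = 0.5 * (z + np.linalg.inv(y))
        y = y_next
    return y  # y converges to sqrt(a), z to its inverse


def logm_by_square_roots(a, k=16):
    """Approximate the principal logarithm: log(A) ~ 2^k * (A^(1/2^k) - I)."""
    root = np.asarray(a, dtype=float)
    for _ in range(k):
        root = sqrtm_denman_beavers(root)
    # After k square roots the matrix is close to the identity, where
    # log(X) is well approximated by its first-order term X - I.
    return (2.0 ** k) * (root - np.eye(root.shape[0]))


# Hypothetical transformation: rotation by 0.3 rad combined with scaling by 1.2.
theta, scale = 0.3, 1.2
A = scale * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])

log_A = logm_by_square_roots(A)
log_A_inv = logm_by_square_roots(np.linalg.inv(A))

# Closed form for this particular matrix, used only as a check.
expected = np.log(scale) * np.eye(2) + theta * np.array([[0.0, -1.0],
                                                         [1.0,  0.0]])
print(np.round(log_A, 4))
```

In the logarithm domain, inverting the transformation corresponds to negating its logarithm, so `log_A_inv` is approximately `-log_A`. That is the sense in which a linearized (log-domain) representation makes inverse consistency straightforward to express.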


ProtoSnap: Prototype Alignment for Cuneiform Signs

arXiv.org Artificial Intelligence

The cuneiform writing system served as the medium for transmitting knowledge in the ancient Near East for a period of over three thousand years. Cuneiform signs have a complex internal structure which is the subject of expert paleographic analysis, as variations in sign shapes bear witness to historical developments and transmission of writing and culture over time. However, prior automated techniques mostly treat sign types as categorical and do not explicitly model their highly varied internal configurations. In this work, we present an unsupervised approach for recovering the fine-grained internal configuration of cuneiform signs by leveraging powerful generative models and the appearance and structure of prototype font images as priors. Our approach, ProtoSnap, enforces structural consistency on matches found with deep image features to estimate the diverse configurations of cuneiform characters, snapping a skeleton-based template to photographed cuneiform signs. We provide a new benchmark of expert annotations and evaluate our method on this task. Our evaluation shows that our approach succeeds in aligning prototype skeletons to a wide variety of cuneiform signs. Moreover, we show that conditioning on structures produced by our method allows for generating synthetic data with correct structural configurations, significantly boosting the performance of cuneiform sign recognition beyond existing techniques, in particular over rare signs. Cuneiform signs have complex internal structures which varied significantly across the eras, cultures, and geographic regions among which cuneiform writing was used. The study of these variations is part of a field called paleography, which is crucial for understanding the historical context of attested writing (Biggs, 1973; Homburg, 2021). 
However, while computational methods show promise for aiding experts in analyzing cuneiform texts (Bogacz and Mara, 2022), they are challenged by the vast variety of complex sign variants and their visual nature: Represented as wedge-shaped imprints in clay tablets which have often sustained physical damage, cuneiform appears as shadows on a non-uniform clay surface which may even be difficult for human experts to identify under non-optimal lighting conditions (Taylor, 2015).


International AI Safety Report

arXiv.org Artificial Intelligence

I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance at tests of programming and scientific reasoning. (Professor Yoshua Bengio)


Reviews: Recurrent Registration Neural Networks for Deformable Image Registration

Neural Information Processing Systems

The main advantage of this approach is its efficiency at inference time, with performance comparable to the B-spline based approach, where an optimization is needed per registration. It also has, according to the authors, far fewer parameters to optimize. Please confirm whether this understanding is correct. 2. What is the reason for choosing to use multiple steps to gradually transform the moving image to the fixed one? Could the local transformation be done in one step instead? For instance, the position network could directly predict K locations to transform in one step instead of predicting one location for K steps.


Reviews: Recurrent Registration Neural Networks for Deformable Image Registration

Neural Information Processing Systems

The paper seems to contribute in a significant way by proposing an alternative RNN-based approach for deformable image registration. Although the experimental setting is not extremely strong, the proposed approach seems to give significant computational advantages. The rebuttal clarified most of the reviewers' concerns.


Reviews: This Looks Like That: Deep Learning for Interpretable Image Recognition

Neural Information Processing Systems

The prototypical parts network presented in this work is an original and potentially very useful learning framework for domains where process-based interpretability is critical. The method is thoroughly evaluated against alternative approaches and performs comparably to other state-of-the-art interpretable learning algorithms. The paper is well written, well motivated, and is accompanied by empirical results to validate the algorithmic contributions. Overall, I would recommend this paper for acceptance. One place for improvement is the discussion of this work in the context of alternative interpretable approaches, specifically the methods that show comparable accuracy.