AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

Reading in the Dark with Foveated Event Vision

Brander, Carl, Cioffi, Giovanni, Messikommer, Nico, Scaramuzza, Davide

arXiv.org Artificial IntelligenceJun-10-2025

Current smart glasses equipped with RGB cameras struggle to perceive the environment in low-light and high-speed motion scenarios due to motion blur and the limited dynamic range of frame cameras. Additionally, capturing dense images with a frame camera requires large bandwidth and power consumption, consequently draining the battery faster . These challenges are especially relevant for developing algorithms that can read text from images. In this work, we propose a novel event-based Optical Character Recognition (OCR) approach for smart glasses. By using the eye gaze of the user, we foveate the event stream to significantly reduce bandwidth by around 98% while exploiting the benefits of event cameras in high-dynamic and fast scenes. Our proposed method performs deep binary reconstruction trained on synthetic data and leverages multi-modal LLMs for OCR, outperforming traditional OCR solutions. Our results demonstrate the ability to read text in low light environments where RGB cameras struggle while using up to 2'400 times less bandwidth than a wearable RGB camera.

large language model, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2506.06918

Country: Europe > Switzerland (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)

Add feedback

Neural networks with image recognition by pairs

Geidarov, Polad

arXiv.org Artificial IntelligenceJun-10-2025

Neural networks based on metric recognition methods have a strictly determined architecture. Number of neurons, connections, as well as weights and thresholds values are calculated analytically, based on the initial conditions of tasks: number of recognizable classes, number of samples, metric expressions used. This paper discusses the possibility of transforming these networks in order to apply classical learning algorithms to them without using analytical expressions that calculate weight values. In the received network, training is carried out by recognizing images in pairs. This approach simplifies the learning process and easily allows to expand the neural network by adding new images to the recognition task. The advantages of these networks, including such as: 1) network architecture simplicity and transparency; 2) training simplicity and reliability; 3) the possibility of using a large number of images in the recognition problem using a neural network; 4) a consistent increase in the number of recognizable classes without changing the previous values of weights and thresholds.

artificial intelligence, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2506.06322

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.41)

Add feedback

The Latent Space Hypothesis: Toward Universal Medical Representation Learning

Patel, Salil

arXiv.org Artificial IntelligenceJun-6-2025

Medical data range from genomic sequences and retinal photographs to structured laboratory results and unstructured clinical narratives. Although these modalities appear disparate, many encode convergent information about a single underlying physiological state. The Latent Space Hypothesis frames each observation as a projection of a unified, hierarchically organized manifold -- much like shadows cast by the same three-dimensional object. Within this learned geometric representation, an individual's health status occupies a point, disease progression traces a trajectory, and therapeutic intervention corresponds to a directed vector. Interpreting heterogeneous evidence in a shared space provides a principled way to re-examine eponymous conditions -- such as Parkinson's or Crohn's -- that often mask multiple pathophysiological entities and involve broader anatomical domains than once believed. By revealing sub-trajectories and patient-specific directions of change, the framework supplies a quantitative rationale for personalised diagnosis, longitudinal monitoring, and tailored treatment, moving clinical practice away from grouping by potentially misleading labels toward navigation of each person's unique trajectory. Challenges remain -- bias amplification, data scarcity for rare disorders, privacy, and the correlation-causation divide -- but scale-aware encoders, continual learning on longitudinal data streams, and perturbation-based validation offer plausible paths forward.

bioinformatics, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2506.04515

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
(8 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(7 more...)

Add feedback

Plant Bioelectric Early Warning Systems: A Five-Year Investigation into Human-Plant Electromagnetic Communication

Gloor, Peter A.

arXiv.org Artificial IntelligenceJun-5-2025

We present a comprehensive investigation into plant bioelectric responses to human presence and emotional states, building on five years of systematic research. Using custom-built plant sensors and machine learning classification, we demonstrate that plants generate distinct bioelectric signals correlating with human proximity, emotional states, and physiological conditions. A deep learning model based on ResNet50 architecture achieved 97% accuracy in classifying human emotional states through plant voltage spectrograms, while control models with shuffled labels achieved only 30% accuracy. This study synthesizes findings from multiple experiments spanning 2020-2025, including individual recognition (66% accuracy), eurythmic gesture detection, stress prediction, and responses to human voice and movement. We propose that these phenomena represent evolved anti-herbivory early warning systems, where plants detect approaching animals through bioelectric field changes before physical contact. Our results challenge conventional understanding of plant sensory capabilities and suggest practical applications in agriculture, healthcare, and human-plant interaction research.

accuracy, machine learning, pattern recognition, (16 more...)

arXiv.org Artificial Intelligence

2506.04132

Country: Europe > Switzerland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Food & Agriculture > Agriculture (0.89)
Commercial Services & Supplies > Security & Alarm Services (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)

Add feedback

Implicit Deformable Medical Image Registration with Learnable Kernels

Fogarollo, Stefano, Laimer, Gregor, Bale, Reto, Harders, Matthias

arXiv.org Artificial IntelligenceJun-4-2025

Deformable medical image registration is an essential task in computer-assisted interventions. This problem is particularly relevant to oncological treatments, where precise image alignment is necessary for tracking tumor growth, assessing treatment response, and ensuring accurate delivery of therapies. Recent AI methods can outperform traditional techniques in accuracy and speed, yet they often produce unreliable deformations that limit their clinical adoption. In this work, we address this challenge and introduce a novel implicit registration framework that can predict accurate and reliable deformations. Our insight is to reformulate image registration as a signal reconstruction problem: we learn a kernel function that can recover the dense displacement field from sparse keypoint correspondences. We integrate our method in a novel hierarchical architecture, and estimate the displacement field in a coarse-to-fine manner. Our formulation also allows for efficient refinement at test time, permitting clinicians to easily adjust registrations when needed. We validate our method on challenging intra-patient thoracic and abdominal zero-shot registration tasks, using public and internal datasets from the local University Hospital. Our method not only shows competitive accuracy to state-of-the-art approaches, but also bridges the generalization gap between implicit and explicit registration techniques. In particular, our method generates deformations that better preserve anatomical relationships and matches the performance of specialized commercial systems, underscoring its potential for clinical adoption.

machine learning, pattern recognition, registration, (15 more...)

arXiv.org Artificial Intelligence

2506.0215

Country: Europe > Austria (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.88)

Add feedback

QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation

Wasfy, Ahmed, Nacar, Omer, Elkhateb, Abdelakreem, Reda, Mahmoud, Elshehy, Omar, Ammar, Adel, Boulila, Wadii

arXiv.org Artificial IntelligenceJun-4-2025

The inherent complexities of Arabic script; its cursive nature, diacritical marks (tashkeel), and varied typography, pose persistent challenges for Optical Character Recognition (OCR). We present Qari-OCR, a series of vision-language models derived from Qwen2-VL-2B-Instruct, progressively optimized for Arabic through iterative fine-tuning on specialized synthetic datasets. Our leading model, QARI v0.2, establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically-rich texts. Qari-OCR demonstrates superior handling of tashkeel, diverse fonts, and document layouts, alongside impressive performance on low-resolution images. Further explorations (QARI v0.3) showcase strong potential for structural document understanding and handwritten text. This work delivers a marked improvement in Arabic OCR accuracy and efficiency, with all models and datasets released to foster further research.

machine learning, pattern recognition, recognition, (19 more...)

arXiv.org Artificial Intelligence

2506.02295

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.42)

Add feedback

DiG-Net: Enhancing Quality of Life through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics

Beeri, Eran Bamani, Nissinman, Eden, Sintov, Avishai

arXiv.org Artificial IntelligenceJun-2-2025

Dynamic hand gestures play a pivotal role in assistive human-robot interaction (HRI), facilitating intuitive, non-verbal communication, particularly for individuals with mobility constraints or those operating robots remotely. Current gesture recognition methods are mostly limited to short-range interactions, reducing their utility in scenarios demanding robust assistive communication from afar. In this paper, we introduce a novel approach designed specifically for assistive robotics, enabling dynamic gesture recognition at extended distances of up to 30 meters, thereby significantly improving accessibility and quality of life. Our proposed Distance-aware Gesture Network (DiG-Net) effectively combines Depth-Conditioned Deformable Alignment (DADA) blocks with Spatio-Temporal Graph modules, enabling robust processing and classification of gesture sequences captured under challenging conditions, including significant physical attenuation, reduced resolution, and dynamic gesture variations commonly experienced in real-world assistive environments. We further introduce the Radiometric Spatio-Temporal Depth Attenuation Loss (RSTDAL), shown to enhance learning and strengthen model robustness across varying distances. Our model demonstrates significant performance improvement over state-of-the-art gesture recognition frameworks, achieving a recognition accuracy of 97.3% on a diverse dataset with challenging hyper-range gestures. Introduction The growing number of individuals living with disabilities and requiring assistance has created a pressing demand for assistive technologies that enhance users' independence, safety, and quality of life [1]. Among these, assistive robotic systems are increasingly integrated into environments where intuitive, nonverbal communication is essential for enabling natural interaction with individuals of varied abilities. Gesture-based interaction is particularly important in scenarios where speech is not an option. To contextualize our contribution, Table 1 presents a comparative overview of recent systems in this area, outlining their target users, sensing modalities, application domains, and level of human involvement. Our method is the only one to support dynamic gesture recognition at hyper-range distances, defined here as up to 30 meters, and to operate reliably in both indoor and outdoor environments, making it uniquely suited for real-world assistive deployment. Recent research has catalyzed a new generation of assistive systems capable of perceiving complex environments and interacting with humans in context-aware, natural ways [2, 3, 4].

machine learning, pattern recognition, recognition, (15 more...)

arXiv.org Artificial Intelligence

2505.24786

Country: North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Add feedback

This Looks Like That: Deep Learning for Interpretable Image Recognition

Neural Information Processing SystemsJun-1-2025, 08:31:29 GMT

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people on how to solve challenging image classification tasks. The network uses only image-level labels for training without any annotations for parts of images.

deep learning, interpretable image recognition, protopnet, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity

Zhang, Yu, Guo, Dong, Wu, Fang, Zhu, Guoliang, Ding, Dian, Zhang, Yiming

arXiv.org Artificial IntelligenceMay-30-2025

Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase, primarily due to the quadratic complexity of self-attention. Existing methods typically employ dynamic pattern matching and block-sparse low-level implementations. However, their reliance on local information for pattern identification fails to capture global contexts, and the coarse granularity of blocks leads to persistent internal sparsity, resulting in suboptimal accuracy and efficiency. To address these limitations, we propose \textbf{AnchorAttention}, a difference-aware, dynamic sparse attention mechanism that efficiently identifies critical attention regions at a finer stripe granularity while adapting to global contextual information, achieving superior speed and accuracy. AnchorAttention comprises three key components: (1) \textbf{Pattern-based Anchor Computation}, leveraging the commonalities present across all inputs to rapidly compute a set of near-maximum scores as the anchor; (2) \textbf{Difference-aware Stripe Sparsity Identification}, performing difference-aware comparisons with the anchor to quickly obtain discrete coordinates of significant regions in a stripe-like sparsity pattern; (3) \textbf{Fine-grained Sparse Computation}, replacing the traditional contiguous KV block loading approach with simultaneous discrete KV position loading to maximize sparsity rates while preserving full hardware computational potential. With its finer-grained sparsity strategy, \textbf{AnchorAttention} achieves higher sparsity rates at the same recall level, significantly reducing computation time. Compared to previous state-of-the-art methods, at a text length of 128k, it achieves a speedup of 1.44$\times$ while maintaining higher recall rates.

large language model, machine learning, pattern recognition, (18 more...)

arXiv.org Artificial Intelligence

2505.2352

Country: Asia > China (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)

Add feedback

Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs

Luo, Yi, Wang, Qiwen, Yang, Junqi, Tang, Luyao, Lin, Zhenghao, Ying, Zhenzhe, Wang, Weiqiang, Lin, Chen

arXiv.org Artificial IntelligenceMay-30-2025

Generalized Category Discovery (GCD) aims to classify both known and novel categories using partially labeled data that contains only known classes. Despite achieving strong performance on existing benchmarks, current textual GCD methods lack sufficient validation in realistic settings. We introduce Event-Centric GCD (EC-GCD), characterized by long, complex narratives and highly imbalanced class distributions, posing two main challenges: (1) divergent clustering versus classification groupings caused by subjective criteria, and (2) Unfair alignment for minority classes. To tackle these, we propose PaMA, a framework leveraging LLMs to extract and refine event patterns for improved cluster-class alignment. Additionally, a ranking-filtering-mining pipeline ensures balanced representation of prototypes across imbalanced categories. Evaluations on two EC-GCD benchmarks, including a newly constructed Scam Report dataset, demonstrate that PaMA outperforms prior methods with up to 12.58% H-score gains, while maintaining strong generalization on base GCD datasets.

large language model, machine learning, pattern recognition, (15 more...)

arXiv.org Artificial Intelligence

2505.23304

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report (1.00)

Industry:

Banking & Finance (1.00)
Information Technology (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
(2 more...)

Add feedback