AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation

Maracani, Andrea, Ozkan, Savas, Cho, Sijun, Kim, Hyowon, Noh, Eunchung, Min, Jeongwon, Min, Cho Jung, Park, Dookun, Ozay, Mete

arXiv.org Artificial IntelligenceMar-20-2025

Scaling architectures have been proven effective for improving Scene Text Recognition (STR), but the individual contribution of vision encoder and text decoder scaling remain under-explored. In this work, we present an in-depth empirical analysis and demonstrate that, contrary to previous observations, scaling the decoder yields significant performance gains, always exceeding those achieved by encoder scaling alone. We also identify label noise as a key challenge in STR, particularly in real-world data, which can limit the effectiveness of STR models. To address this, we propose Cloze Self-Distillation (CSD), a method that mitigates label noise by distilling a student model from context-aware soft predictions and pseudolabels generated by a teacher model. Additionally, we enhance the decoder architecture by introducing differential cross-attention for STR. Our methodology achieves state-of-the-art performance on 10 out of 11 benchmarks using only real data, while significantly reducing the parameter size and computational costs.

large language model, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2503.16184

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.62)

Add feedback

Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants

Atoum, Jumanh, Johnston, Garrison L. H., Simaan, Nabil, Wu, Jie Ying

arXiv.org Artificial IntelligenceMar-19-2025

Recognizing surgical gestures in real-time is a stepping stone towards automated activity recognition, skill assessment, intra-operative assistance, and eventually surgical automation. The current robotic surgical systems provide us with rich multi-modal data such as video and kinematics. While some recent works in multi-modal neural networks learn the relationships between vision and kinematics data, current approaches treat kinematics information as independent signals, with no underlying relation between tool-tip poses. However, instrument poses are geometrically related, and the underlying geometry can aid neural networks in learning gesture representation. Therefore, we propose combining motion invariant measures (curvature and torsion) with vision and kinematics data using a relational graph network to capture the underlying relations between different data streams. We show that gesture recognition improves when combining invariant signals with tool position, achieving 90.3\% frame-wise accuracy on the JIGSAWS suturing dataset. Our results show that motion invariant signals coupled with position are better representations of gesture motion compared to traditional position and quaternion representations. Our results highlight the need for geometric-aware modeling of kinematics for gesture recognition.

artificial intelligence, machine learning, pattern recognition, (3 more...)

arXiv.org Artificial Intelligence

2503.15647

Genre: Research Report > New Finding (0.73)

Industry:

Health & Medicine > Surgery (0.40)
Health & Medicine > Health Care Technology (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.80)

Add feedback

EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera

Wang, Luming, Shi, Hao, Yin, Xiaoting, Yang, Kailun, Wang, Kaiwei

arXiv.org Artificial IntelligenceMar-16-2025

-- Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing asynchronous event streams due to their synchronous frame-based nature. Moreover, from an egocentric perspective, event cameras record data that includes events generated by both head movements and hand gestures, thereby increasing the complexity of gesture recognition. T o address this, we propose a novel network architecture specifically designed for event data processing, incorporating (1) a lightweight CNN with asymmetric depthwise convolutions to reduce parameters while preserving spatiotemporal features, (2) a plug-and-play state-space model as context block that decouples head movement noise from gesture dynamics, and (3) a parameter-free Bins-T emporal Shift Module (BSTM) that shifts features along bins and temporal dimensions to fuse sparse events efficiently. We further establish the EgoEvGesture dataset, the first large-scale dataset for egocentric gesture recognition using event cameras. Experimental results demonstrate that our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches. Notable misclassifications in freestyle motions stem from high interpersonal variability and unseen test patterns differing from training data. Moreover, our approach achieved a remarkable accuracy of 97.0% on the DVS128 Gesture, demonstrating the effectiveness and generalization capability of our method on public datasets.

machine learning, pattern recognition, recognition, (18 more...)

arXiv.org Artificial Intelligence

2503.12419

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Add feedback

Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures

Verduyn, Arno, Vochten, Maxim, De Schutter, Joris

arXiv.org Artificial IntelligenceMar-14-2025

The ability of robots to recognize human gestures facilitates a natural and accessible human-robot collaboration. However, most work in gesture recognition remains rooted in reference frame-dependent representations. This poses a challenge when reference frames vary due to different work cell layouts, imprecise frame calibrations, or other environmental changes. This paper investigated the use of invariant trajectory descriptors for robust hand palm motion gesture recognition under reference frame changes. First, a novel dataset of recorded Hand Palm Motion (HPM) gestures is introduced. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Afterwards, multiple invariant trajectory descriptor approaches were benchmarked to assess how their performances generalize to this novel HPM dataset. After this offline benchmarking, the best scoring approach is validated for online recognition by developing a real-time Proof of Concept (PoC). In this PoC, hand palm motion gestures were used to control the real-time movement of a manipulator arm. The PoC demonstrated a high recognition reliability in real-time operation, achieving an $F_1$-score of 92.3%. This work demonstrates the effectiveness of the invariant descriptor approach as a standalone solution. Moreover, we believe that the invariant descriptor approach can also be utilized within other state-of-the-art pattern recognition and learning systems to improve their robustness against reference frame variations.

dataset, recognition, variation, (16 more...)

arXiv.org Artificial Intelligence

2503.11352

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.92)

Add feedback

From Dionysius Emerges Apollo -- Learning Patterns and Abstractions from Perceptual Sequences

Wu, Shuchen

arXiv.org Artificial IntelligenceMar-13-2025

Cognition swiftly breaks high-dimensional sensory streams into familiar parts and uncovers their relations. Why do structures emerge, and how do they enable learning, generalization, and prediction? What computational principles underlie this core aspect of perception and intelligence? A sensory stream, simplified, is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts -- a process known as chunking. In the first project, I investigated factors influencing chunking in a serial reaction time task and showed that humans adapt to underlying chunks while balancing speed and accuracy. Building on this, I developed models that learn chunks and parse sequences chunk by chunk. Normatively, I proposed chunking as a rational strategy for discovering recurring patterns and nested hierarchies, enabling efficient sequence factorization. Learned chunks serve as reusable primitives for transfer, composition, and mental simulation -- letting the model compose the new from the known. I demonstrated this model's ability to learn hierarchies in single and multi-dimensional sequences and highlighted its utility for unsupervised pattern discovery. The second part moves from concrete to abstract sequences. I taxonomized abstract motifs and examined their role in sequence memory. Behavioral evidence suggests that humans exploit pattern redundancies for compression and transfer. I proposed a non-parametric hierarchical variable model that learns both chunks and abstract variables, uncovering invariant symbolic patterns. I showed its similarity to human learning and compared it to large language models. Taken together, this thesis suggests that chunking and abstraction as simple computational principles enable structured knowledge acquisition in hierarchically organized sequences, from simple to complex, concrete to abstract.

abstraction, cit, sequence, (16 more...)

arXiv.org Artificial Intelligence

2503.10973

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Curriculum (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)

Add feedback

Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models

Cai, Yifeng, Zhang, Ziqi, Li, Ding, Guo, Yao, Chen, Xiangqun

arXiv.org Artificial IntelligenceMar-13-2025

Modern Federated Learning (FL) has become increasingly essential for handling highly heterogeneous mobile devices. Current approaches adopt a partial model aggregation paradigm that leads to sub-optimal model accuracy and higher training overhead. In this paper, we challenge the prevailing notion of partial-model aggregation and propose a novel "full-weight aggregation" method named Moss, which aggregates all weights within heterogeneous models to preserve comprehensive knowledge. Evaluation across various applications demonstrates that Moss significantly accelerates training, reduces on-device training time and energy consumption, enhances accuracy, and minimizes network bandwidth utilization when compared to state-of-the-art baselines.

architecture, heterogeneous model, moss, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3712274

2503.10218

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Virginia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)

Add feedback

Event-Driven Implementation of a Physical Reservoir Computing Framework for superficial EMG-based Gesture Recognition

Ding, Yuqi, Donati, Elisa, Li, Haobo, Heidari, Hadi

arXiv.org Artificial IntelligenceMar-10-2025

Wearable health devices have a strong demand in real-time biomedical signal processing. However traditional methods often require data transmission to centralized processing unit with substantial computational resources after collecting it from edge devices. Neuromorphic computing is an emerging field that seeks to design specialized hardware for computing systems inspired by the structure, function, and dynamics of the human brain, offering significant advantages in latency and power consumption. This paper explores a novel neuromorphic implementation approach for gesture recognition by extracting spatiotemporal spiking information from surface electromyography (sEMG) data in an event-driven manner. At the same time, the network was designed by implementing a simple-structured and hardware-friendly Physical Reservoir Computing (PRC) framework called Rotating Neuron Reservoir (RNR) within the domain of Spiking neural network (SNN). The spiking RNR (sRNR) is promising to pipeline an innovative solution to compact embedded wearable systems, enabling low-latency, real-time processing directly at the sensor level. The proposed system was validated by an open-access large-scale sEMG database and achieved an average classification accuracy of 74.6\% and 80.3\% using a classical machine learning classifier and a delta learning rule algorithm respectively. While the delta learning rule could be fully spiking and implementable on neuromorphic chips, the proposed gesture recognition system demonstrates the potential for near-sensor low-latency processing.

accuracy, neuron, spike train, (12 more...)

arXiv.org Artificial Intelligence

2503.13492

Country:

Europe > United Kingdom (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.92)
(2 more...)

Add feedback

Robustness of Generalized Median Computation for Consensus Learning in Arbitrary Spaces

Nienkötter, Andreas, Vega-Pons, Sandro, Jiang, Xiaoyi

arXiv.org Artificial IntelligenceMar-7-2025

Robustness in terms of outliers is an important topic and has been formally studied for a variety of problems in machine learning and computer vision. Generalized median computation is a special instance of consensus learning and a common approach to finding prototypes. Related research can be found in numerous problem domains with a broad range of applications. So far, however, robustness of generalized median has only been studied in a few specific spaces. To our knowledge, there is no robustness characterization in a general setting, i.e. for arbitrary spaces. We address this open issue in our work. The breakdown point >=0.5 is proved for generalized median with metric distance functions in general. We also study the detailed behavior in case of outliers from different perspectives. In addition, we present robustness results for weighted generalized median computation and non-metric distance functions. Given the importance of robustness, our work contributes to closing a gap in the literature. The presented results have general impact and applicability, e.g. providing deeper understanding of generalized median computation and practical guidance to avoid non-robust computation.

breakdown point, displacement, distance function, (16 more...)

arXiv.org Artificial Intelligence

2503.05215

Country:

North America > United States > Michigan (0.04)
North America > United States > Florida > Palm Beach County > Lake Worth (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

Le, Thanh-Phong, Phan, Trung Le Chi, Nguyen, Nghia Hieu, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceMar-7-2025

Document Visual Question Answering (Document VQA) challenges multimodal systems to holistically handle textual, layout, and visual modalities to provide appropriate answers. Document VQA has gained popularity in recent years due to the increasing amount of documents and the high demand for digitization. Nonetheless, most of document VQA datasets are developed in high-resource languages such as English. In this paper, we present ReceiptVQA (\textbf{Receipt} \textbf{V}isual \textbf{Q}uestion \textbf{A}nswering), the initial large-scale document VQA dataset in Vietnamese dedicated to receipts, a document kind with high commercial potentials. The dataset encompasses \textbf{9,000+} receipt images and \textbf{60,000+} manually annotated question-answer pairs. In addition to our study, we introduce LiGT (\textbf{L}ayout-\textbf{i}nfused \textbf{G}enerative \textbf{T}ransformer), a layout-aware encoder-decoder architecture designed to leverage embedding layers of language models to operate layout embeddings, minimizing the use of additional neural modules. Experiments on ReceiptVQA show that our architecture yielded promising performance, achieving competitive results compared with outstanding baselines. Furthermore, throughout analyzing experimental results, we found evident patterns that employing encoder-only model architectures has considerable disadvantages in comparison to architectures that can generate answers. We also observed that it is necessary to combine multiple modalities to tackle our dataset, despite the critical role of semantic understanding from language models. We hope that our work will encourage and facilitate future development in Vietnamese document VQA, contributing to a diverse multimodal research community in the Vietnamese language.

architecture, dataset, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2502.19202

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
(2 more...)

Add feedback

SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Deguchi, Hiroyuki, Kamoda, Go, Matsushita, Yusuke, Taguchi, Chihiro, Suenaga, Kohei, Waga, Masaki, Yokoi, Sho

arXiv.org Artificial IntelligenceMar-5-2025

Researchers and practitioners in natural language processing and computational linguistics frequently observe and analyze the real language usage in large-scale corpora. For that purpose, they often employ off-the-shelf pattern-matching tools, such as grep, and keyword-in-context concordancers, which is widely used in corpus linguistics for gathering examples. Nonetheless, these existing techniques rely on surface-level string matching, and thus they suffer from the major limitation of not being able to handle orthographic variations and paraphrasing -- notable and common phenomena in any natural language. In addition, existing continuous approaches such as dense vector search tend to be overly coarse, often retrieving texts that are unrelated but share similar topics. Given these challenges, we propose a novel algorithm that achieves \emph{soft} (or semantic) yet efficient pattern matching by relaxing a surface-level matching with word embeddings. Our algorithm is highly scalable with respect to the size of the corpus text utilizing inverted indexes. We have prepared an efficient implementation, and we provide an accessible web tool. Our experiments demonstrate that the proposed method (i) can execute searches on billion-scale corpora in less than a second, which is comparable in speed to surface-level string matching and dense vector search; (ii) can extract harmful instances that semantically match queries from a large set of English and Japanese Wikipedia articles; and (iii) can be effectively applied to corpus-linguistic analyses of Latin, a language with highly diverse inflections.

computational linguistic, conference paper, corpora, (15 more...)

arXiv.org Artificial Intelligence

2503.03703

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > New York > New York County > New York City (0.04)
(22 more...)

Genre: Research Report (0.65)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Media > Television (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
(2 more...)

Add feedback