AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction

Beeri, Eran Bamani, Nissinman, Eden, Sintov, Avishai

arXiv.org Artificial IntelligenceJul-31-2024

This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances. By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods, enabling robots to understand gestures from long distances. With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments. Experimental validation demonstrates significant advancements in gesture recognition accuracy, particularly in prolonged gesture sequences.

convolution, gesture recognition, recognition, (12 more...)

arXiv.org Artificial Intelligence

2407.21374

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.07)
Europe > Netherlands > North Holland > Amsterdam (0.05)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.99)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.65)

Add feedback

GesturePrint: Enabling User Identification for mmWave-based Gesture Recognition Systems

Xu, Lilin, Wang, Keyi, Gu, Chaojie, Guo, Xiuzhen, He, Shibo, Chen, Jiming

arXiv.org Artificial IntelligenceJul-25-2024

The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave radar sensor. GesturePrint features an effective pipeline that enables the gesture recognition system to identify users at a minor additional cost. By introducing an efficient signal preprocessing stage and a network architecture GesIDNet, which employs an attention-based multilevel feature fusion mechanism, GesturePrint effectively extracts unique gesture features for gesture recognition and personalized motion pattern features for user identification. We implement GesturePrint and collect data from 17 participants performing 15 gestures in a meeting room and an office, respectively. GesturePrint achieves a gesture recognition accuracy (GRA) of 98.87% with a user identification accuracy (UIA) of 99.78% in the meeting room, and 98.22% GRA with 99.26% UIA in the office. Extensive experiments on three public datasets and a new gesture dataset show GesturePrint's superior performance in enabling effective user identification for gesture recognition systems.

gesture recognition, gestureprint, identification, (15 more...)

arXiv.org Artificial Intelligence

2408.05358

Country:

North America > United States > Texas (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Zhejiang Province (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

Bhatia, Gagan, Nagoudi, El Moatez Billah, Alwajih, Fakhraddin, Abdul-Mageed, Muhammad

arXiv.org Artificial IntelligenceJul-18-2024

Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature in Arabic scripts. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.

dataset, qalam, recognition, (12 more...)

arXiv.org Artificial Intelligence

2407.13559

Country:

North America > United States (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

Liu, Chenyu, Pan, Jia, Hu, Jinshui, Yin, Baocai, Yin, Bing, Chen, Mingjun, Liu, Cong, Du, Jun, Liu, Qingfeng

arXiv.org Artificial IntelligenceJul-16-2024

Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall language context, limiting information utilization beyond the current decoding step; 2) error accumulation during AR decoding; and 3) slow decoding speed. To tackle these problems, this paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER. NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD). Initially, the VAT tokenizes visible symbols and local relations at a coarse level. Subsequently, the PGD refines all tokens and establishes connectivities in parallel, leveraging comprehensive visual and linguistic contexts. Experiments on CROHME 2014/2016/2019 and HME100K datasets demonstrate that NAMER not only outperforms the current state-of-the-art (SOTA) methods on ExpRate by 1.93%/2.35%/1.49%/0.62%, but also achieves significant speedups of 13.7x and 6.7x faster in decoding time and overall FPS, proving the effectiveness and efficiency of NAMER.

mathematical expression recognition, namer, recognition, (13 more...)

arXiv.org Artificial Intelligence

2407.1138

Country:

Asia > China > Anhui Province > Hefei (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

Add feedback

Helios: An extremely low power event-based gesture recognition for always-on smart eyewear

Bhattacharyya, Prarthana, Mitton, Joshua, Page, Ryan, Morgan, Owen, Menzies, Ben, Homewood, Gabriel, Jacobs, Kemi, Baesso, Paolo, Trickett, Dave, Mair, Chris, Muhonen, Taru, Clark, Rory, Berridge, Louis, Vigars, Richard, Wallace, Iain

arXiv.org Artificial IntelligenceJul-11-2024

This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice controls, present limitations in ergonomics, privacy and power consumption. Helios addresses these challenges by leveraging natural hand interactions for a more intuitive and comfortable user experience. Our system utilizes a extremely low-power and compact 3mmx4mm/20mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on a NXP Nano UltraLite compute platform, consuming less than 350mW. Helios can recognize seven classes of gestures, including subtle microgestures like swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a remarkably low latency of 60ms. Our user testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.

computer vision and pattern recognition, helios, proceedings, (9 more...)

arXiv.org Artificial Intelligence

2407.05206

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Consumer Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.96)

Add feedback

Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition

Guo, Guangyu, Zhang, Dingwen, Han, Longfei, Liu, Nian, Cheng, Ming-Ming, Han, Junwei

arXiv.org Artificial IntelligenceJul-10-2024

Abstract--Previous knowledge distillation (KD) methods mostly focus on compressing network architectures, which is not thorough enough in deployment as some costs like transmission bandwidth and imaging equipment are related to the image size. Therefore, we propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints. Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources. Specifically, we first propose an input spatial representation distillation (ISRD) mechanism to transfer spatial knowledge from large images to student's input module, which can facilitate stable knowledge transfer between CNN and ViT. Then, a Teacher-Assistant-Student (TAS) framework is further established to disentangle pixel distillation into the model compression stage and input compression stage, which significantly reduces the overall complexity of pixel distillation and the difficulty of distilling intermediate knowledge. Finally, we adapt pixel distillation to object detection via an aligned feature for preservation (AFP) strategy for TAS, which aligns output dimensions of detectors at each stage by manipulating features and anchors of the assistant. Comprehensive experiments on image classification and object detection demonstrate the effectiveness of our method. To deal with this situation, KD techniques that aim at using smaller network architectures received great attention Figure 1: (a) Compared to network architecture, input size has in the past few years--usually with fewer network an impact on more kinds of costs, including requirements for cameras and transmission bandwidth. Guangyu Guo is with Brain and Artificial Intelligence Laboratory, School of Automation, Northwestern Polytechnical University, Xi'an, China.

distillation, knowledge distillation, student, (15 more...)

arXiv.org Artificial Intelligence

2112.09532

Country:

Asia > China > Shaanxi Province > Xi'an (0.24)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Anhui Province > Hefei (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry: Education (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation

Lauar, Filipe, Laurent, Valentin

arXiv.org Artificial IntelligenceJul-9-2024

This study explores the transfer learning capabilities of the TrOCR architecture to Spanish. TrOCR is a transformer-based Optical Character Recognition (OCR) model renowned for its state-of-the-art performance in English benchmarks. Inspired by Li et al.'s assertion regarding its adaptability to multilingual text recognition, we investigate two distinct approaches to adapt the model to a new language: integrating an English TrOCR encoder with a language specific decoder and train the model on this specific language, and fine-tuning the English base TrOCR model on a new language data. Due to the scarcity of publicly available datasets, we present a resource-efficient pipeline for creating OCR datasets in any language, along with a comprehensive benchmark of the different image generation methods employed with a focus on Visual Rich Documents (VRDs). Additionally, we offer a comparative analysis of the two approaches for the Spanish language, demonstrating that fine-tuning the English TrOCR on Spanish yields superior recognition than the language specific decoder for a fixed dataset size. We evaluate our model employing character and word error rate metrics on a public available printed dataset, comparing the performance against other open-source and cloud OCR spanish models. As far as we know, these resources represent the best open-source model for OCR in Spanish. The Spanish TrOCR models are publicly available on HuggingFace [20] and the code to generate the dataset is available on Github [25].

dataset, recognition, trocr, (14 more...)

arXiv.org Artificial Intelligence

2407.0695

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection

Hu, Xingjian, Wei, Baole, Gao, Liangcai, Wang, Jun

arXiv.org Artificial IntelligenceJul-8-2024

Text line detection is a key task in historical document analysis facing many challenges of arbitrary-shaped text lines, dense texts, and text lines with high aspect ratios, etc. In this paper, we propose a general framework for historical document text detection (SegHist), enabling existing segmentation-based text detection methods to effectively address the challenges, especially text lines with high aspect ratios. Integrating the SegHist framework with the commonly used method DB++, we develop DB-SegHist. This approach achieves state-of-theart (SOTA) on the IACC2022CHDAC (CHDAC), MTHv2, and competitive results on ICDAR2019HDRC Chinese (HDRC) datasets, with a significant improvement of 1.19% on the most challenging CHDAC dataset which features more text lines with high aspect ratios. Moreover, our method attains SOTA on rotated MTHv2 and rotated HDRC, demonstrating its rotational robustness.

aspect ratio, detection, text line, (12 more...)

arXiv.org Artificial Intelligence

2406.15485

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)

Add feedback

Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape

Nguyen, Tuan, Nguyen, Dung Thuy, Doan, Khoa D, Wong, Kok-Seng

arXiv.org Artificial IntelligenceJul-5-2024

Despite the promise of Federated Learning (FL) for privacy-preserving model training on distributed data, it remains susceptible to backdoor attacks. These attacks manipulate models by embedding triggers (specific input patterns) in the training data, forcing misclassification as predefined classes during deployment. Traditional single-trigger attacks and recent work on cooperative multiple-trigger attacks, where clients collaborate, highlight limitations in attack realism due to coordination requirements. We investigate a more alarming scenario: non-cooperative multiple-trigger attacks. Here, independent adversaries introduce distinct triggers targeting unique classes. These parallel attacks exploit FL's decentralized nature, making detection difficult. Our experiments demonstrate the alarming vulnerability of FL to such attacks, where individual backdoors can be successfully learned without impacting the main task. This research emphasizes the critical need for robust defenses against diverse backdoor attacks in the evolving FL landscape. While our focus is on empirical analysis, we believe it can guide backdoor research toward more realistic settings, highlighting the crucial role of FL in building robust defenses against diverse backdoor threats. The code is available at \url{https://anonymous.4open.science/r/nba-980F/}.

accuracy, backdoor accuracy, backdoor attack, (12 more...)

arXiv.org Artificial Intelligence

2407.07917

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Complex Event Recognition with Symbolic Register Transducers: Extended Technical Report

Alevizos, Elias, Artikis, Alexander, Paliouras, Georgios

arXiv.org Artificial IntelligenceJul-3-2024

We present a system for Complex Event Recognition (CER) based on automata. While multiple such systems have been described in the literature, they typically suffer from a lack of clear and denotational semantics, a limitation which often leads to confusion with respect to their expressive power. In order to address this issue, our system is based on an automaton model which is a combination of symbolic and register automata. We extend previous work on these types of automata, in order to construct a formalism with clear semantics and a corresponding automaton model whose properties can be formally investigated. We call such automata Symbolic Register Transducers (SRT). We show that SRT are closed under various operators, but are not in general closed under complement and they are not determinizable. However, they are closed under these operations when a window operator, quintessential in Complex Event Recognition, is used. We show how SRT can be used in CER in order to detect patterns upon streams of events, using our framework that provides declarative and compositional semantics, and that allows for a systematic treatment of such automata. For SRT to work in pattern detection, we allow them to mark events from the input stream as belonging to a complex event or not, hence the name "transducers". We also present an implementation of SRT which can perform CER. We compare our SRT-based CER engine against other state-of-the-art CER systems and show that it is both more expressive and more efficient.

operator, srt, transition, (11 more...)

arXiv.org Artificial Intelligence

2407.02884

Country:

Europe > Greece (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Indonesia > Java > Yogyakarta > Yogyakarta (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.87)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.67)

Add feedback