AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

Anonymization of Documents for Law Enforcement with Machine Learning

Eberhardinger, Manuel, Takenaka, Patrick, Grießhaber, Daniel, Maucher, Johannes

arXiv.org Artificial IntelligenceJan-13-2025

The steadily increasing utilization of data-driven methods and approaches in areas that handle sensitive personal information such as in law enforcement mandates an ever increasing effort in these institutions to comply with data protection guidelines. In this work, we present a system for automatically anonymizing images of scanned documents, reducing manual effort while ensuring data protection compliance. Our method considers the viability of further forensic processing after anonymization by minimizing automatically redacted areas by combining automatic detection of sensitive regions with knowledge from a manually anonymized reference document. Using a self-supervised image model for instance retrieval of the reference document, our approach requires only one anonymized example to efficiently redact all documents of the same type, significantly reducing processing time. We show that our approach outperforms both a purely automatic redaction system and also a naive copy-paste scheme of the reference anonymization to other documents on a hand-crafted dataset of ground truth redactions.

machine learning, pattern recognition, reference document, (17 more...)

arXiv.org Artificial Intelligence

2501.07334

Country: Europe > Germany > Baden-Württemberg (0.16)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

InstructOCR: Instruction Boosting Scene Text Spotting

Duan, Chen, Jiang, Qianyi, Fu, Pei, Chen, Jiamin, Li, Shengxi, Wang, Zining, Guo, Shan, Luo, Junfeng

arXiv.org Artificial IntelligenceJan-13-2025

In the field of scene text spotting, previous OCR methods primarily relied on image encoders and pre-trained text information, but they often overlooked the advantages of incorporating human language instructions. To address this gap, we propose InstructOCR, an innovative instruction-based scene text spotting model that leverages human language instructions to enhance the understanding of text within images. Our framework employs both text and image encoders during training and inference, along with instructions meticulously designed based on text attributes. This approach enables the model to interpret text more accurately and flexibly. Extensive experiments demonstrate the effectiveness of our model and we achieve state-of-the-art results on widely used benchmarks. Furthermore, the proposed framework can be seamlessly applied to scene text VQA tasks. By leveraging instruction strategies during pre-training, the performance on downstream VQA tasks can be significantly improved, with a 2.6% increase on the TextVQA dataset and a 2.1% increase on the ST-VQA dataset. These experimental results provide insights into the benefits of incorporating human language instructions for OCR-related tasks.

machine learning, natural language, pattern recognition, (14 more...)

arXiv.org Artificial Intelligence

2412.15523

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)

Add feedback

eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events

Chen, Shuolong, Li, Xingxing, Yuan, Liu, Liu, Ziao

arXiv.org Artificial IntelligenceJan-9-2025

The bio-inspired event camera has garnered extensive research attention in recent years, owing to its significant potential derived from its high dynamic range and low latency characteristics. Similar to the standard camera, the event camera requires precise intrinsic calibration to facilitate further high-level visual applications, such as pose estimation and mapping. While several calibration methods for event cameras have been proposed, most of them are either (i) engineering-driven, heavily relying on conventional image-based calibration pipelines, or (ii) inconvenient, requiring complex instrumentation. To this end, we propose an accurate and convenient intrinsic calibration method for event cameras, named eKalibr, which builds upon a carefully designed event-based circle grid pattern recognition algorithm. To extract target patterns from events, we perform event-based normal flow estimation to identify potential events generated by circle edges, and cluster them spatially. Subsequently, event clusters associated with the same grid circles are matched and grouped using normal flows, for subsequent time-varying ellipse estimation. Fitted ellipse centers are time-synchronized, for final grid pattern recognition. We conducted extensive experiments to evaluate the performance of eKalibr in terms of pattern extraction and intrinsic calibration. The implementation of eKalibr is open-sourced at (https://github.com/Unsigned-Long/eKalibr) to benefit the research community.

calibration, machine learning, pattern recognition, (16 more...)

arXiv.org Artificial Intelligence

2501.05688

Country: Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.76)

Add feedback

Geometry Restoration and Dewarping of Camera-Captured Document Images

Istomin, Valery, Pereziabov, Oleg, Afanasyev, Ilya

arXiv.org Artificial IntelligenceJan-9-2025

This research focuses on developing a method for restoring the topology of digital images of paper documents captured by a camera, using algorithms for detection, segmentation, geometry restoration, and dewarping. Our methodology employs deep learning (DL) for document outline detection, followed by computer vision (CV) to create a topological 2D grid using cubic polynomial interpolation and correct nonlinear distortions by remapping the image. Using classical CV methods makes the document topology restoration process more efficient and faster, as it requires significantly fewer computational resources and memory. We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and annotated dataset to demonstrate its efficiency. Our experiments confirm the promise of our methodology and its superiority over existing benchmarks (including mobile apps and popular DL solutions, such as RectiNet, DocGeoNet, and DocTr++) both visually and in terms of document readability via Optical Character Recognition (OCR) and geometry restoration metrics. This paves the way for creating high-quality digital copies of paper documents and enhancing the efficiency of OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI

algorithm, document image, geometry restoration, (15 more...)

arXiv.org Artificial Intelligence

2501.03145

Country:

Asia > Russia (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Asia > India (0.04)
(8 more...)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.67)
Media > Photography (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.89)

Add feedback

Deep Learning in Palmprint Recognition-A Comprehensive Survey

Gao, Chengrui, Yang, Ziyuan, Jia, Wei, Leng, Lu, Zhang, Bob, Teoh, Andrew Beng Jin

arXiv.org Artificial IntelligenceJan-2-2025

Palmprint recognition has emerged as a prominent biometric technology, widely applied in diverse scenarios. Traditional handcrafted methods for palmprint recognition often fall short in representation capability, as they heavily depend on researchers' prior knowledge. Deep learning (DL) has been introduced to address this limitation, leveraging its remarkable successes across various domains. While existing surveys focus narrowly on specific tasks within palmprint recognition-often grounded in traditional methodologies-there remains a significant gap in comprehensive research exploring DL-based approaches across all facets of palmprint recognition. This paper bridges that gap by thoroughly reviewing recent advancements in DL-powered palmprint recognition. The paper systematically examines progress across key tasks, including region-of-interest segmentation, feature extraction, and security/privacy-oriented challenges. Beyond highlighting these advancements, the paper identifies current challenges and uncovers promising opportunities for future research. By consolidating state-of-the-art progress, this review serves as a valuable resource for researchers, enabling them to stay abreast of cutting-edge technologies and drive innovation in palmprint recognition.

palmprint recognition, recognition, zhang, (14 more...)

arXiv.org Artificial Intelligence

2501.01166

Country:

Asia > Macao (0.14)
Asia > China > Anhui Province > Hefei (0.04)
Asia > China > Jiangxi Province > Nanchang (0.04)
(16 more...)

Genre:

Research Report > Promising Solution (0.93)
Overview > Innovation (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Fu, Ling, Yang, Biao, Kuang, Zhebin, Song, Jiajun, Li, Yuzhe, Zhu, Linghao, Luo, Qidi, Wang, Xinyu, Lu, Hao, Huang, Mingxin, Li, Zhang, Tang, Guozhi, Shan, Bin, Lin, Chunhui, Liu, Qi, Wu, Binghong, Feng, Hao, Liu, Hao, Huang, Can, Tang, Jingqun, Chen, Wei, Jin, Lianwen, Liu, Yuliang, Bai, Xiang

arXiv.org Artificial IntelligenceDec-31-2024

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4x more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10,000 human-verified question-answering pairs and a high proportion of difficult samples. After carefully benchmarking state-of-the-art LMMs on OCRBench v2, we find that 20 out of 22 LMMs score below 50 (100 in total) and suffer from five-type limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-liu/MultimodalOCR.

large language model, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2501.00321

Country:

South America (0.04)
Africa (0.04)
North America > United States > Wisconsin > Marathon County > Wausau (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Education (0.67)
Banking & Finance (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.91)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)

Add feedback

VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition

Vu, Hoang Long, Dat, Phuong Tuan, Nhi, Pham Thao, Hao, Nguyen Song, Trang, Nguyen Thi Thu

arXiv.org Artificial IntelligenceDec-31-2024

Recent research in speaker recognition aims to address vulnerabilities due to variations between enrolment and test utterances, particularly in the multi-genre phenomenon where the utterances are in different speech genres. Previous resources for Vietnamese speaker recognition are either limited in size or do not focus on genre diversity, leaving studies in multi-genre effects unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for Vietnamese speaker recognition with over 187,000 utterances from 1,406 speakers and an automated pipeline to construct a dataset on a large scale from public sources. Our experiments show the challenges posed by the multi-genre phenomenon to models trained on a single-genre dataset, and demonstrate a significant increase in performance upon incorporating the VoxVietnam into the training process. Our experiments are conducted to study the challenges of the multi-genre phenomenon in speaker recognition and the performance gain when the proposed dataset is used for multi-genre training.

machine learning, pattern recognition, utterance, (17 more...)

arXiv.org Artificial Intelligence

2501.00328

Country: Asia > Vietnam (0.21)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (1.00)

Add feedback

Dual Channel Multi-Attention in ViT for Biometric Authentication using Forehead Subcutaneous Vein Pattern and Periocular Pattern

Sharma, Arun K., Bhattacharya, Shubhobrata, Reza, Motahar

arXiv.org Artificial IntelligenceDec-26-2024

Traditional biometric systems, like face and fingerprint recognition, have encountered significant setbacks due to wearing face masks and hygiene concerns. To meet the challenges of the partially covered face due to face masks and hygiene concerns of fingerprint recognition, this paper proposes a novel dual-channel multi-attention Vision Transformer (ViT) framework for biometric authentication using forehead subcutaneous vein patterns and periocular patterns, offering a promising alternative to traditional methods, capable of performing well even with face masks and without any physical touch. The proposed framework leverages a dual-channel ViT architecture, designed to handle two distinct biometric traits. It can capture long-range dependencies of independent features from the vein and periocular patterns. A custom classifier is then designed to integrate the independently extracted features, producing a final class prediction. The performance of the proposed algorithm was rigorously evaluated using the Forehead Subcutaneous Vein Pattern and Periocular Biometric Pattern (FSVP-PBP) database. The results demonstrated the superiority of the algorithm over state-of-the-art methods, achieving remarkable classification accuracy of $99.3 \pm 0.02\%$ with the combined vein and periocular patterns.

machine learning, pattern recognition, recognition, (16 more...)

arXiv.org Artificial Intelligence

2412.1916

Country:

Asia > China (0.68)
Asia > India > West Bengal (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis

Hamdan, Mohammed, Rahiche, Abderrahmane, Cheriet, Mohamed

arXiv.org Artificial IntelligenceDec-25-2024

Handwritten document recognition (HDR) is one of the most challenging tasks in the field of computer vision, due to the various writing styles and complex layouts inherent in handwritten texts. Traditionally, this problem has been approached as two separate tasks, handwritten text recognition and layout analysis, and struggled to integrate the two processes effectively. This paper introduces HAND (Hierarchical Attention Network for Multi-Scale Document), a novel end-to-end and segmentation-free architecture for simultaneous text recognition and layout analysis tasks. Our model's key components include an advanced convolutional encoder integrating Gated Depth-wise Separable and Octave Convolutions for robust feature extraction, a Multi-Scale Adaptive Processing (MSAP) framework that dynamically adjusts to document complexity and a hierarchical attention decoder with memory-augmented and sparse attention mechanisms. These components enable our model to scale effectively from single-line to triple-column pages while maintaining computational efficiency. Additionally, HAND adopts curriculum learning across five complexity levels. To improve the recognition accuracy of complex ancient manuscripts, we fine-tune and integrate a Domain-Adaptive Pre-trained mT5 model for post-processing refinement. Extensive evaluations on the READ 2016 dataset demonstrate the superior performance of HAND, achieving up to 59.8% reduction in CER for line-level recognition and 31.2% for page-level recognition compared to state-of-the-art methods. The model also maintains a compact size of 5.60M parameters while establishing new benchmarks in both text recognition and layout analysis. Source code and pre-trained models are available at : https://github.com/MHHamdan/HAND.

machine learning, pattern recognition, recognition, (21 more...)

arXiv.org Artificial Intelligence

2412.18981

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Computing Approximate Graph Edit Distance via Optimal Transport

Cheng, Qihao, Yan, Da, Wu, Tianhao, Huang, Zhongyi, Zhang, Qin

arXiv.org Artificial IntelligenceDec-25-2024

Given a graph pair $(G^1, G^2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G^1$ to $G^2$. GED is a fundamental operation widely used in many applications, but its exact computation is NP-hard, so the approximation of GED has gained a lot of attention. Data-driven learning-based methods have been found to provide superior results compared to classical approximate algorithms, but they directly fit the coupling relationship between a pair of vertices from their vertex features. We argue that while pairwise vertex features can capture the coupling cost (discrepancy) of a pair of vertices, the vertex coupling matrix should be derived from the vertex-pair cost matrix through a more well-established method that is aware of the global context of the graph pair, such as optimal transport. In this paper, we propose an ensemble approach that integrates a supervised learning-based method and an unsupervised method, both based on optimal transport. Our learning method, GEDIOT, is based on inverse optimal transport that leverages a learnable Sinkhorn algorithm to generate the coupling matrix. Our unsupervised method, GEDGW, models GED computation as a linear combination of optimal transport and its variant, Gromov-Wasserstein discrepancy, for node and edge operations, respectively, which can be solved efficiently without needing the ground truth. Our ensemble method, GEDHOT, combines GEDIOT and GEDGW to further boost the performance. Extensive experiments demonstrate that our methods significantly outperform the existing methods in terms of the performance of GED computation, edit path generation, and model generalizability.

machine learning, node, pattern recognition, (16 more...)

arXiv.org Artificial Intelligence

2412.18857

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)

Add feedback