AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods such as discriminant analysis, feature extraction, error estimation, and cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Reviews: Bilevel Distance Metric Learning for Robust Image Recognition

Neural Information Processing Systems, Oct-7-2024, 16:01:05 GMT

Summary: The authors propose a bilevel method for metric learning. The lower level extracts discriminative features from the data using a sparse coding scheme with graph regularization, which effectively detects the underlying geometric structure of the data. The upper level is a classic metric learning approach that operates on the learned sparse coefficients. These two components are integrated into a joint optimization problem, and an efficient optimization algorithm is developed accordingly. New data can then be classified based on the learned dictionary and the corresponding metric. In the experiments, the authors demonstrate that the model provides more discriminative features from high-dimensional data while being more robust to noise.

bilevel distance metric learning, optimization algorithm, robust image recognition, (9 more...)

Genre: Research Report > New Finding (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)

Reviews: A Simple Cache Model for Image Recognition

Neural Information Processing Systems, Oct-7-2024, 12:50:30 GMT

This paper presents a cache model to be used in image recognition tasks. The authors argue that class-specific information can be retrieved from earlier layers of the network to improve the accuracy of an already trained model, without having to re-train or fine-tune it. This is achieved by extracting and caching the activations of some layers, along with the class label, at training time. At test time, a similarity measure is used to calculate how close the input is to the information stored in memory. Experiments show that performance improves on CIFAR-10/100 and ImageNet.
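The caching idea summarized above can be illustrated with a minimal sketch. It is not the reviewed paper's implementation: the cosine similarity, the exponential sharpening parameter `theta`, the mixing weight `lam`, and the function names `cache_predict` and `combine` are all assumptions chosen for illustration, and plain lists stand in for cached layer activations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cache_predict(query, keys, labels, num_classes, theta=10.0):
    """Class distribution from similarity-weighted votes of cached items.

    keys:   cached feature vectors (stand-ins for stored activations)
    labels: class index stored with each key
    theta:  assumed sharpening parameter; larger values favor nearest items
    """
    if not keys:
        raise ValueError("cache is empty")
    scores = [0.0] * num_classes
    for key, label in zip(keys, labels):
        scores[label] += math.exp(theta * cosine(query, key))
    total = sum(scores)
    return [s / total for s in scores]


def combine(model_probs, cache_probs, lam=0.5):
    """Linear interpolation of model and cache predictions (assumed scheme)."""
    return [(1 - lam) * m + lam * c for m, c in zip(model_probs, cache_probs)]


# Toy cache with one stored item per class.
keys = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
cache_probs = cache_predict([1.0, 0.0], keys, labels, num_classes=2)
final_probs = combine([0.5, 0.5], cache_probs, lam=0.5)
```

Because the cache is only read at test time, the trained network's weights stay untouched, which matches the "no re-training" property the review describes.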

cache model, image recognition, simple cache model, (4 more...)

Genre: Summary/Review (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.64)

Masked Autoencoder with Swin Transformer Network for Mitigating Electrode Shift in HD-EMG-based Gesture Recognition

Laamerad, Kasra, Shabanpour, Mehran, Islam, Md. Rabiul, Mohammadi, Arash

arXiv.org Artificial Intelligence, Oct-6-2024

Multi-channel surface Electromyography (sEMG), also referred to as high-density sEMG (HD-sEMG), plays a crucial role in improving gesture recognition performance for myoelectric control. Pattern recognition models developed based on HD-sEMG, however, are vulnerable to changing recording conditions (e.g., signal variability due to electrode shift). This has resulted in significant degradation in performance across subjects and sessions. In this context, the paper proposes the Masked Autoencoder with Swin Transformer (MAST) framework, where training is performed on a masked subset of HD-sEMG channels. A combination of four masking strategies (random block masking, temporal masking, sensor-wise random masking, and multi-scale masking) is used to learn latent representations and increase robustness against electrode shift. The masked data is then passed through MAST's three-path encoder-decoder structure, leveraging a multi-path Swin-Unet architecture that simultaneously captures time-domain, frequency-domain, and magnitude-based features of the underlying HD-sEMG signal. These augmented inputs are then used in a self-supervised pre-training fashion to improve the model's generalization capabilities. Experimental results demonstrate the superior performance of the proposed MAST framework in comparison to its counterparts.

machine learning, pattern recognition, recognition, (15 more...)

2410.17261

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A large-scale operational study of fingerprint quality and demographics

Galbally, Javier, Cepilovs, Aleksandrs, Blanco-Gonzalo, Ramon, Ormiston, Gillian, Miguel-Hurtado, Oscar, Racz, Istvan Sz.

arXiv.org Artificial Intelligence, Oct-4-2024

Even though a few initial works have shown, on small sets of data, some level of bias in the performance of fingerprint recognition technology with respect to certain demographic groups, there is still not sufficient evidence to understand the impact that factors such as gender, age or finger-type may have on fingerprint quality and, in turn, on fingerprint matching accuracy. The present work addresses this still under-researched topic on a large-scale database of operational data containing 10-print impressions of almost 16,000 subjects. The results provide further insight into the dependency between fingerprint quality and demographics, and show that there in fact exists a certain degree of performance variability in fingerprint-based recognition systems for different segments of the population. Based on the experimental evaluation, the work points out new observations based on data-driven evidence, provides plausible hypotheses to explain such observations, and concludes with potential follow-up actions that can help to reduce the observed fingerprint quality differences. This way, the current paper can be considered a contribution to further increase the algorithmic fairness and equality of biometric technology.

"It's not the size of the dog in the fight, it's the size of the fight in the dog" - Mark Twain

However, with the exception of a few studies, this inconsistency in the recognition rates has been mainly observed on small-to-medium databases under laboratory conditions and, therefore, it is difficult to quantify to what [...]

[...] demographic group, why do some segments of the population comprise more information than those of young children or elders? Why do each of the fingers (including the thumb) of the hand provide different accuracy performance in fingerprint recognition systems?

fingerprint, fingerprint quality, information, (16 more...)

2409.19992

Country:

Europe > Sweden (0.28)
North America > United States > Michigan (0.04)
Europe > France (0.04)
Europe > Estonia > Harju County > Tallinn (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.49)

Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models

Wang, Shuoyuan, Li, Yixuan, Wei, Hongxin

arXiv.org Artificial Intelligence, Oct-3-2024

Confidence calibration is critical for the safe deployment of machine learning models in the real world. However, this issue has not been fully addressed in vision-language models like CLIP, particularly after fine-tuning. In this work, we demonstrate that existing prompt tuning methods usually lead to a trade-off of calibration between base and new classes: the cross-entropy loss in CoOp causes overconfidence in new classes by increasing textual label divergence, whereas the regularization of KgCoOp maintains the confidence level but results in underconfidence in base classes due to the improved accuracy. Inspired by these observations, we introduce Dynamic Outlier Regularization (DOR) to ensure confidence calibration on both base and new classes after fine-tuning. In particular, we propose to minimize the feature deviation of novel textual labels (instead of base classes) sampled from a large vocabulary. In effect, DOR prevents the increase in textual divergence for new labels while easing restrictions on base classes. Extensive experiments demonstrate that DOR can enhance the calibration performance of current fine-tuning methods on base and new classes.

Large pre-trained vision-language models (VLMs) like CLIP (Radford et al., 2021) have become the de facto standard in today's zero-shot tasks including image recognition (Wortsman et al., 2022), open-vocabulary segmentation (Liang et al., 2023) and knowledge-augmented retrieval (Ming & Li, 2024). To transfer pre-trained CLIP knowledge to domain-specific downstream tasks efficiently, various parameter-efficient fine-tuning (PEFT) techniques including prompt tuning (Zhou et al., 2022b) and adapters (Gao et al., 2024) have been proposed. Despite the promising improvement in accuracy, reliability issues such as confidence calibration in fine-tuned VLMs have been largely overlooked. Without a full understanding of miscalibration in fine-tuned VLMs, safety concerns can be exacerbated in high-stakes applications like medical diagnosis and autonomous driving.
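To make "overconfidence" and "underconfidence" concrete, the sketch below computes Expected Calibration Error (ECE), one widely used calibration measure. The abstract does not say which metric the authors report, so the choice of ECE, the bin count, and the function name are assumptions for illustration only. Bins here are half-open on the right, with a confidence of exactly 1.0 placed in the last bin.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Overconfident toy model: 90% confidence but only 75% accuracy.
gap = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A model whose average confidence exceeds its accuracy within a bin is overconfident there; the reverse is underconfidence. ECE alone reports only the size of the gap, not its direction.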

base class, calibration, new class, (15 more...)

2410.02681

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Europe > France (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Automobiles & Trucks (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.35)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers

Chan, Adrian, Mijar, Anupam, Saeed, Mehreen, Wong, Chau-Wai, Khater, Akram

arXiv.org Artificial Intelligence, Oct-2-2024

Arabic handwritten text recognition (HTR) is challenging, especially for historical texts, due to diverse writing styles and the intrinsic features of Arabic script. Additionally, Arabic handwriting datasets are smaller compared to English ones, making it difficult to train generalizable Arabic HTR models. To address these challenges, we propose HATFormer, a transformer-based encoder-decoder architecture that builds on a state-of-the-art English HTR model. By leveraging the transformer's attention mechanism, HATFormer captures spatial contextual information to address the intrinsic challenges of Arabic script through differentiating cursive characters, decomposing visual representations, and identifying diacritics. Our customization to historical handwritten Arabic includes an image processor for effective ViT information preprocessing, a text tokenizer for compact Arabic text representation, and a training pipeline that accounts for a limited amount of historic Arabic handwriting data. HATFormer achieves a character error rate (CER) of 8.6% on the largest public historical handwritten Arabic dataset, with a 51% improvement over the best baseline in the literature. HATFormer also attains a comparable CER of 4.2% on the largest private non-historical dataset. Our work demonstrates the feasibility of adapting an English HTR method to a low-resource language with complex, language-specific challenges, contributing to advancements in document digitization, information retrieval, and cultural preservation.
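Character error rate is conventionally the edit distance between the recognized text and the reference, divided by the reference length. The sketch below follows that standard definition; it is not taken from the HATFormer code, and the function names are illustrative. Python strings are compared per Unicode code point, so an Arabic diacritic counts as its own character here, which may differ from the normalization used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Three edits against a six-character reference.
rate = cer("kitten", "sitting")
```

Because the denominator is the reference length, a hypothesis with many insertions can yield a CER above 1.0.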

dataset, international conference, recognition, (14 more...)

2410.02179

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Europe > Middle East (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.88)

Knowledge Discovery using Unsupervised Cognition

Ibias, Alfredo, Antona, Hector, Ramirez-Miranda, Guillem, Guinovart, Enric

arXiv.org Artificial Intelligence, Sep-30-2024

Knowledge discovery is key to understanding and interpreting a dataset, as well as to finding the underlying relationships between its components. Unsupervised Cognition is a novel unsupervised learning algorithm that focuses on modelling the learned data. This paper presents three techniques to perform knowledge discovery over an already trained Unsupervised Cognition model. Specifically, we present a technique for pattern mining, a technique for feature selection based on the previous pattern mining technique, and a technique for dimensionality reduction based on the previous feature selection technique. The final goal is to distinguish between relevant and irrelevant features and use them to build a model from which to extract meaningful patterns. We evaluated our proposals with empirical experiments and found that they outperform the state of the art in knowledge discovery.

algorithm, representation, unsupervised cognition model, (14 more...)

2409.20064

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science > Data Mining > Knowledge Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers

Fujitake, Masato

arXiv.org Artificial Intelligence, Sep-30-2024

In this paper, we create benchmarks and assess the effectiveness of error correction methods for Japanese vouchers in OCR (Optical Character Recognition) systems. Automated processing requires that scanned voucher text, such as the company name on invoices, be recognized correctly. However, perfect recognition is difficult because of noise such as stamps, so it is crucial to rectify erroneous OCR results. No publicly available OCR error correction benchmarks exist for Japanese, and methods have not been adequately researched. In this study, we measured the text recognition accuracy of existing services on Japanese vouchers and developed a post-OCR correction benchmark. We then proposed simple baselines for error correction using language models and verified whether the proposed method could effectively correct these errors. In the experiments, the proposed error correction algorithm significantly improved overall recognition accuracy.

accuracy, benchmark, correction, (10 more...)

2409.19948

Country: Asia > Japan (0.05)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.72)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.70)

Gesture Recognition for Feedback Based Mixed Reality and Robotic Fabrication: A Case Study of the UnLog Tower

Kyaw, Alexander Htet, Spencer, Lawson, Zivkovic, Sasa, Lok, Leslie

arXiv.org Artificial Intelligence, Sep-28-2024

Mixed Reality (MR) platforms enable users to interact with three-dimensional holographic instructions during the assembly and fabrication of highly custom and parametric architectural constructions without the necessity of two-dimensional drawings. Previous MR fabrication projects have primarily relied on digital menus and custom buttons as the interface for user interaction with the MR environment. Despite this approach being widely adopted, it is limited in its ability to allow for direct human interaction with physical objects to modify fabrication instructions within the MR environment. This research integrates user interactions with physical objects through real-time gesture recognition as input to modify, update or generate new digital information enabling reciprocal stimuli between the physical and the virtual environment. Consequently, the digital environment is generative of the user's provided interaction with physical objects to allow seamless feedback in the fabrication process. This research investigates gesture recognition for feedback-based MR workflows for robotic fabrication, human assembly, and quality control in the construction of the UnLog Tower.

gesture recognition, interaction, workflow, (14 more...)

doi: 10.1007/978-981-99-8405-3_28

2409.19281

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.83)

CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials

Naumann, Alexander, Hertlein, Felix, Höllig, Jacqueline, Cazzonelli, Lucas, Thoma, Steffen

arXiv.org Artificial Intelligence, Sep-27-2024

Programming tutorials in the form of coding screencasts play a crucial role in programming education, serving both novices and experienced developers. However, the video format of these tutorials presents a challenge due to the difficulty of searching for and within videos. Addressing the absence of large-scale and diverse datasets for screencast analysis, we introduce the CodeSCAN dataset. It comprises 12,000 screenshots captured from the Visual Studio Code environment during development, featuring 24 programming languages, 25 fonts, and over 90 distinct themes, in addition to diverse layout changes and realistic user interactions. Moreover, we conduct detailed quantitative and qualitative evaluations to benchmark the performance of Integrated Development Environment (IDE) element detection, color-to-black-and-white conversion, and Optical Character Recognition (OCR). We hope that our contributions facilitate more research in coding screencast analysis, and we make the source code for creating the dataset and the benchmark publicly available at a-nau.github.io/codescan.

machine learning, pattern recognition, programming language, (20 more...)

2409.18556

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Software > Programming Languages (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
(2 more...)
