AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

Image Based Character Recognition, Documentation System To Decode Inscription From Temple

G, Velmathi, M, Shangavelan, D, Harish, S, Krithikshun M

arXiv.org Artificial IntelligenceMay-21-2024

This project undertakes the training and analysis of optical character recognition OCR methods applied to 10th century ancient Tamil inscriptions discovered on the walls of the Brihadeeswarar Temple.The chosen OCR methods include Tesseract,a widely used OCR engine,using modern ICR techniques to pre process the raw data and a box editing software to finetune our model.The analysis with Tesseract aims to evaluate their effectiveness in accurately deciphering the nuances of the ancient Tamil characters.The performance of our model for the dataset are determined by their accuracy rate where the evaluated dataset divided into training set and testing set.By addressing the unique challenges posed by the script's historical context,this study seeks to contribute valuable insights to the broader field of OCR,facilitating improved preservation and interpretation of ancient inscriptions

inscription, pixel, temple inscription, (12 more...)

arXiv.org Artificial Intelligence

2405.17449

Country: Asia > India > Tamil Nadu > Chennai (0.06)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.85)

Add feedback

RetinaRegNet: A Versatile Approach for Retinal Image Registration

Sivaraman, Vishal Balaji, Imran, Muhammad, Wei, Qingyue, Muralidharan, Preethika, Tamplin, Michelle R., Grumbach, Isabella M ., Kardon, Randy H., Wang, Jui-Kai, Zhou, Yuyin, Shao, Wei

arXiv.org Artificial IntelligenceMay-20-2024

We introduce the RetinaRegNet model, which can achieve state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves the selection of feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first applied an inverse consistency constraint, followed by a transformation-based outlier detector. This method proved to outperform the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilized a two-stage image registration framework. A homography transformation was used in the first stage and a more accurate third-order polynomial transformation was used in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods in all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at https://github.com/mirthAI/RetinaRegNet .

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2404.16017

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Iowa (0.05)
(5 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models

Bulat, Adrian, Ouali, Yassine, Tzimiropoulos, Georgios

arXiv.org Artificial IntelligenceMay-16-2024

Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in this paper, we show that the full potential of improving the training process by addressing such issues is yet to be realized. Specifically, we firstly study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality and diversity. Then, we devise effective solutions for addressing both problems, which essentially require training with multiple true positive pairs. Finally, we propose training with sigmoid loss to address such a requirement. We show very large gains over the current state-of-the-art for both image recognition ($\sim +6\%$ on average over 11 datasets) and image retrieval ($\sim +19\%$ on Flickr30k and $\sim +15\%$ on MSCOCO).

augmentation, caption, dataset, (13 more...)

arXiv.org Artificial Intelligence

2405.10286

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Motorsports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)

Add feedback

HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition

Chen, Honghui, Qiu, Yuhang, Wang, Jiabao, Chen, Pingping, Ling, Nam

arXiv.org Artificial IntelligenceMay-15-2024

Internal Language Model (LM)-based methods use permutation language modeling (PLM) to solve the error correction caused by conditional independence in external LM-based methods. However, random permutations of human interference cause fit oscillations in the model training, and Iterative Refinement (IR) operation to improve multimodal information decoupling also introduces additional overhead. To address these issues, this paper proposes the Hierarchical Attention autoregressive Model with Adaptive Permutation (HAAP) to enhance the location-context-image interaction capability, improving autoregressive generalization with internal LM. First, we propose Implicit Permutation Neurons (IPN) to generate adaptive attention masks to dynamically exploit token dependencies. The adaptive masks increase the diversity of training data and prevent model dependency on a specific order. It reduces the training overhead of PLM while avoiding training fit oscillations. Second, we develop Cross-modal Hierarchical Attention mechanism (CHA) to couple context and image features. This processing establishes rich positional semantic dependencies between context and image while avoiding IR. Extensive experimental results show the proposed HAAP achieves state-of-the-art (SOTA) performance in terms of accuracy, complexity, and latency on several datasets.

recognition, scene text recognition, text recognition, (12 more...)

arXiv.org Artificial Intelligence

2405.09125

Country:

Asia > China > Fujian Province > Fuzhou (0.04)
Oceania > Australia (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition

Kawakatsu, Takaya

arXiv.org Artificial IntelligenceMay-12-2024

Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that can be processed by large language models is an important task in knowledge information processing. End-to-end approaches, which recognize not only table structure but also cell contents, achieved performance comparable to state-of-the-art models using external character recognition systems, and have potential for further improvements. In addition, these models can now recognize long tables with hundreds of cells by introducing local attention. However, the models recognize table structure in one direction from the header to the footer, and cell content recognition is performed independently for each cell, so there is no opportunity to retrieve useful information from the neighbor cells. In this paper, we propose a multi-cell content decoder and bidirectional mutual learning mechanism to improve the end-to-end approach. The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models, even for long tables with large numbers of cells.

decoder, recognition, structure recognition, (10 more...)

arXiv.org Artificial Intelligence

2404.13268

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.84)

Add feedback

The Sparse Tsetlin Machine: Sparse Representation with Active Literals

Østby, Sebastian, Brambo, Tobias M., Glimsdal, Sondre

arXiv.org Artificial IntelligenceMay-11-2024

This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. Traditionally, the TM does not consider data characteristics such as sparsity, commonly seen in NLP applications and other bag-of-word-based representations. Consequently, a TM must initialize, store, and process a significant number of zero values, resulting in excessive memory usage and computational time. Previous attempts at creating a sparse TM have predominantly been unsuccessful, primarily due to their inability to identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing memory footprint and computational time while demonstrating competitive classification performance.

dataset, representation, stm, (14 more...)

arXiv.org Artificial Intelligence

2405.02375

Country: Europe > Norway (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.69)
Media (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)

Add feedback

Architecture of a Cortex Inspired Hierarchical Event Recaller

Varona, Valentin Puente

arXiv.org Artificial IntelligenceMay-3-2024

This paper proposes a new approach to Machine Learning (ML) that focuses on unsupervised continuous context-dependent learning of complex patterns. Although the proposal is partly inspired by some of the current knowledge about the structural and functional properties of the mammalian brain, we do not claim that biological systems work in an analogous way (nor the opposite). Based on some properties of the cerebellar cortex and adjacent structures, a proposal suitable for practical problems is presented. A synthetic structure capable of identifying and predicting complex temporal series will be defined and experimentally tested. The system relies heavily on prediction to help identify and learn patterns based on previously acquired contextual knowledge. As a proof of concept, the proposed system is shown to be able to learn, identify and predict a remarkably complex temporal series such as human speech, with no prior knowledge. From raw data, without any adaptation in the core algorithm, the system is able to identify certain speech structures from a set of Spanish sentences. Unlike conventional ML, the proposal can learn with a reduced training set. Although the idea can be applied to a constrained problem, such as the detection of unknown vocabulary in a speech, it could be used in more applications, such as vision, or (by incorporating the missing biological periphery) fit into other ML techniques. Given the trivial computational primitives used, a potential hardware implementation will be remarkably frugal. Coincidentally, the proposed model not only conforms to a plausible functional framework for biological systems but may also explain many elusive cognitive phenomena.

cortex, prediction, sequence, (17 more...)

arXiv.org Artificial Intelligence

2405.02371

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Cantabria > Santander (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.87)

Add feedback

Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers

Islam, Saahil, Murthy, Venkatesh N., Neumann, Dominik, Das, Badhan Kumar, Sharma, Puneet, Maier, Andreas, Comaniciu, Dorin, Ghesu, Florin C.

arXiv.org Artificial IntelligenceMay-2-2024

An accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, there is a need for high robustness no failures during tracking. To achieve that, one needs to efficiently tackle challenges, such as: device obscuration by contrast agent or other external devices or wires, changes in field-of-view or acquisition angle, as well as the continuous movement due to cardiac and respiratory motion. To overcome the aforementioned challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked image modeling technique that leverages frame interpolation based reconstruction to learn fine inter-frame temporal correspondences. The features encoded in the resulting model are fine-tuned downstream. Our approach achieves state-of-the-art performance and in particular robustness compared to ultra optimized reference solutions (that use multi-stage feature fusion, multi-task and flow regularization). The experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions (23.20% when flow regularization is used); achieving a success score of 97.95% at a 3x faster inference speed of 42 frames-per-second (on GPU). The results encourage the use of our approach in various other tasks within interventional image analytics that require effective understanding of spatio-temporal semantics.

proceedings, sequence, tracking, (11 more...)

arXiv.org Artificial Intelligence

2405.01156

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)

Add feedback

Self-supervised Pre-training of Text Recognizers

Kišš, Martin, Hradiš, Michal

arXiv.org Artificial IntelligenceMay-1-2024

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different approaches -- Feature Quantization, VQ-VAE, and Post-Quantized AE. We also investigate joint-embedding approaches with VICReg and NT-Xent objectives, for which we propose an image shifting technique to prevent model collapse where it relies solely on positional encoding while completely ignoring the input image. We perform our experiments on historical handwritten (Bentham) and historical printed datasets mainly to investigate the benefits of the self-supervised pre-training techniques with different amounts of annotated target domain data. We use transfer learning as strong baselines. The evaluation shows that the self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains. This paper is one of the first researches exploring self-supervised pre-training in document text recognition, and we believe that it will become a cornerstone for future research in this area. We made our implementation of the investigated methods publicly available at https://github.com/DCGM/pero-pretraining.

dataset, self-supervised pre-training, text recognition, (14 more...)

arXiv.org Artificial Intelligence

2405.0042

Country: Europe > Czechia > South Moravian Region > Brno (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Certification of Speaker Recognition Models to Additive Perturbations

Korzh, Dmitrii, Karimov, Elvir, Pautov, Mikhail, Rogov, Oleg Y., Oseledets, Ivan

arXiv.org Artificial IntelligenceApr-29-2024

Speaker recognition technology is applied in various tasks ranging from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly to additive perturbations, remains a significant challenge. In this paper, we pioneer applying robustness certification techniques to speaker recognition, originally developed for the image domain. In our work, we cover this gap by transferring and improving randomized smoothing certification techniques against norm-bounded additive perturbations for classification and few-shot learning tasks to speaker recognition. We demonstrate the effectiveness of these methods on VoxCeleb 1 and 2 datasets for several models. We expect this work to improve voice-biometry robustness, establish a new certification benchmark, and accelerate research of certification methods in the audio domain.

attack radius attack, audio length, dependency, (13 more...)

arXiv.org Artificial Intelligence

2404.18791

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback