AITopics | Pattern Recognition

Collaborating Authors

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

News Overviews Instructional Materials AI-Alerts Classics

On the Generalization of Handwritten Text Recognition Models

Garrido-Munoz, Carlos, Calvo-Zaragoza, Jorge

arXiv.org Artificial IntelligenceNov-26-2024

Recent advances in Handwritten Text Recognition (HTR) have led to significant reductions in transcription errors on standard benchmarks under the i.i.d. assumption, thus focusing on minimizing in-distribution (ID) errors. However, this assumption does not hold in real-world applications, which has motivated HTR research to explore Transfer Learning and Domain Adaptation techniques. In this work, we investigate the unaddressed limitations of HTR models in generalizing to out-of-distribution (OOD) data. We adopt the challenging setting of Domain Generalization, where models are expected to generalize to OOD data without any prior access. To this end, we analyze 336 OOD cases from eight state-of-the-art HTR models across seven widely used datasets, spanning five languages. Additionally, we study how HTR models leverage synthetic data to generalize. We reveal that the most significant factor for generalization lies in the textual divergence between domains, followed by visual divergence. We demonstrate that the error of HTR models in OOD scenarios can be reliably estimated, with discrepancies falling below 10 points in 70\% of cases. We identify the underlying limitations of HTR models, laying the foundation for future research to address this challenge.

divergence, generalization, recognition, (14 more...)

arXiv.org Artificial Intelligence

2411.17332

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.63)

Add feedback

ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images

Naik, Prithviraj Purushottam, Agarwal, Rohit

arXiv.org Artificial IntelligenceNov-25-2024

Multimodal search has revolutionized the fashion industry, providing a seamless and intuitive way for users to discover and explore fashion items. Based on their preferences, style, or specific attributes, users can search for products by combining text and image information. Text-to-image searches enable users to find visually similar items or describe products using natural language. This paper presents an innovative approach called ENCLIP, for enhancing the performance of the Contrastive Language-Image Pretraining (CLIP) model, specifically in Multimodal Search targeted towards the domain of fashion intelligence. This method focuses on addressing the challenges posed by limited data availability and low-quality images. This paper proposes an algorithm that involves training and ensembling multiple instances of the CLIP model, and leveraging clustering techniques to group similar images together. The experimental findings presented in this study provide evidence of the effectiveness of the methodology. This approach unlocks the potential of CLIP in the domain of fashion intelligence, where data scarcity and image quality issues are prevalent. Overall, the ENCLIP method represents a valuable contribution to the field of fashion intelligence and provides a practical solution for optimizing the CLIP model in scenarios with limited data and low-quality images.

contrastive language-image pretraining, fine-tuned model, query, (12 more...)

arXiv.org Artificial Intelligence

2411.16096

Genre: Research Report > New Finding (0.35)

Industry: Textiles, Apparel & Luxury Goods (0.37)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Proceedings of the 6th International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Pacha, Alexander, Shatri, Elona

arXiv.org Artificial IntelligenceNov-24-2024

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 6th International Workshop on Reading Music Systems, held Online on November 22nd 2024.

large language model, machine learning, pattern recognition, (22 more...)

arXiv.org Artificial Intelligence

2411.15741

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > United Kingdom > England > Greater London > London (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
(25 more...)

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)
Research Report > Promising Solution (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.65)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Human Computer Interaction (1.00)
(11 more...)

Add feedback

VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models

Sartori, Camilo Chacón, Blum, Christian, Bistaffa, Filippo

arXiv.org Artificial IntelligenceNov-22-2024

The fast advancement of Large Vision-Language Models (LVLMs) has shown immense potential. These models are increasingly capable of tackling abstract visual tasks. Geometric structures, particularly graphs with their inherent flexibility and complexity, serve as an excellent benchmark for evaluating these models' predictive capabilities. While human observers can readily identify subtle visual details and perform accurate analyses, our investigation reveals that state-of-the-art LVLMs exhibit consistent limitations in specific visual graph scenarios, especially when confronted with stylistic variations. In response to these challenges, we introduce VisGraphVar (Visual Graph Variability), a customizable benchmark generator able to produce graph images for seven distinct task categories (detection, classification, segmentation, pattern recognition, link prediction, reasoning, matching), designed to systematically evaluate the strengths and limitations of individual LVLMs. We use VisGraphVar to produce 990 graph images and evaluate six LVLMs, employing two distinct prompting strategies, namely zero-shot and chain-of-thought. The findings demonstrate that variations in visual attributes of images (e.g., node labeling and layout) and the deliberate inclusion of visual imperfections, such as overlapping nodes, significantly affect model performance. This research emphasizes the importance of a comprehensive evaluation across graph-related tasks, extending beyond reasoning alone. VisGraphVar offers valuable insights to guide the development of more reliable and robust systems capable of performing advanced visual graph analysis.

large language model, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2411.14832

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

A comprehensive survey of oracle character recognition: challenges, benchmarks, and beyond

Li, Jing, Chi, Xueke, Wang, Qiufeng, Wang, Dahan, Huang, Kaizhu, Liu, Yongge, Liu, Cheng-lin

arXiv.org Artificial IntelligenceNov-18-2024

Oracle character recognition-an analysis of ancient Chinese inscriptions found on oracle bones-has become a pivotal field intersecting archaeology, paleography, and historical cultural studies. Traditional methods of oracle character recognition have relied heavily on manual interpretation by experts, which is not only labor-intensive but also limits broader accessibility to the general public. With recent breakthroughs in pattern recognition and deep learning, there is a growing movement towards the automation of oracle character recognition (OrCR), showing considerable promise in tackling the challenges inherent to these ancient scripts. However, a comprehensive understanding of OrCR still remains elusive. Therefore, this paper presents a systematic and structured survey of the current landscape of OrCR research. We commence by identifying and analyzing the key challenges of OrCR. Then, we provide an overview of the primary benchmark datasets and digital resources available for OrCR. A review of contemporary research methodologies follows, in which their respective efficacies, limitations, and applicability to the complex nature of oracle characters are critically highlighted and examined. Additionally, our review extends to ancillary tasks associated with OrCR across diverse disciplines, providing a broad-spectrum analysis of its applications. We conclude with a forward-looking perspective, proposing potential avenues for future investigations that could yield significant advancements in the field.

character recognition, oracle character, recognition, (16 more...)

arXiv.org Artificial Intelligence

2411.11354

Country:

Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TS-ACL: A Time Series Analytic Continual Learning Framework for Privacy-Preserving and Class-Incremental Pattern Recognition

Fan, Kejia, Li, Jiaxu, Lai, Songning, Lv, Linpu, Liu, Anfeng, Tang, Jianheng, Song, Houbing Herbert, Yue, Yutao, Zhuang, Huiping

arXiv.org Artificial IntelligenceNov-18-2024

Class-incremental pattern recognition in time series is a significant problem, which aims to learn from continually arriving streaming data examples with incremental classes. A primary challenge in this problem is catastrophic forgetting, where the incorporation of new data samples causes the models to forget previously learned information. While the replay-based methods achieve promising results by storing historical data to address catastrophic forgetting, they come with the invasion of data privacy. On the other hand, the exemplar-free methods preserve privacy but suffer from significantly decreased accuracy. To address these challenges, we proposed TS-ACL, a novel Time Series Analytic Continual Learning framework for privacy-preserving and class-incremental pattern recognition. Identifying gradient descent as the root of catastrophic forgetting, TS-ACL transforms each update of the model into a gradient-free analytical learning process with a closed-form solution. By leveraging a pre-trained frozen encoder for embedding extraction, TS-ACL only needs to recursively update an analytic classifier in a lightweight manner. This way, TS-ACL simultaneously achieves non-forgetting, privacy preservation, and lightweight consumption, making it widely suitable for various applications, particularly in edge computing scenarios. Extensive experiments on five benchmark datasets confirm the superior and robust performance of TS-ACL compared to existing advanced methods. Code is available at https://github.com/asdasdczxczq/TS-ACL.

dataset, encoder, learning, (12 more...)

arXiv.org Artificial Intelligence

2410.15954

Country:

North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Introduction to AI Safety, Ethics, and Society

Hendrycks, Dan

arXiv.org Artificial IntelligenceNov-15-2024

Artificial Intelligence is rapidly embedding itself within militaries, economies, and societies, reshaping their very foundations. Given the depth and breadth of its consequences, it has never been more pressing to understand how to ensure that AI systems are safe, ethical, and have a positive societal impact. This book aims to provide a comprehensive approach to understanding AI risk. Our primary goals include consolidating fragmented knowledge on AI risk, increasing the precision of core ideas, and reducing barriers to entry by making content simpler and more comprehensible. The book has been designed to be accessible to readers from diverse backgrounds. You do not need to have studied AI, philosophy, or other such topics. The content is skimmable and somewhat modular, so that you can choose which chapters to read. We introduce mathematical formulas in a few places to specify claims more precisely, but readers should be able to understand the main points without these.

totalenergies se, united nations, united states department of the interior, (89 more...)

arXiv.org Artificial Intelligence

2411.01042

Country:

Asia > Russia (1.00)
Asia > Middle East (0.92)
Europe > United Kingdom > England (0.45)
(3 more...)

Genre:

Workflow (1.00)
Summary/Review (1.00)
Research Report > Promising Solution (1.00)
(5 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(58 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
(20 more...)

Add feedback

GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs

Tian, Sheng, Zeng, Xintan, Hu, Yifei, Wang, Baokun, Liu, Yongchao, Jin, Yue, Meng, Changhua, Hong, Chuntao, Zhang, Tianyi, Wang, Weiqiang

arXiv.org Artificial IntelligenceNov-11-2024

Graph-based patterns are extensively employed and favored by practitioners within industrial companies due to their capacity to represent the behavioral attributes and topological relationships among users, thereby offering enhanced interpretability in comparison to blackbox models commonly utilized for classification and recognition tasks. For instance, within the scenario of transaction risk management, a graph pattern that is characteristic of a particular risk category can be readily employed to discern transactions fraught with risk, delineate networks of criminal activity, or investigate the methodologies employed by fraudsters. Nonetheless, graph data in industrial settings is often characterized by its massive scale, encompassing data sets with millions or even billions of nodes, making the manual extraction of graph patterns not only labor-intensive but also necessitating specialized knowledge in particular domains of risk. Moreover, existing methodologies for mining graph patterns encounter significant obstacles when tasked with analyzing large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-purpose parallel and distributed risk pattern mining framework on large attributed graphs. The framework incorporates a novel edge-involved graph isomorphism network (EGIN) alongside optimized operations for parallel graph computation, which collectively contribute to a considerable reduction in computational complexity and resource expenditure. Moreover, the intelligent filtration of efficacious risky graph patterns is facilitated by the proposed evaluation metrics. Comprehensive experimental evaluations conducted on real-world datasets of varying sizes substantiate the capability of GraphRPM to adeptly address the challenges inherent in mining patterns from large-scale industrial attributed graphs, thereby underscoring its substantial value for industrial deployment. Keywords: Graph isomorphism network Graph neural network Largescale attributed graphs Risk pattern mining.

machine learning, pattern recognition, subgraph, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-70381-2_9

2411.06878

Country:

Asia > Nepal (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry:

Banking & Finance (0.94)
Information Technology > Security & Privacy (0.66)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

NeuReg: Domain-invariant 3D Image Registration on Human and Mouse Brains

Razzaq, Taha, Iqbal, Asim

arXiv.org Artificial IntelligenceNov-9-2024

Medical brain imaging relies heavily on image registration to accurately curate structural boundaries of brain features for various healthcare applications. Deep learning models have shown remarkable performance in image registration in recent years. Still, they often struggle to handle the diversity of 3D brain volumes, challenged by their structural and contrastive variations and their imaging domains. In this work, we present NeuReg, a Neuro-inspired 3D image registration architecture with the feature of domain invariance. NeuReg generates domain-agnostic representations of imaging features and incorporates a shifting window-based Swin Transformer block as the encoder. This enables our model to capture the variations across brain imaging modalities and species. We demonstrate a new benchmark in multi-domain publicly available datasets comprising human and mouse 3D brain volumes. Extensive experiments reveal that our model (NeuReg) outperforms the existing baseline deep learning-based image registration models and provides a high-performance boost on cross-domain datasets, where models are trained on 'source-only' domain and tested on completely 'unseen' target domains. Our work establishes a new state-of-the-art for domain-agnostic 3D brain image registration, underpinned by Neuro-inspired Transformer-based architecture.

architecture, dataset, registration, (16 more...)

arXiv.org Artificial Intelligence

2411.06315

Country:

North America > United States > Washington > King County > Redmond (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Querying Perception Streams with Spatial Regular Expressions

Anderson, Jacob, Fainekos, Georgios, Hoxha, Bardh, Okamoto, Hideki, Prokhorov, Danil

arXiv.org Artificial IntelligenceNov-8-2024

Perception in fields like robotics, manufacturing, and data analysis generates large volumes of temporal and spatial data to effectively capture their environments. However, sorting through this data for specific scenarios is a meticulous and error-prone process, often dependent on the application, and lacks generality and reproducibility. In this work, we introduce SpREs as a novel querying language for pattern matching over perception streams containing spatial and temporal data derived from multi-modal dynamic environments. To highlight the capabilities of SpREs, we developed the STREM tool as both an offline and online pattern matching framework for perception data. We demonstrate the offline capabilities of STREM through a case study on a publicly available AV dataset (Woven Planet Perception) and its online capabilities through a case study integrating STREM in ROS with the CARLA simulator. We also conduct performance benchmark experiments on various SpRE queries. Using our matching framework, we are able to find over 20,000 matches within 296 ms making STREM applicable in runtime monitoring applications.

machine learning, pattern recognition, perception stream, (22 more...)

arXiv.org Artificial Intelligence

2411.05946

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (0.69)
Transportation > Ground > Road (0.68)
Information Technology > Robotics & Automation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)
(3 more...)

Add feedback