AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods such as discriminant analysis, feature extraction, error estimation, and cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Advancing Deformable Medical Image Registration with Multi-axis Cross-covariance Attention

Meng, Mingyuan, Fulham, Michael, Bi, Lei, Kim, Jinman

arXiv.org Artificial Intelligence, Dec-24-2024

Deformable image registration is a fundamental requirement for medical image analysis. Recently, transformers have been widely used in deep learning-based registration methods for their ability to capture long-range dependency via self-attention (SA). However, the high computation and memory loads of SA (growing quadratically with the spatial resolution) hinder transformers from processing subtle textural information in high-resolution image features, e.g., at the full and half image resolutions. This limits deformable registration, as the high-resolution textural information is crucial for finding precise pixel-wise correspondence between subtle anatomical structures. Cross-covariance Attention (XCA), as a "transposed" version of SA that operates across feature channels, has complexity growing linearly with the spatial resolution, making it feasible to capture long-range dependency among high-resolution image features. However, existing XCA-based transformers merely capture coarse global long-range dependency, which is unsuitable for deformable image registration relying primarily on fine-grained local correspondence. In this study, we propose to improve existing deep learning-based registration methods by embedding a new XCA mechanism. To this end, we design an XCA-based transformer block optimized for deformable medical image registration, named Multi-Axis XCA (MAXCA). Our MAXCA serves as a general network block that can be embedded into various registration network architectures. It can capture both global and local long-range dependency among high-resolution image features by applying regional and dilated XCA in parallel via a multi-axis design. Extensive experiments on two well-benchmarked inter-/intra-patient registration tasks with seven public medical datasets demonstrate that our MAXCA block enables state-of-the-art registration performance.
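To make the complexity argument concrete, here is a minimal single-head sketch of cross-covariance attention in plain Python. It follows the general XCA idea described in the abstract (attention computed across feature channels rather than across spatial positions) and is not the MAXCA block itself: the regional and dilated multi-axis design is omitted, and the function name `xca`, the temperature `tau`, and the list-of-lists layout are illustrative assumptions.

```python
import math


def xca(q, k, v, tau=1.0):
    """Single-head cross-covariance attention (illustrative sketch).

    q, k, v: N x d lists of lists (N tokens, d channels).
    Returns (out, attn) where out is N x d and attn is d x d.
    """
    n, d = len(q), len(q[0])

    def unit_columns(m):
        # L2-normalise every channel (column) across the N tokens.
        norms = [math.sqrt(sum(row[c] ** 2 for row in m)) or 1.0 for c in range(d)]
        return [[row[c] / norms[c] for c in range(d)] for row in m]

    qn, kn = unit_columns(q), unit_columns(k)

    attn = []
    for i in range(d):
        # Channel-to-channel scores: one pass over the N tokens per (i, j) pair.
        scores = [sum(qn[t][i] * kn[t][j] for t in range(n)) / tau for j in range(d)]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn.append([e / total for e in exps])  # softmax over channels j

    # Mix the value channels of each token with the d x d attention map.
    out = [[sum(attn[i][j] * v[t][j] for j in range(d)) for i in range(d)]
           for t in range(n)]
    return out, attn


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # N = 4, d = 2
out, attn = xca(tokens, tokens, tokens)
print(len(out), len(out[0]), len(attn), len(attn[0]))
```

The attention map has d x d entries regardless of N, so doubling the number of tokens roughly doubles the work, whereas token-wise self-attention would build an N x N map.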

machine learning, pattern recognition, registration, (17 more...)

arXiv.org Artificial Intelligence

2412.18545

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LMRPA: Large Language Model-Driven Efficient Robotic Process Automation for OCR

Abdellaif, Osama Hosam, Nader, Abdelrahman, Hamdi, Ali

arXiv.org Artificial Intelligence, Dec-23-2024

This paper introduces LMRPA, a novel Large Model-Driven Robotic Process Automation (RPA) model designed to greatly improve the efficiency and speed of Optical Character Recognition (OCR) tasks. Traditional RPA platforms often suffer from performance bottlenecks when handling high-volume repetitive processes like OCR, leading to a less efficient and more time-consuming process. LMRPA allows the integration of Large Language Models (LLMs) to improve the accuracy and readability of extracted text, overcoming the challenges posed by ambiguous characters and complex text structures. Extensive benchmarks were conducted comparing LMRPA to leading RPA platforms, including UiPath and Automation Anywhere, using OCR engines like Tesseract and DocTR. The results show that LMRPA achieves superior performance, cutting processing times by up to 52%. For instance, in Batch 2 of the Tesseract OCR task, LMRPA completed the process in 9.8 seconds, whereas UiPath finished in 18.1 seconds and Automation Anywhere finished in 18.7 seconds. Similar improvements were observed with DocTR, where LMRPA outperformed other automation tools running the same process by completing tasks in 12.7 seconds, while competitors took over 20 seconds. These findings highlight the potential of LMRPA to revolutionize OCR-driven automation processes, offering a more efficient and effective alternative to the existing state-of-the-art RPA models.
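As a quick check on how the quoted timings relate to the headline figure, the short sketch below computes the fraction of processing time saved for the Tesseract Batch 2 numbers given in the abstract. The helper name `time_reduction` is an illustrative assumption; only the three timings come from the source.

```python
def time_reduction(baseline_s, new_s):
    """Fraction of the baseline processing time that was saved."""
    return (baseline_s - new_s) / baseline_s


lmrpa_s = 9.8  # Tesseract, Batch 2 (from the abstract)
for name, seconds in [("UiPath", 18.1), ("Automation Anywhere", 18.7)]:
    print(f"{name}: {time_reduction(seconds, lmrpa_s):.1%} less time")
```

For this batch the savings come out at roughly 46% and 48%, which is consistent with "up to 52%" describing the best case rather than every batch.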

large language model, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2412.18063

Genre:

Research Report (1.00)
Overview (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)

VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification

Xi, Wenjie, Zuo, Rundong, Alvarez, Alejandro, Zhang, Jie, Choi, Byron, Lin, Jessica

arXiv.org Artificial Intelligence, Dec-21-2024

Multivariate time series classification is a crucial task in data mining, attracting growing research interest due to its broad applications. While many existing methods focus on discovering discriminative patterns in time series, real-world data does not always present such patterns, and sometimes raw numerical values can also serve as discriminative features. Additionally, the recent success of Transformer models has inspired many studies. However, when applied to time series classification, the self-attention mechanisms in Transformer models could introduce classification-irrelevant features, thereby compromising accuracy. To address these challenges, we propose a novel method, VSFormer, that incorporates both discriminative patterns (shape) and numerical information (value). In addition, we extract class-specific prior information derived from supervised information to enrich the positional encoding and provide classification-oriented self-attention learning, thereby enhancing its effectiveness. Extensive experiments on all 30 UEA archived datasets demonstrate the superior performance of our method compared to SOTA models. Through ablation studies, we demonstrate the effectiveness of the improved encoding layer and the proposed self-attention mechanism. Finally, we provide a case study on a real-world time series dataset without discriminative patterns to interpret our model.

information, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2412.16515

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.67)

A survey on FPGA-based accelerator for ML models

Yan, Feng, Koch, Andreas, Sinnen, Oliver

arXiv.org Artificial Intelligence, Dec-20-2024

This paper thoroughly surveys the acceleration of machine learning (ML) algorithms on hardware accelerators, focusing on Field-Programmable Gate Arrays (FPGAs). It reviews 287 out of 1138 papers from the past six years, sourced from four top FPGA conferences. Such selection underscores the increasing integration of ML and FPGA technologies and their mutual importance in technological advancement. Research clearly emphasises inference acceleration (81%) compared to training acceleration (13%). Additionally, the findings reveal that CNN dominates current FPGA acceleration research while emerging models like GNN show obvious growth trends. The categorization of the FPGA research papers reveals a wide range of topics, demonstrating the growing relevance of ML in FPGA research. This comprehensive analysis provides valuable insights into the current trends and future directions of FPGA research in the context of ML applications.

artificial intelligence, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2412.15666

Country: Europe (0.67)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Energy (0.93)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Machine Learning Techniques for Pattern Recognition in High-Dimensional Data Mining

Li, Pochun

arXiv.org Artificial Intelligence, Dec-20-2024

This paper proposes a frequent pattern data mining algorithm based on support vector machine (SVM), aiming to solve the performance bottleneck of traditional frequent pattern mining algorithms in high-dimensional and sparse data environments. By converting the frequent pattern mining task into a classification problem, the SVM model is introduced to improve the accuracy and robustness of pattern extraction. In terms of method design, the kernel function is used to map the data to a high-dimensional feature space, so as to construct the optimal classification hyperplane, realize the nonlinear separation of patterns and the accurate mining of frequent items. In the experiment, two public datasets, Retail and Mushroom, were selected to compare and analyze the proposed algorithm with traditional FP-Growth, FP-Tree, decision tree and random forest models. The experimental results show that the algorithm in this paper is significantly better than the traditional model in terms of three key indicators: support, confidence and lift, showing strong pattern recognition ability and rule extraction effect. The study shows that the SVM model has excellent performance advantages in an environment with high data sparsity and a large number of transactions, and can effectively cope with complex pattern mining tasks. At the same time, this paper also points out the potential direction of future research, including the introduction of deep learning and ensemble learning frameworks to further improve the scalability and adaptability of the algorithm. This research not only provides a new idea for frequent pattern mining, but also provides important technical support for solving pattern discovery and association rule mining problems in practical applications.
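The three indicators named in the abstract (support, confidence and lift) have standard definitions for an association rule A -> C. The sketch below computes them on a tiny, made-up transaction list. It illustrates the metrics only, not the paper's SVM-based mining algorithm, and the function name `rule_metrics` and the toy data are assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))          # contains A
    n_c = sum(1 for t in transactions if c <= set(t))          # contains C
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))   # contains A and C

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the base rate of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift below 1, as in this toy example, means the antecedent makes the consequent slightly less likely than its base rate; values above 1 indicate a positive association.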

artificial intelligence, machine learning, pattern recognition, (13 more...)

arXiv.org Artificial Intelligence

2412.15593

Genre: Research Report (0.71)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

WildSAT: Learning Satellite Image Representations from Wildlife Observations

Daroya, Rangel, Cole, Elijah, Mac Aodha, Oisin, Van Horn, Grant, Maji, Subhransu

arXiv.org Artificial Intelligence, Dec-18-2024

What does the presence of a species reveal about a geographic location? We posit that habitat, climate, and environmental preferences reflected in species distributions provide a rich source of supervision for learning satellite image representations. We introduce WildSAT, which pairs satellite images with millions of geo-tagged wildlife observations readily available on citizen science platforms. WildSAT uses a contrastive learning framework to combine information from species distribution maps with text descriptions that capture habitat and range details, alongside satellite images, to train or fine-tune models. On a range of downstream satellite image recognition tasks, this significantly improves the performance of both randomly initialized models and pre-trained models from sources like ImageNet or specialized satellite image datasets. Additionally, the alignment with text enables zero-shot retrieval, allowing for search based on general descriptions of locations. We demonstrate that WildSAT achieves better representations than recent methods that utilize other forms of cross-modal supervision, such as aligning satellite images with ground images or wildlife photos. Finally, we analyze the impact of various design choices on downstream performance, highlighting the general applicability of our approach.

large language model, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2412.14428

Country:

Africa > Kenya (0.04)
North America > United States > Virginia (0.04)
South America (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry:

Food & Agriculture (0.46)
Energy (0.31)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
(3 more...)

Nonstationary Sparse Spectral Permanental Process

Sun, Zicheng, Zhang, Yixuan, Ling, Zenan, Fan, Xuhui, Zhou, Feng

arXiv.org Machine Learning, Dec-18-2024

Existing permanental processes often impose constraints on kernel types or stationarity, limiting the model's expressiveness. To overcome these limitations, we propose a novel approach utilizing the sparse spectral representation of nonstationary kernels. This technique relaxes the constraints on kernel types and stationarity, allowing for more flexible modeling while reducing computational complexity to the linear level. Additionally, we introduce a deep kernel variant by hierarchically stacking multiple spectral feature mappings, further enhancing the model's expressiveness to capture complex patterns in data. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our approach, particularly in scenarios with pronounced data nonstationarity. Additionally, ablation studies are conducted to provide insights into the impact of various hyperparameters on model performance.

artificial intelligence, machine learning, pattern recognition, (18 more...)

arXiv.org Machine Learning

2410.03581

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)

Pattern Matching in AI Compilers and its Formalization (Extended Version)

Cutler, Joseph W., Collins, Alex, Fan, Bin, Ravishankar, Mahesh, Grover, Vinod

arXiv.org Artificial Intelligence, Dec-17-2024

PyPM is a Python-based domain-specific language (DSL) for building rewrite-based optimization passes on machine learning computation graphs. Users define individual optimizations by writing (a) patterns that match subgraphs of a computation graph and (b) corresponding rules which replace a matched subgraph with an optimized kernel. PyPM is distinguished from the many other DSLs for defining rewriting passes by its complex and novel pattern language which borrows concepts from logic programming. PyPM patterns can be recursive, nondeterministic, and can require checking domain-specific constraints such as the shapes of tensors. The PyPM implementation is thus similarly complicated, consisting of thousands of lines of C++ code. In this paper, we present our work on building PyPM, as well as formalizing and distilling this complexity into an understandable mathematical core. We have developed a formal core calculus expressing the main operations of the PyPM pattern language. We define both a declarative semantics - describing which patterns match which terms - and an algorithmic semantics - an idealized version of the PyPM pattern interpreter - and prove their equivalence. The development is fully mechanized in the Coq proof assistant.
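The pattern-plus-rule idea in the abstract can be illustrated with a toy term rewriter. This is a generic sketch and not PyPM's syntax or API: graphs are nested tuples, pattern variables are strings starting with `?`, and the helper names `match`, `substitute` and `rewrite` as well as the `fma` kernel are hypothetical. Recursive and nondeterministic patterns and tensor-shape constraints, which the abstract lists as PyPM features, are left out.

```python
def match(pattern, term, env=None):
    """Return variable bindings if pattern matches term, else None."""
    env = dict(env or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:                      # variable already bound:
            return env if env[pattern] == term else None  # must agree
        env[pattern] = term
        return env
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            env = match(sub_pattern, sub_term, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None     # literal operator or leaf name


def substitute(template, env):
    """Instantiate a rule's right-hand side with the matched bindings."""
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(part, env) for part in template)
    return template


def rewrite(term, pattern, template):
    """Apply one rule bottom-up across the whole term."""
    if isinstance(term, tuple):
        term = tuple(rewrite(child, pattern, template) for child in term)
    env = match(pattern, term)
    return substitute(template, env) if env is not None else term


# Fuse multiply-then-add into a single (hypothetical) fused kernel.
graph = ("relu", ("add", ("mul", "a", "b"), "c"))
fused = rewrite(graph, ("add", ("mul", "?x", "?y"), "?z"), ("fma", "?x", "?y", "?z"))
print(fused)
```

Keeping matching (which subgraphs qualify) separate from substitution (what replaces them) mirrors the split between patterns and rules described in the abstract.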

machine learning, pattern recognition, pypm, (19 more...)

arXiv.org Artificial Intelligence

2412.13398

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.06)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy

Ganeshan, Aditya, Groueix, Thibault, Guerrero, Paul, Měch, Radomír, Fisher, Matthew, Ritchie, Daniel

arXiv.org Artificial Intelligence, Dec-16-2024

Pattern images are everywhere in the digital and physical worlds, and tools to edit them are valuable. But editing pattern images is tricky: desired edits are often programmatic, that is, structure-aware edits that alter the underlying program which generates the pattern. One could attempt to infer this underlying program, but current methods for doing so struggle with complex images and produce unorganized programs that make editing tedious. In this work, we introduce a novel approach to perform programmatic edits on pattern images. By using a pattern analogy -- a pair of simple patterns to demonstrate the intended edit -- and a learning-based generative model to execute these edits, our method allows users to intuitively edit patterns. To enable this paradigm, we introduce SplitWeave, a domain-specific language that, combined with a framework for sampling synthetic pattern analogies, enables the creation of a large, high-quality synthetic training dataset. We also present TriFuser, a Latent Diffusion Model (LDM) designed to overcome critical issues that arise when naively deploying LDMs to this task. Extensive experiments on real-world, artist-sourced patterns reveal that our method faithfully performs the demonstrated edit while also generalizing to related pattern styles beyond its training distribution.

large language model, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2412.12463

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Multi-head attention debiasing and contrastive learning for mitigating Dataset Artifacts in Natural Language Inference

Sivakoti, Karthik

arXiv.org Artificial Intelligence, Dec-16-2024

While Natural Language Inference (NLI) models have achieved high performance on benchmark datasets, there are still concerns about whether they truly capture the intended task or largely exploit dataset artifacts. Through detailed analysis of the Stanford Natural Language Inference (SNLI) dataset, we have uncovered complex patterns of various types of artifacts and their interactions, leading to the development of our novel structural debiasing approach. Our fine-grained analysis of 9,782 validation examples reveals four major categories of artifacts: length-based patterns, lexical overlap, subset relationships, and negation patterns. Our multi-head debiasing architecture achieves substantial improvements across all bias categories: length bias accuracy improved from 86.03% to 90.06%, overlap bias from 91.88% to 93.13%, subset bias from 95.43% to 96.49%, and negation bias from 88.69% to 94.64%. Overall, our approach reduces the error rate from 14.19% to 10.42% while maintaining high performance on unbiased examples. Analysis of 1,026 error cases shows significant improvement in handling neutral relationships, traditionally one of the most challenging areas for NLI systems.
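The reported percentages can be put on a common footing with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the dictionary layout and the helper name `relative_error_reduction` are illustrative assumptions.

```python
def relative_error_reduction(before_pct, after_pct):
    """Share of the original error rate that was removed."""
    return (before_pct - after_pct) / before_pct


# Per-category accuracy (%) before and after debiasing, as quoted in the abstract.
accuracy = {
    "length": (86.03, 90.06),
    "overlap": (91.88, 93.13),
    "subset": (95.43, 96.49),
    "negation": (88.69, 94.64),
}

for category, (before, after) in accuracy.items():
    print(f"{category}: +{after - before:.2f} percentage points")

overall = relative_error_reduction(14.19, 10.42)
print(f"overall error rate: {overall:.1%} relative reduction")
```

Read this way, the drop from 14.19% to 10.42% is 3.77 percentage points, or roughly a quarter of the original errors, with negation showing the largest per-category accuracy gain.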

accuracy, artifact, confidence score, (12 more...)

arXiv.org Artificial Intelligence

2412.16194

Country:

Europe > Germany (0.05)
Asia > Japan (0.05)
Asia > China (0.05)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)
