AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Supervised Pattern Recognition Involving Skewed Feature Densities

Benatti, Alexandre, Costa, Luciano da F.

arXiv.org Artificial Intelligence, Sep-2-2024

Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activities. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance and of a dissimilarity index based on the coincidence similarity index are compared by applying the k-neighbors supervised classification method to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities without or with overlap, different types of respective transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.
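
The comparison described in this abstract can be illustrated with a minimal sketch of a k-neighbors classifier that accepts either dissimilarity. The abstract does not define the coincidence similarity index, so the code assumes one common formulation for non-negative features (the product of a Jaccard-like term and an interiority term) and assumes that the dissimilarity is one minus that similarity. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def coincidence_similarity(x, y):
    """Assumed definition for non-negative features: Jaccard times interiority."""
    mins = sum(min(a, b) for a, b in zip(x, y))
    maxs = sum(max(a, b) for a, b in zip(x, y))
    smaller_total = min(sum(x), sum(y))
    if maxs == 0 or smaller_total == 0:
        return 0.0
    jaccard = mins / maxs
    interiority = mins / smaller_total
    return jaccard * interiority


def coincidence_dissimilarity(x, y):
    """Assumption: dissimilarity taken as one minus the similarity."""
    return 1.0 - coincidence_similarity(x, y)


def knn_predict(train, labels, query, k, dissimilarity):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(range(len(train)), key=lambda i: dissimilarity(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy one-dimensional data: group "A" is compact, group "B" is spread to the right.
train = [[1.0], [1.2], [1.5], [6.0], [8.0], [12.0]]
labels = ["A", "A", "A", "B", "B", "B"]
query = [3.5]

by_euclidean = knn_predict(train, labels, query, 3, euclidean)
by_coincidence = knn_predict(train, labels, query, 3, coincidence_dissimilarity)
print(by_euclidean, by_coincidence)
```

On this toy data the two measures disagree about the query because the Euclidean distance compares absolute differences while the assumed coincidence index compares ratios. The sketch says nothing about which answer is preferable for a given dataset.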

2409.01213

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text

Mustafa, Ahmed, Rafique, Muhammad Tahir, Baig, Muhammad Ijlal, Sajid, Hasan, Khan, Muhammad Jawad, Kallu, Karam Dad

arXiv.org Artificial Intelligence, Aug-30-2024

This research paper introduces a novel word-level Optical Character Recognition (OCR) model specifically designed for digital Urdu text, leveraging transformer-based architectures and attention mechanisms to address the distinct challenges of Urdu script recognition, including its diverse text styles, fonts, and variations. The model employs a permuted autoregressive sequence (PARSeq) architecture, which enhances its performance by enabling context-aware inference and iterative refinement through the training of multiple token permutations. This method allows the model to adeptly manage character reordering and overlapping characters, commonly encountered in Urdu script. Trained on a dataset comprising approximately 160,000 Urdu text images, the model demonstrates a high level of accuracy in capturing the intricacies of Urdu script, achieving a CER of 0.178. Despite ongoing challenges in handling certain text variations, the model exhibits superior accuracy and effectiveness in practical applications. Future work will focus on refining the model through advanced data augmentation techniques and the integration of context-aware language models to further enhance its performance and robustness in Urdu text recognition.
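
As a rough illustration of what training over multiple token permutations involves, the sketch below builds the attention mask implied by one decoding order: each position may only look at positions that come earlier in that order. This is a generic rendering of permuted autoregressive masking, not the authors' implementation. Special tokens such as begin-of-sequence are omitted, and the function names are hypothetical.

```python
import math
import random


def permutation_mask(perm):
    """mask[i][j] is True when position i may attend to position j.

    perm lists the positions in the order they are decoded, so a position
    can only see positions that appear before it in perm.
    """
    rank = {position: step for step, position in enumerate(perm)}
    n = len(perm)
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_permutations(length, count, seed=0):
    """Return distinct decoding orders, always including left-to-right."""
    rng = random.Random(seed)
    count = min(count, math.factorial(length))  # cannot exceed length! orders
    perms = [list(range(length))]
    while len(perms) < count:
        candidate = list(range(length))
        rng.shuffle(candidate)
        if candidate not in perms:
            perms.append(candidate)
    return perms


# The left-to-right order gives the usual strictly lower-triangular causal mask.
left_to_right = permutation_mask([0, 1, 2])

# Decoding position 2 first, then 0, then 1 gives a different mask.
shuffled = permutation_mask([2, 0, 1])

for row in shuffled:
    print(["x" if allowed else "." for allowed in row])
```

Training with several such masks for the same word exposes the model to different context patterns, which is the intuition behind the context-aware inference mentioned in the abstract.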

machine learning, pattern recognition, recognition, (20 more...)

2408.15119

Country:

Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.06)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.41)
Instructional Material > Course Syllabus & Notes (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.91)

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Huh, Jaesung, Chung, Joon Son, Nagrani, Arsha, Brown, Andrew, Jung, Jee-weon, Garcia-Romero, Daniel, Zisserman, Andrew

arXiv.org Artificial Intelligence, Aug-27-2024

The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings including: closed and open training data; as well as supervised, self-supervised, and semi-supervised training for domain adaptation. The challenges also provided publicly available training and evaluation datasets for each task and setting, with new test sets released each year. In this paper, we provide a review of these challenges that covers: what they explored; the methods developed by the challenge participants and how these evolved; and also the current state of the field for speaker verification and diarisation. We chart the progress in performance over the five installments of the challenge on a common evaluation dataset and provide a detailed analysis of how each year's special focus affected participants' performance. This paper is aimed both at researchers who want an overview of the speaker recognition and diarisation field, and also at challenge organisers who want to benefit from the successes and avoid the mistakes of the VoxSRC challenges. We end with a discussion of the current strengths of the field and open challenges. Project page: https://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/workshop.html

artificial intelligence, machine learning, pattern recognition, (18 more...)

doi: 10.1109/TASLP.2024.3444456

2408.14886

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Pennsylvania (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (0.92)
Media (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)
(2 more...)

Artificial Intelligence in Landscape Architecture: A Survey

Xing, Yue, Gan, Wensheng, Chen, Qidi

arXiv.org Artificial Intelligence, Aug-26-2024

The development history of landscape architecture (LA) reflects the human pursuit of environmental beautification and ecological balance. With the advancement of artificial intelligence (AI) technologies that simulate and extend human intelligence, immense opportunities have been provided for LA, offering scientific and technological support throughout the entire workflow. In this article, we comprehensively review the applications of AI technology in the field of LA. First, we introduce the many potential benefits that AI brings to the design, planning, and management aspects of LA. Secondly, we discuss how AI can assist the LA field in solving its current development problems, including urbanization, environmental degradation and ecological decline, irrational planning, insufficient management and maintenance, and lack of public participation. Furthermore, we summarize the key technologies and practical cases of applying AI in the LA domain, from design assistance to intelligent management, all of which provide innovative solutions for the planning, design, and maintenance of LA. Finally, we look ahead to the problems and opportunities in LA, emphasizing the need to combine human expertise and judgment for rational decision-making. This article provides both theoretical and practical guidance for LA designers, researchers, and technology developers. The successful integration of AI technology into LA holds great promise for enhancing the field's capabilities and achieving more sustainable, efficient, and user-friendly outcomes.

application, artificial intelligence, construction, (13 more...)

2408.147

Country:

Oceania > Australia > South Australia (0.04)
North America > United States > New York (0.04)
Asia > China > Guangdong Province > Shaoguan (0.04)
(7 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Food & Agriculture > Agriculture (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(10 more...)

LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task

Asgarov, Ali, Rustamov, Samir

arXiv.org Artificial Intelligence, Aug-25-2024

This research explores the development of multimodal vision-language models for image retrieval in low-resource languages, specifically Azerbaijani. Existing vision-language models primarily support high-resource languages, and fine-tuning them remains computationally demanding. To address challenges in vision-language retrieval for low-resource languages, we integrated the CLIP model architecture and employed several techniques to balance computational efficiency with performance. These techniques include synthetic data generation through machine translation, image augmentation, and further training the attention mechanisms of transformer-based models with domain-specific data. We integrated Multilingual BERT as a text encoder with image encoders like ResNet50, EfficientNet0, Vision Transformer (ViT), and Tiny Swin Transformer. Our study found that models like EfficientNet0 and Tiny Swin Transformer perform best on the datasets they were trained on, such as COCO, Flickr30k, and Flickr8k. Augmentation techniques boosted EfficientNet0 MAP on Flickr30k from 0.84 to 0.87 and ResNet50 MAP on MSCOCO from 0.70 to 0.80, contributing to a new state of the art in vision-language retrieval. We share our configurations and results to support further research. Code and pre-trained models are available at https://github.com/aliasgerovs/azclip.
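
The MAP figures quoted in this abstract are retrieval scores. The abstract does not say which variant of mean average precision was used (for example, whether a rank cutoff was applied), so the sketch below assumes the standard uninterpolated definition over the full ranking. The function names and the toy rankings are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """Mean of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevance)]
    return sum(scores) / len(scores)


# Two toy text queries, each with the image ids a retrieval model returned.
rankings = [["a", "b", "c", "d"], ["x", "y"]]
relevance = [{"a", "c"}, {"y"}]

ap_first = average_precision(rankings[0], relevance[0])   # (1/1 + 2/3) / 2
ap_second = average_precision(rankings[1], relevance[1])  # (1/2) / 1
print(ap_first, ap_second, mean_average_precision(rankings, relevance))
```

A higher MAP means relevant images tend to appear nearer the top of each ranking, which is how the reported improvements from augmentation should be read.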

dataset, query, relevant image, (15 more...)

2408.13909

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset

Majeed, Ameer, Hassani, Hossein

arXiv.org Artificial Intelligence, Aug-24-2024

Many languages have vast amounts of handwritten texts, such as ancient scripts about folktale stories and historical narratives or contemporary documents and letters. Digitization of those texts has various applications, such as daily tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that has not received the attention it requires and deserves. This paper reports on a research project aimed at developing an optical character recognition (OCR) model based on handwritten Syriac texts as a starting point to build more digital services for this endangered language. A dataset was created, KHAMIS (inspired by the East Syriac poet, Khamis bar Qardahe), which consists of handwritten sentences in the East Syriac script. We used it to fine-tune the Tesseract-OCR engine's pretrained Syriac model on handwritten data. The data was collected from volunteers capable of reading and writing in the language to create KHAMIS. KHAMIS currently consists of 624 handwritten Syriac sentences collected from 31 university students and one professor; it will be partially available online, with the whole dataset to be made available in the near future for development and research purposes. As a result, the handwritten OCR model achieved a character error rate of 1.097-1.610% on the training set and 8.963-10.490% on the evaluation set, and a character error rate of 18.89-19.71% and a word error rate of 62.83-65.42% on the test set, which is twice as good as the default Syriac model of Tesseract.
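
The gap between the character and word error rates reported here follows from how the two metrics are usually computed. The sketch below assumes the common definition (Levenshtein edit distance divided by the reference length, over characters or over whitespace-separated words); the abstract does not state the exact tooling used, and the function names and the English example are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + substitution,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edits per reference character (reference must be non-empty)."""
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits per reference word (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "the cat sat"
hypothesis = "the bat sat"

cer = character_error_rate(reference, hypothesis)  # 1 edit over 11 characters
wer = word_error_rate(reference, hypothesis)       # 1 edit over 3 words
print(round(cer, 4), round(wer, 4))
```

A single wrong character makes the whole word count as an error, so the word error rate is typically several times the character error rate, as in the test-set figures above.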

dataset, recognition, syriac, (14 more...)

2408.13631

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Iraq > Kurdistan Region > Duhok Governorate > Duhok (0.05)
Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.05)
(11 more...)

Genre: Research Report (1.00)

Industry:

Government (0.93)
Education > Educational Setting > Higher Education (0.39)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.85)

Challenges and Responses in the Practice of Large Language Models

Zhu, Hongyin

arXiv.org Artificial Intelligence, Aug-21-2024

This paper meticulously curates questions that are both thought-provoking and practically relevant, providing nuanced and insightful answers to each. To facilitate readers' understanding and reference, this paper specifically classifies and organizes these questions systematically and meticulously from the five core dimensions of computing power infrastructure, software architecture, data resources, application scenarios, and brain science. This work aims to provide readers with a comprehensive, in-depth and cutting-edge AI knowledge framework to help people from all walks of life grasp the pulse of AI development, stimulate innovative thinking, and promote industrial progress.

big model, knowledge graph, mechanism, (15 more...)

2408.09416

Country:

Asia > China (0.05)
North America > United States > Nebraska > Douglas County > Omaha (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)

CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network

Zhao, Zijian, Chen, Tingwei, Cai, Zhijie, Li, Xiaoyang, Li, Hang, Chen, Qimei, Zhu, Guangxu

arXiv.org Artificial Intelligence, Aug-20-2024

In recent years, Wi-Fi sensing has garnered significant attention due to its numerous benefits, such as privacy protection, low cost, and penetration ability. Extensive research has been conducted in this field, focusing on areas such as gesture recognition, people identification, and fall detection. However, many data-driven methods encounter challenges related to domain shift, where the model fails to perform well in environments different from the training data. One major factor contributing to this issue is the limited availability of Wi-Fi sensing datasets, which makes models learn excessive irrelevant information and over-fit to the training set. Unfortunately, collecting large-scale Wi-Fi sensing datasets across diverse scenarios is a challenging task. To address this problem, we propose CrossFi, a siamese network-based approach that excels in both in-domain and cross-domain scenarios, including few-shot and zero-shot scenarios, and even works in a few-shot new-class scenario where the testing set contains new categories. The core component of CrossFi is a sample-similarity calculation network called CSi-Net, which improves the structure of the siamese network by using an attention mechanism to capture similarity information, instead of simply calculating the distance or cosine similarity. Based on it, we develop an extra Weight-Net that can generate a template for each class, so that our CrossFi can work in different scenarios. Experimental results demonstrate that our CrossFi achieves state-of-the-art performance across various scenarios. In the gesture recognition task, our CrossFi achieves an accuracy of 98.17% in the in-domain scenario, 91.72% in the one-shot cross-domain scenario, 64.81% in the zero-shot cross-domain scenario, and 84.75% in the one-shot new-class scenario. To facilitate future research, we will release the code for our model upon publication.
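
For context, the fixed-similarity baseline that this abstract contrasts with CSi-Net can be sketched as nearest-template classification with cosine similarity. This is not CrossFi itself: the learned attention-based similarity and the Weight-Net template generator are not reproduced. Using the mean embedding as the class template is an assumption, and the gesture labels and vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_templates(support):
    """Assumed template: the mean of each class's support embeddings."""
    templates = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        templates[label] = [
            sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)
        ]
    return templates


def classify(query, templates):
    """Pick the class whose template is most similar to the query embedding."""
    return max(templates, key=lambda label: cosine_similarity(query, templates[label]))


# Hypothetical two-dimensional embeddings for two gesture classes.
support = {
    "push": [[1.0, 0.1], [0.9, 0.0]],
    "swipe": [[0.1, 1.0], [0.0, 0.8]],
}
templates = class_templates(support)
predicted = classify([0.8, 0.2], templates)
print(predicted)
```

Because classification only needs one template per class, new classes can in principle be added from a handful of support samples, which is the setting the few-shot results in the abstract refer to.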

scenario, target domain, template, (16 more...)

2408.10919

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)

Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

Weng, Wenchao, Wu, Mei, Jiang, Hanyu, Kong, Wanzeng, Kong, Xiangjie, Xia, Feng

arXiv.org Artificial Intelligence, Aug-12-2024

In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which makes them inefficient and far from lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the impact of the target information on the prediction. To address these issues, we propose a Pattern-Matching Dynamic Memory Network (PM-DMNet). PM-DMNet employs a novel dynamic memory network to capture traffic pattern features with only O(N) complexity, significantly reducing computational overhead while achieving excellent performance. The PM-DMNet also introduces two prediction methods: Recursive Multi-step Prediction (RMP) and Parallel Multi-step Prediction (PMP), which leverage the time features of the prediction targets to assist in the forecasting process. Furthermore, a transfer attention mechanism is integrated into PMP, transforming historical data features to better align with the predicted target states, thereby capturing trend changes more accurately and reducing errors. Extensive experiments demonstrate the superiority of the proposed model over existing benchmarks. The source code is available at: https://github.com/wengwenchao123/PM-DMNet.
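
The complexity claim in this abstract can be illustrated with a generic memory read: each of the N nodes attends to a small bank of M pattern slots rather than to every other node, so the number of similarity scores is N times M instead of N times N. The sketch below is not PM-DMNet's architecture, which the abstract does not specify. The function names and values are hypothetical, and the linear cost only holds under the assumption that M stays fixed as N grows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_readout(node_features, memory):
    """Each node attends over the memory slots and returns a weighted slot mix."""
    dim = len(memory[0])
    outputs = []
    for features in node_features:
        # One dot-product score per memory slot: M scores per node.
        scores = [sum(a * b for a, b in zip(features, slot)) for slot in memory]
        weights = softmax(scores)
        outputs.append([
            sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)
        ])
    return outputs


def pairwise_score_count(n_nodes):
    """Scores needed when every node attends to every node."""
    return n_nodes * n_nodes


def memory_score_count(n_nodes, n_slots):
    """Scores needed when every node attends to a fixed bank of slots."""
    return n_nodes * n_slots


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[10.0, 0.0], [0.0, 10.0]]  # two hypothetical traffic-pattern slots
readout = memory_readout(nodes, memory)
print(readout[2], pairwise_score_count(1000), memory_score_count(1000, 20))
```

With 1000 nodes and 20 slots, the memory read needs 20,000 scores where node-to-node attention needs 1,000,000, which is the kind of saving the abstract attributes to the memory network.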

dataset, pm-dmnet, prediction, (15 more...)

2408.071

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Zhejiang Province > Hangzhou (0.05)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (0.46)
Transportation > Infrastructure & Services (0.46)
Transportation > Ground > Road (0.46)
Consumer Products & Services > Travel (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deformable Image Registration with Multi-scale Feature Fusion from Shared Encoder, Auxiliary and Pyramid Decoders

Zhou, Hongchao, Hu, Shunbo

arXiv.org Artificial Intelligence, Aug-11-2024

In this work, we propose a novel deformable convolutional pyramid network for unsupervised image registration. Specifically, the proposed network enhances the traditional pyramid network by adding an additional shared auxiliary decoder for image pairs. This decoder provides multi-scale high-level feature information from unblended image pairs for the registration task. During the registration process, we also design a multi-scale feature fusion block to extract the most beneficial features for the registration task from both global and local contexts. Validation results indicate that this method can capture complex deformations while achieving higher registration accuracy and maintaining smooth and plausible deformations.

decoder, image registration, registration, (13 more...)

2408.05717

Country: Asia > China > Shandong Province (0.05)

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
