AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Reinforcement Learning from Diffusion Feedback: Q* for Image Search

Marathe, Aboli

arXiv.org Artificial Intelligence, Nov-27-2023

Large vision-language models are steadily gaining personalization capabilities at the cost of fine-tuning or data augmentation. We present two models for image generation using model-agnostic learning that align semantic priors with generative capabilities. RLDF, or Reinforcement Learning from Diffusion Feedback, is a singular approach for visual imitation through prior-preserving reward function guidance. This employs Q-learning (with standard Q*) for generation and follows a semantic-rewarded trajectory for image search through finite encoding-tailored actions. The second proposed method, noisy diffusion gradient, is optimization driven. At the root of both methods is a special CFG encoding that we propose for continual semantic guidance. Using only a single input image and no text input, RLDF generates high-quality images over varied domains including retail, sports and agriculture showcasing class-consistency and strong visual diversity. Project website is available at https://infernolia.github.io/RLDF.
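
The abstract above mentions Q-learning. As a rough illustration only, the following is a minimal sketch of the generic tabular Q-learning update, not the RLDF method itself. The function name `q_update`, the dictionary-based table, and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a dict mapping (state, action) pairs to values; missing pairs count as 0.0.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical states and actions, used only to exercise the update.
q_table = {}
first = q_update(q_table, "s0", "a", 1.0, "s1", ["a", "b"])
```

Repeating this update along sampled transitions moves the table toward the optimal action-value function Q* under the usual convergence conditions.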

arxiv preprint arxiv, diffusion model, preprint arxiv, (12 more...)

2311.15648

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.70)

SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

Chen, Guangke, Zhang, Yedi, Song, Fu

arXiv.org Artificial Intelligence, Nov-27-2023

Membership inference attacks allow adversaries to determine whether a particular example was contained in the model's training dataset. While previous works have confirmed the feasibility of such attacks in various applications, none has focused on speaker recognition (SR), a promising voice-based biometric recognition technique. In this work, we propose SLMIA-SR, the first membership inference attack tailored to SR. In contrast to conventional example-level attacks, our attack features speaker-level membership inference, i.e., determining if any voices of a given speaker, either the same as or different from the given inference voices, have been involved in the training of a model. It is particularly useful and practical since the training and inference voices are usually distinct, and it is also meaningful considering the open-set nature of SR, namely, that the speakers to be recognized are often not present in the training data. We utilize intra-similarity and inter-dissimilarity, two training objectives of SR, to characterize the differences between training and non-training speakers and quantify them with two groups of features driven by carefully established feature engineering to mount the attack. To improve the generalizability of our attack, we propose a novel mixing ratio training strategy to train attack models. To enhance the attack performance, we introduce voice chunk splitting to cope with the limited number of inference voices and propose to train attack models dependent on the number of inference voices. Our attack is versatile and can work in both white-box and black-box scenarios. Additionally, we propose two novel techniques to reduce the number of black-box queries while maintaining the attack performance. Extensive experiments demonstrate the effectiveness of SLMIA-SR.

attack model, slmia-sr, target sr, (15 more...)

doi: 10.14722/ndss.2024.241323

2309.07983

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.61)

Ultra-Range Gesture Recognition using an RGB Camera in Human-Robot Interaction

Bamani, Eran, Nissinman, Eden, Meir, Inbar, Koenigsberg, Lisa, Sintov, Avishai

arXiv.org Artificial Intelligence, Nov-26-2023

Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters and in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT) which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition in ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.

gesture recognition, model certainty, recognition, (14 more...)

2311.15361

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
South America > Ecuador (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

IRFL: Image Recognition of Figurative Language

Yosef, Ron, Bitton, Yonatan, Shahaf, Dafna

arXiv.org Artificial Intelligence, Nov-25-2023

Figures of speech such as metaphors, similes, and idioms are integral parts of human communication. They are ubiquitous in many forms of discourse, allowing people to convey complex, abstract ideas and evoke emotion. As figurative forms are often conveyed through multiple modalities (e.g., both text and images), understanding multimodal figurative language is an important AI challenge, weaving together profound vision, language, commonsense and cultural knowledge. In this work, we develop the Image Recognition of Figurative Language (IRFL) dataset. We leverage human annotation and an automatic pipeline we created to generate a multimodal dataset, and introduce two novel tasks as a benchmark for multimodal figurative language understanding. We experimented with state-of-the-art vision and language models and found that the best (22%) performed substantially worse than humans (97%). We release our dataset, benchmark, and code, in hopes of driving the development of models that can better understand figurative language.

figurative language, idiom, simile, (14 more...)

2303.15445

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Wörmann, Julian, Bogdoll, Daniel, Brunner, Christian, Bührle, Etienne, Chen, Han, Chuo, Evaristus Fuh, Cvejoski, Kostadin, van Elst, Ludger, Gottschall, Philip, Griesche, Stefan, Hellert, Christian, Hesels, Christian, Houben, Sebastian, Joseph, Tim, Keil, Niklas, Kelsch, Johann, Keser, Mert, Königshof, Hendrik, Kraft, Erwin, Kreuser, Leonie, Krone, Kevin, Latka, Tobias, Mattern, Denny, Matthes, Stefan, Motzkus, Franz, Munir, Mohsin, Nekolla, Moritz, Paschke, Adrian, von Pilchau, Stefan Pilar, Pintz, Maximilian Alexander, Qiu, Tianming, Qureishi, Faraz, Rizvi, Syed Tahseen Raza, Reichardt, Jörg, von Rueden, Laura, Sagel, Alexander, Sasdelli, Diogo, Scholl, Tobias, Schunk, Gerhard, Schwalbe, Gesina, Shen, Hao, Shoeb, Youssef, Stapelbroek, Hendrik, Stehr, Vera, Srinivas, Gurucharan, Tran, Anh Tuan, Vivekanandan, Abhishek, Wang, Ya, Wasserrab, Florian, Werner, Tino, Wirth, Christian, Zwicklbauer, Stefan

arXiv.org Artificial Intelligence, Nov-20-2023

The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real-life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Moreover, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.

ieee cvf international conference, logic & formal reasoning, pattern recognition, (31 more...)

2205.04712

Country:

Europe > Germany (0.92)
Asia > China (0.67)
North America > United States > New York > New York County > New York City (0.28)
(29 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.67)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
(19 more...)

Understanding and Mitigating Classification Errors Through Interpretable Token Patterns

Hedderich, Michael A., Fischer, Jonas, Klakow, Dietrich, Vreeken, Jilles

arXiv.org Artificial Intelligence, Nov-17-2023

State-of-the-art NLP methods achieve human-like performance on many tasks, but nevertheless make errors. Characterizing these errors in easily interpretable terms not only gives insight into whether a classifier is prone to making systematic errors, but also gives a way to act and improve the classifier. We propose to discover those patterns of tokens that distinguish correct and erroneous predictions so as to obtain global and interpretable descriptions for arbitrary NLP classifiers. We formulate the problem of finding a succinct and non-redundant set of such patterns in terms of the Minimum Description Length principle. Through an extensive set of experiments, we show that our method, Premise, performs well in practice. Unlike existing solutions, it recovers ground truth, even on highly imbalanced data over large vocabularies. In VQA and NER case studies, we confirm that it gives clear and actionable insight into the systematic errors made by NLP classifiers.

classifier, mitigating classification error, proceedings, (15 more...)

2311.1092

Country:

Europe > Germany > Saarland (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.49)

Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models

Duan, Rui, Qu, Zhe, Ding, Leah, Liu, Yao, Lu, Zhuo

arXiv.org Artificial Intelligence, Nov-17-2023

Audio adversarial examples (AEs) have posed significant security challenges to real-world speaker recognition systems. Most black-box attacks still require certain information from the speaker recognition model to be effective (e.g., repeated probing and knowledge of similarity scores). This work aims to push the practicality of black-box attacks by minimizing the attacker's knowledge about a target speaker recognition model. Although it is not feasible for an attacker to succeed with completely zero knowledge, we assume that the attacker only knows a short (a few seconds) speech sample of a target speaker. Without any probing to gain further knowledge about the target model, we propose a new mechanism, called parrot training, to generate AEs against the target model. Motivated by recent advancements in voice conversion (VC), we propose to use the knowledge of this one short sentence to generate more synthetic speech samples that sound like the target speaker, called parrot speech. Then, we use these parrot speech samples to train a parrot-trained (PT) surrogate model for the attacker. Under a joint transferability and perception framework, we investigate different ways to generate AEs on the PT model (called PT-AEs) to ensure the PT-AEs can be generated with high transferability to a black-box target model with good human perceptual quality. Real-world experiments show that the resultant PT-AEs achieve attack success rates of 45.8%-80.8% against the open-source models in the digital-line scenario and 47.9%-58.3% against smart devices, including Apple HomePod (Siri), Amazon Echo, and Google Home, in the over-the-air scenario.

speech sample, target model, target speaker, (13 more...)

2311.0778

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique

Couture, Beatrice, Verret, Farah, Gohier, Maxime, Deslandres, Dominique

arXiv.org Artificial Intelligence, Nov-12-2023

The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
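
The abstract above reports a Character Error Rate (CER). As a minimal sketch, assuming the common definition of CER as the Levenshtein edit distance between the reference and the model output divided by the reference length, it can be computed as follows. The function names and the sample strings are illustrative and are not taken from the paper or from Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # delete ref_char
                current[j - 1] + 1,                   # insert hyp_char
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only: one substituted character out of four.
cer = character_error_rate("abcd", "abxd")
```

Under this definition, a CER below 5% corresponds to fewer than five character edits per hundred reference characters.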

base model, htr model, transcription, (17 more...)

2212.11146

Country:

North America > Canada > Quebec > Montreal (0.07)
Europe > France (0.05)
Europe > Austria > Vienna (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition

Xu, Jin-Jian, Zhang, Hao, Tang, Chao-Sheng, Li, Lin, Shi, Bin

arXiv.org Artificial Intelligence, Nov-7-2023

As Earth science enters the era of big data, artificial intelligence (AI) not only offers great potential for solving geoscience problems, but also plays a critical role in accelerating the understanding of the complex, interactive, and multiscale processes of Earth's behavior. As geoscience AI models are progressively utilized for significant predictions in crucial situations, geoscience researchers are increasingly demanding their interpretability and versatility. This study proposes an interpretable geoscience artificial intelligence (XGeoS-AI) framework to unravel the mystery of image recognition in the Earth sciences, and its effectiveness and versatility are demonstrated by taking computed tomography (CT) image recognition as an example. Inspired by the mechanism of human vision, the proposed XGeoS-AI framework generates a threshold value from a local region within the whole image to complete the recognition. Different kinds of AI methods, such as Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN), can be adopted as the AI engines of the proposed XGeoS-AI framework to efficiently complete geoscience image recognition tasks. Experimental results demonstrate that the effectiveness, versatility, and heuristics of the proposed framework have great potential in solving geoscience image recognition problems. Interpretable AI should receive increasing attention in the Earth sciences, as it is key to promoting more rational and wider applications of AI in the field. In addition, the proposed interpretable framework may be the forerunner of technological innovation in the Earth sciences.

application, demystify image recognition, interpretable geoscience artificial intelligence, (1 more...)

2311.0494

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Proceedings of the 5th International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Pacha, Alexander, Shatri, Elona

arXiv.org Artificial Intelligence, Nov-7-2023

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 5th International Workshop on Reading Music Systems, held in Milan, Italy on Nov. 4th 2023.

international workshop, proceedings, recognition, (13 more...)

2311.04091

Country:

North America > Canada > Quebec > Montreal (0.28)
Europe > Italy > Lombardy > Milan (0.24)
Europe > United Kingdom > England > Greater London > London (0.14)
(28 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)
