AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods like discriminant analysis, feature extraction, error estimation, cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Shenoy, Ashish, Lu, Yichao, Jayakumar, Srihari, Chatterjee, Debojeet, Moslehpour, Mohsen, Chuang, Pierce, Harpale, Abhay, Bhardwaj, Vikas, Xu, Di, Zhao, Shicong, Zhao, Longfang, Ramchandani, Ankit, Dong, Xin Luna, Kumar, Anuj

arXiv.org Artificial Intelligence, Feb-12-2024

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency.

arxiv, latency, paragraph, (13 more...)

arXiv:2402.08017

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.63)

Open-Vocabulary Calibration for Vision-Language Models

Wang, Shuoyuan, Wang, Jindong, Wang, Guoqing, Zhang, Bob, Zhou, Kaiyang, Wei, Hongxin

arXiv.org Artificial Intelligence, Feb-7-2024

Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few. In recent years, considerable efforts and resources have been devoted to adaptation methods for improving downstream performance of VLMs, particularly on parameter-efficient fine-tuning methods like prompt learning. However, a crucial aspect that has been largely overlooked is the confidence calibration problem in fine-tuned VLMs, which could greatly reduce reliability when deploying such models in the real world. This paper bridges the gap by systematically investigating the confidence calibration problem in the context of prompt learning and reveals that existing calibration methods are insufficient to address the problem, especially in the open-vocabulary setting. To solve the problem, we present a simple and effective approach called Distance-Aware Calibration (DAC), which is based on scaling the temperature using as guidance the distance between predicted text labels and base classes. The experiments with 7 distinct prompt learning methods applied across 11 diverse downstream datasets demonstrate the effectiveness of DAC, which achieves high efficacy without sacrificing the inference speed.

base class, calibration, new class, (15 more...)

arXiv:2402.04655

Country:

Asia > Macao (0.04)
Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
(2 more...)
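
The calibration abstract above builds on temperature scaling: logits are divided by a temperature before the softmax, which lowers confidence without changing the predicted class. The sketch below illustrates only that general mechanism. The function `scaled_temperature`, its linear rule, and the parameter `alpha` are hypothetical placeholders for a distance-guided temperature; they are not the formula used by DAC.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_temperature(base_temperature, distance, alpha=1.0):
    """Hypothetical rule (an assumption, not the paper's): temperature grows linearly with distance."""
    return base_temperature * (1.0 + alpha * distance)


logits = [4.0, 1.0, 0.5]

# Confidence (largest probability) at the base temperature.
conf_base = max(softmax(logits, temperature=1.0))

# Confidence when a label is assumed to lie far from the base classes.
far_temperature = scaled_temperature(1.0, distance=1.0)
conf_far = max(softmax(logits, temperature=far_temperature))
```

A higher temperature flattens the distribution, so `conf_far` is lower than `conf_base` while the top-ranked class stays the same.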

Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways

Wang, Angelina, Bai, Xuechunzi, Barocas, Solon, Blodgett, Su Lin

arXiv.org Artificial Intelligence, Feb-6-2024

As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes nor are equally harmful. Then, in experimental studies we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while having minimal changes to cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.

participant, stereotype, stereotype-reinforcing error, (16 more...)

arXiv:2402.0442

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Alaska (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (0.67)
Government (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.35)

Mining a Minimal Set of Behavioral Patterns using Incremental Evaluation

Acheli, Mehdi, Grigori, Daniela, Weidlich, Matthias

arXiv.org Artificial Intelligence, Feb-5-2024

Process mining provides methods to analyse event logs generated by information systems during the execution of processes. It thereby supports the design, validation, and execution of processes in domains ranging from healthcare, through manufacturing, to e-commerce. To explore the regularities of flexible processes that show a large behavioral variability, it was suggested to mine recurrent behavioral patterns that jointly describe the underlying process. Existing approaches to behavioral pattern mining, however, suffer from two limitations. First, they show limited scalability as incremental computation is incorporated only in the generation of pattern candidates, but not in the evaluation of their quality. Second, process analysis based on mined patterns shows limited effectiveness due to an overwhelmingly large number of patterns obtained in practical application scenarios, many of which are redundant. In this paper, we address these limitations to facilitate the analysis of complex, flexible processes based on behavioral patterns. Specifically, we improve COBPAM, our initial behavioral pattern mining algorithm, by an incremental procedure to evaluate the quality of pattern candidates, optimizing thereby its efficiency. Targeting a more effective use of the resulting patterns, we further propose pruning strategies for redundant patterns and show how relations between the remaining patterns are extracted and visualized to provide process insights. Our experiments with diverse real-world datasets indicate a considerable reduction of the runtime needed for pattern mining, while a qualitative assessment highlights how relations between patterns guide the analysis of the underlying process.

algorithm, alignment, operator, (17 more...)

arXiv:2402.02921

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Spatially Covariant Image Registration with Text Prompts

Chen, Xiang, Liu, Min, Wang, Rongguang, Hu, Renjiu, Liu, Dongdong, Li, Gaolei, Zhang, Hang

arXiv.org Artificial Intelligence, Feb-5-2024

Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights, relaxing the typical translation-invariance constraint of convolutional operations. TextSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive inter-regional transferability and the ability to preserve structural discontinuities during registration. TextSCF's performance has been rigorously tested on inter-subject brain MRI and abdominal CT registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF's larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34% decrease in computational operations.

deformation field, registration, textscf, (14 more...)

arXiv:2311.15607

Country:

North America > United States > New York (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > Pennsylvania (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)
Research Report > Promising Solution (0.68)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.88)
Health & Medicine > Therapeutic Area > Neurology (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)
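
The registration results above are reported as Dice scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice overlap, 2|A ∩ B| / (|A| + |B|), computed on flat binary masks. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for this illustration; the abstract does not describe how the challenge computes the score.

```python
def dice_score(pred, target):
    """Dice overlap between two equal-length binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions labelled foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


warped_labels = [1, 1, 0, 0]
fixed_labels = [1, 0, 0, 0]
score = dice_score(warped_labels, fixed_labels)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground.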

Machine Intelligence in Africa: a survey

Tapo, Allahsera Auguste, Traore, Ali, Danioko, Sidy, Tembine, Hamidou

arXiv.org Artificial Intelligence, Feb-3-2024

In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and speak, learn, understand, and do businesses in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving several Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. The lack thereof is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer multiscale and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI.

data mining, natural language, pattern recognition, (25 more...)

arXiv:2402.02218

Country:

Africa > Nigeria (1.00)
Africa > Democratic Republic of the Congo (0.92)
Africa > Cameroon (0.67)
(45 more...)

Genre:

Summary/Review (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(5 more...)

Industry:

Water & Waste Management > Water Management > Water Supplies & Services (1.00)
Transportation > Ground > Road (1.00)
Telecommunications (1.00)
(47 more...)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
(20 more...)

WiOpen: A Robust Wi-Fi-based Open-set Gesture Recognition Framework

Zhang, Xiang, Huang, Jingyang, Yan, Huan, Zhao, Peng, Zhuang, Guohang, Liu, Zhi, Liu, Bin

arXiv.org Artificial Intelligence, Feb-1-2024

Recent years have witnessed a growing interest in Wi-Fi-based gesture recognition. However, existing works have predominantly focused on closed-set paradigms, where all testing gestures are predefined during training. This poses a significant challenge in real-world applications, as unseen gestures might be misclassified as known classes during testing. To address this issue, we propose WiOpen, a robust Wi-Fi-based Open-Set Gesture Recognition (OSGR) framework. Implementing OSGR requires addressing challenges caused by the unique uncertainty in Wi-Fi sensing. This uncertainty, resulting from noise and domains, leads to widely scattered and irregular data distributions in collected Wi-Fi sensing data. Consequently, data ambiguity between classes and challenges in defining appropriate decision boundaries to identify unknowns arise. To tackle these challenges, WiOpen adopts a two-fold approach to eliminate uncertainty and define precise decision boundaries. Initially, it addresses uncertainty induced by noise during data preprocessing by utilizing the CSI ratio. Next, it designs the OSGR network based on an uncertainty quantification method. Throughout the learning process, this network effectively mitigates uncertainty stemming from domains. Ultimately, the network leverages relationships among samples' neighbors to dynamically define open-set decision boundaries, successfully realizing OSGR. Comprehensive experiments on publicly accessible datasets confirm WiOpen's effectiveness. Notably, WiOpen also demonstrates superiority in cross-domain tasks when compared to state-of-the-art approaches.

gesture recognition, recognition, wiopen, (15 more...)

arXiv:2402.00822

Country:

Asia > China > Anhui Province > Hefei (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
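
The WiOpen abstract above mentions the CSI ratio as its noise-reduction step but does not spell out the computation. The sketch below assumes the form commonly described in Wi-Fi sensing work: the complex channel state information (CSI) of one receive antenna is divided by that of another, so a phase offset shared by both antennas cancels. The sample values, the offsets, and the helper name `csi_ratio` are hypothetical.

```python
import cmath


def csi_ratio(csi_a, csi_b):
    """Element-wise quotient of two antennas' complex CSI samples."""
    return [a / b for a, b in zip(csi_a, csi_b)]


# Hypothetical clean channel values for two antennas over three packets.
clean_a = [1 + 1j, 0.5 + 2j, 2 - 1j]
clean_b = [2 + 0.5j, 1 - 1j, 1 + 1j]

# Assumed noise model: a random per-packet phase offset shared by both antennas.
offsets = [0.3, -1.2, 2.5]
noisy_a = [h * cmath.exp(1j * p) for h, p in zip(clean_a, offsets)]
noisy_b = [h * cmath.exp(1j * p) for h, p in zip(clean_b, offsets)]

ratio_clean = csi_ratio(clean_a, clean_b)
ratio_noisy = csi_ratio(noisy_a, noisy_b)
```

Under this noise model the raw samples change with every offset, while `ratio_noisy` matches `ratio_clean` up to floating-point error.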

Verification for Object Detection -- IBP IoU

Cohen, Noémie, Ducoffe, Mélanie, Boumazouza, Ryma, Gabreau, Christophe, Pagetti, Claire, Pucel, Xavier, Galametz, Audrey

arXiv.org Artificial Intelligence, Jan-30-2024

We introduce a novel Interval Bound Propagation (IBP) approach for the formal verification of object detection models, specifically targeting the Intersection over Union (IoU) metric. The approach has been implemented in an open source code, named IBP IoU, compatible with popular abstract interpretation based verification tools. The resulting verifier is evaluated on landing approach runway detection and handwritten digit recognition case studies. Comparisons against a baseline (Vanilla IBP IoU) highlight the superior performance of IBP IoU in ensuring accuracy and stability, contributing to more secure and robust machine learning applications.

detection, iou, perturbation, (16 more...)

arXiv:2403.08788

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Air (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)
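
To make the verification goal above concrete, the sketch below bounds the IoU between a fixed ground-truth box and a predicted box whose coordinates are known only up to intervals. It uses naive interval arithmetic and is not the IBP IoU implementation from the paper. The box layout `(x1, y1, x2, y2)`, the helper names, and the assumption that the ground-truth box has positive area are all choices made for this illustration.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a, b):
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def iou(a, b):
    """Intersection over Union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iou_bounds(lower, upper, truth):
    """Sound (possibly loose) IoU bounds for any box with lower <= coords <= upper."""
    # Smallest and largest boxes consistent with the coordinate intervals.
    smallest = (upper[0], upper[1], lower[2], lower[3])
    largest = (lower[0], lower[1], upper[2], upper[3])
    truth_area = area(truth)
    i_min, i_max = intersection(smallest, truth), intersection(largest, truth)
    a_min, a_max = area(smallest), area(largest)
    # Union = predicted area + truth area - intersection, bounded term by term.
    u_max = a_max + truth_area - i_min
    u_min = max(a_min + truth_area - i_max, truth_area)  # the union always contains the truth box
    low = i_min / u_max if u_max > 0 else 0.0
    high = min(1.0, i_max / u_min) if u_min > 0 else 1.0
    return low, high


truth = (0.0, 0.0, 10.0, 10.0)
lower = (-0.5, -0.5, 9.5, 9.5)
upper = (0.5, 0.5, 10.5, 10.5)
low, high = iou_bounds(lower, upper, truth)
```

If `low` stays above the detection threshold, every box inside the intervals is guaranteed to count as a correct detection; tighter bounds than these naive ones make that guarantee easier to obtain.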

Local Feature Matching Using Deep Learning: A Survey

Xu, Shibiao, Chen, Shunpeng, Xu, Rongtao, Wang, Changwei, Lu, Peng, Guo, Li

arXiv.org Artificial Intelligence, Jan-30-2024

Local feature matching enjoys wide-ranging applications in the realm of computer vision, encompassing domains such as image retrieval, 3D reconstruction, and object recognition. However, challenges persist in improving the accuracy and robustness of matching due to factors like viewpoint and lighting variations. In recent years, the introduction of deep learning models has sparked widespread exploration into local feature matching techniques. The objective of this endeavor is to furnish a comprehensive overview of local feature matching methods. These methods are categorized into two key segments based on the presence of detectors. The Detector-based category encompasses models inclusive of Detect-then-Describe, Joint Detection and Description, Describe-then-Detect, as well as Graph Based techniques. In contrast, the Detector-free category comprises CNN Based, Transformer Based, and Patch Based methods. Our study extends beyond methodological analysis, incorporating evaluations of prevalent datasets and metrics to facilitate a quantitative comparison of state-of-the-art techniques. The paper also explores the practical application of local feature matching in diverse domains such as Structure from Motion, Remote Sensing Image Registration, and Medical Image Registration, underscoring its versatility and significance across various fields. Ultimately, we endeavor to outline the current challenges faced in this domain and furnish future research directions, thereby serving as a reference for researchers involved in local feature matching and its interconnected domains.

descriptor, matching, proceedings, (13 more...)

arXiv:2401.17592

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(10 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficient Gesture Recognition on Spiking Convolutional Networks Through Sensor Fusion of Event-Based and Depth Data

Steffen, Lea, Trapp, Thomas, Roennau, Arne, Dillmann, Rüdiger

arXiv.org Artificial Intelligence, Jan-30-2024

As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are partially not practical or convenient. Gesture recognition is an alternative, but often not reactive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network, processing event- and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluation on an embedded system. For the evaluation three open source data sets are used. Since these do not represent the applied bi-modality, a new data set with synchronized event- and depth data was recorded. The results show the viability of temporal encoding on depth information and modality fusion, even on differently encoded data, to be beneficial to network performance and generalization capabilities.

dataset, depth data, recognition, (13 more...)

arXiv:2401.17064

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
