AITopics | Pattern Recognition

Pattern Recognition

"... the research area that studies the operation and design of systems that recognize patterns in data." It includes statistical methods such as discriminant analysis, feature extraction, error estimation, and cluster analysis.
– Pattern Recognition Laboratory at Delft University of Technology

Computational Social Linguistics for Telugu Cultural Preservation: Novel Algorithms for Chandassu Metrical Pattern Recognition

Pavan, Boddu Sri, Sree, Boddu Swathi

arXiv.org Artificial Intelligence | Oct-3-2025

This research presents a computational social science approach to preserving Telugu Chandassu, the metrical poetry tradition representing centuries of collective cultural intelligence. We develop the first comprehensive digital framework for analyzing Telugu prosodic patterns, bridging traditional community knowledge with modern computational methods. Our social computing approach involves collaborative dataset creation of 4,651 annotated padyams, expert-validated linguistic patterns, and culturally-informed algorithmic design. The framework includes AksharamTokenizer for prosody-aware tokenization, LaghuvuGuruvu Generator for classifying light and heavy syllables, and PadyaBhedam Checker for automated pattern recognition. Our algorithm achieves 91.73% accuracy on the proposed Chandassu Score, with evaluation metrics reflecting traditional literary standards. This work demonstrates how computational social science can preserve endangered cultural knowledge systems while enabling new forms of collective intelligence around literary heritage. The methodology offers insights for community-centered approaches to cultural preservation, supporting broader initiatives in digital humanities and socially-aware computing systems.

machine learning, natural language, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2510.01233

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration

Jianchun Chen, Lingjing Wang, Xiang Li, Yi Fang

Neural Information Processing Systems | Oct-2-2025, 18:38:49 GMT

This paper concerns the undetermined problem of estimating geometric transformation between image pairs.

artificial intelligence, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > UAE (0.28)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Neural Information Processing Systems | Oct-2-2025, 08:26:34 GMT

The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication.

arxiv preprint arxiv, machine learning, pattern recognition, (16 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry:

Information Technology > Services (1.00)
Law Enforcement & Public Safety > Terrorism (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Inducing Uncertainty on Open-Weight Models for Test-Time Privacy in Image Recognition

Ashiq, Muhammad H., Triantafillou, Peter, Tseng, Hung Yun, Chrysos, Grigoris G.

arXiv.org Artificial Intelligence | Oct-1-2025

A key concern for AI safety remains understudied in the machine learning (ML) literature: how can we ensure users of ML models do not leverage predictions on incorrect personal data to harm others? This is particularly pertinent given the rise of open-weight models, where simply masking model outputs does not suffice to prevent adversaries from recovering harmful predictions. To address this threat, which we call *test-time privacy*, we induce maximal uncertainty on protected instances while preserving accuracy on all other instances. Our proposed algorithm uses a Pareto optimal objective that explicitly balances test-time privacy against utility. We also provide a certifiable approximation algorithm that achieves $(\varepsilon, \delta)$ guarantees without convexity assumptions. We then prove a tight bound that characterizes the privacy-utility tradeoff that our algorithms incur. Empirically, our method obtains more than $3\times$ stronger uncertainty than pretraining with marginal drops in accuracy on various image recognition benchmarks. Altogether, this framework provides a tool to guarantee additional protection to end users.

artificial intelligence, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2509.11625

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Insurance (0.93)
Health & Medicine > Therapeutic Area > Dermatology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.60)

CURA: Size Isn't All You Need -- A Compact Universal Architecture for On-Device Intelligence

Seo, Jae-Bum, Salman, Muhammad, Caceres-Najarro, Lismer Andres

arXiv.org Artificial Intelligence | Sep-30-2025

Existing on-device AI architectures for resource-constrained environments face two critical limitations: they lack compactness, with parameter requirements scaling proportionally to task complexity, and they exhibit poor generalizability, performing effectively only on specific application domains (e.g., models designed for regression tasks cannot adapt to natural language processing (NLP) applications). In this paper, we propose CURA, an architecture inspired by analog audio signal processing circuits that provides a compact and lightweight solution for diverse machine learning tasks across multiple domains. Our architecture offers three key advantages over existing approaches: (1) Compactness: it requires significantly fewer parameters regardless of task complexity; (2) Generalizability: it adapts seamlessly across regression, classification, complex NLP, and computer vision tasks; and (3) Complex pattern recognition: it can capture intricate data patterns while maintaining extremely low model complexity. We evaluated CURA across diverse datasets and domains. For compactness, it achieved equivalent accuracy using up to 2,500 times fewer parameters compared to baseline models. For generalizability, it demonstrated consistent performance across four NLP benchmarks and one computer vision dataset, nearly matching specialized existing models (achieving F1-scores up to 90%). Lastly, it delivers superior forecasting accuracy for complex patterns, achieving 1.6 times lower mean absolute error and 2.1 times lower mean squared error than competing models.

machine learning, natural language, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2509.24601

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report (0.82)
Overview (0.66)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Semnani, Sina J., Zhang, Han, He, Xinyan, Tekgürler, Merve, Lam, Monica S.

arXiv.org Artificial Intelligence | Sep-25-2025

Accurate text recognition for historical documents can greatly advance the study and preservation of cultural heritage. Existing vision-language models (VLMs), however, are designed for modern, standardized texts and are not equipped to read the diverse languages and scripts, irregular layouts, and frequent degradation found in historical materials. This paper presents CHURRO, a 3B-parameter open-weight VLM specialized for historical text recognition. The model is trained on CHURRO-DS, the largest historical text recognition dataset to date. CHURRO-DS unifies 155 historical corpora comprising 99,491 pages, spanning 22 centuries of textual heritage across 46 language clusters, including historical variants and dead languages. We evaluate several open-weight and closed VLMs and optical character recognition (OCR) systems on CHURRO-DS and find that CHURRO outperforms all other VLMs. On the CHURRO-DS test set, CHURRO achieves 82.3% (printed) and 70.1% (handwritten) normalized Levenshtein similarity, surpassing the second-best model, Gemini 2.5 Pro, by 1.4% and 6.5%, respectively, while being 15.5 times more cost-effective. By releasing the model and dataset, we aim to enable community-driven research to improve the readability of historical texts and accelerate scholarship.
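
The abstract above reports results as normalized Levenshtein similarity but does not state the normalization. A minimal sketch follows, assuming the common definition 1 - distance / max(len(prediction), len(reference)); the function names are hypothetical and are not taken from the CHURRO release.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_similarity(pred: str, ref: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(pred, ref) / longest
```

Under this assumed definition, a score of 1.0 means the transcription matches the reference exactly, and each wrong, missing, or extra character lowers the score in proportion to the longer string's length.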

large language model, machine learning, pattern recognition, (22 more...)

arXiv.org Artificial Intelligence

2509.19768

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East (0.67)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Health & Medicine (1.00)
Media (0.69)
Law (0.67)
Government > Military (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

Dong, Daxiang, Zheng, Mingming, Xu, Dong, Zhuang, Bairong, Zhang, Wenyu, Luo, Chunhua, Wang, Haoran, Zhao, Zijian, Li, Jie, Li, Yuxuan, Zhong, Hanjun, Liu, Mengyue, Chen, Jieting, Li, Shupeng, Tian, Lun, Feng, Yaping, Li, Xin, Jiang, Donggang, Chen, Yong, Xu, Yehua, Qin, Duohao, Feng, Chen, Wang, Dan, Zhang, Henghua, Ha, Jingjing, He, Jinhui, Zhai, Yanfeng, Zheng, Chengxin, Mao, Jiayi, Chen, Jiacheng, Yao, Ruchang, Yuan, Ziye, Wu, Jianmin, Xie, Guangjun, Shen, Dou

arXiv.org Artificial Intelligence | Sep-24-2025

We present Qianfan-VL, a series of multimodal large language models ranging from 3B to 70B parameters, achieving state-of-the-art performance through innovative domain enhancement techniques. Our approach employs multi-stage progressive training and high-precision data synthesis pipelines, which prove to be critical technologies for enhancing domain-specific capabilities while maintaining strong general performance. Qianfan-VL achieves comparable results to leading open-source models on general benchmarks, with state-of-the-art performance on benchmarks such as CCBench, SEEDBench IMG, ScienceQA, and MMStar. The domain enhancement strategy delivers significant advantages in OCR and document understanding, validated on both public benchmarks (OCRBench 873, DocVQA 94.75%) and in-house evaluations. Notably, Qianfan-VL-8B and 70B variants incorporate long chain-of-thought capabilities, demonstrating superior performance on mathematical reasoning (MathVista 78.6%) and logical inference tasks. All models are trained entirely on Baidu's Kunlun P800 chips, validating the capability of large-scale AI infrastructure to train SOTA-level multimodal models with over 90% scaling efficiency on 5000 chips for a single task. This work establishes an effective methodology for developing domain-enhanced multimodal models suitable for diverse enterprise deployment scenarios.

large language model, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2509.18189

Genre: Research Report (0.50)

Industry:

Media (0.67)
Marketing (0.46)
Leisure & Entertainment (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Evaluation of Ensemble Learning Techniques for handwritten OCR Improvement

Preiß, Martin

arXiv.org Artificial Intelligence | Sep-23-2025

For the 2021 bachelor project of Professor Lippert's research group, handwritten entries in historical patient records needed to be digitized using Optical Character Recognition (OCR) methods. Since the data will be used in the future, a high degree of accuracy is required, all the more so in the medical field. Ensemble Learning combines several machine learning models and is claimed to increase the accuracy of existing methods. For this reason, this work investigates Ensemble Learning in combination with OCR in order to add value to the digitization of the patient records. The work finds that ensemble learning can increase OCR accuracy, identifies which methods achieve this, and observes that the size of the training data set did not play a role.
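
The abstract does not say which ensemble methods were evaluated. As one simple illustration of combining several OCR models, the sketch below applies plain majority voting over whole-string predictions for the same image; `majority_vote` is a hypothetical helper and is not claimed to be the method used in the work.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the transcription proposed by the most models.

    Ties go to the candidate seen first, because Counter.most_common
    keeps equal counts in first-encountered order (Python 3.7+).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    best, _count = Counter(predictions).most_common(1)[0]
    return best


# Three hypothetical OCR models transcribe the same handwritten word.
print(majority_vote(["Fieber", "Fleber", "Fieber"]))
```

Voting on whole strings only helps when at least two models agree exactly; finer-grained schemes vote per character after aligning the outputs, which this sketch does not attempt.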

data mining, machine learning, pattern recognition, (14 more...)

arXiv.org Artificial Intelligence

2509.16221

Country:

Europe (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.88)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.86)

Optimal Transport for Handwritten Text Recognition in a Low-Resource Regime

Wraight, Petros Georgoulas, Sfikas, Giorgos, Kordonis, Ioannis, Maragos, Petros, Retsinas, George

arXiv.org Artificial Intelligence | Sep-23-2025

Handwritten Text Recognition (HTR) is a task of central importance in the field of document image understanding. State-of-the-art methods for HTR require the use of extensive annotated sets for training, making them impractical for low-resource domains like historical archives or limited-size modern collections. This paper introduces a novel framework that, unlike the standard HTR model paradigm, can leverage mild prior knowledge of lexical characteristics; this is ideal for scenarios where labeled data are scarce. We propose an iterative bootstrapping approach that aligns visual features extracted from unlabeled images with semantic word representations using Optimal Transport (OT). Starting with a minimal set of labeled examples, the framework iteratively matches word images to text labels, generates pseudo-labels for high-confidence alignments, and retrains the recognizer on the growing dataset. Numerical experiments demonstrate that our iterative visual-semantic alignment scheme significantly improves recognition accuracy on low-resource HTR benchmarks.
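
The matching step described above (align word images to text labels with Optimal Transport, then keep only high-confidence alignments as pseudo-labels) can be sketched as follows. The abstract does not name the OT solver, so this example swaps in standard entropic-regularized Sinkhorn iterations; the uniform marginals, the `eps` and `threshold` values, and both function names are assumptions made for illustration.

```python
import math


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan between uniform marginals.

    cost[i][j] is an assumed distance between word image i and label j,
    scaled to roughly [0, 1] so that exp(-cost / eps) does not underflow.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass over word images (assumption)
    b = [1.0 / m] * m  # uniform mass over candidate labels (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def pseudo_labels(plan, threshold=0.9):
    """Keep (image, label) pairs whose share of the row mass reaches threshold."""
    kept = []
    for i, row in enumerate(plan):
        total = sum(row)
        j = max(range(len(row)), key=row.__getitem__)
        if total > 0 and row[j] / total >= threshold:
            kept.append((i, j))
    return kept


# Toy cost matrix: image 0 is close to label 0, image 1 to label 1.
plan = sinkhorn_plan([[0.0, 1.0], [1.0, 0.0]])
print(pseudo_labels(plan))
```

In an iterative scheme like the one the abstract describes, the kept pairs would be added to the training set and the recognizer retrained before the next matching round; that loop is omitted here.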

machine learning, pattern recognition, recognition, (19 more...)

arXiv.org Artificial Intelligence

2509.16977

Country: Europe > Greece (0.15)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.61)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.54)

Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition

Kaliosis, Panagiotis, Pavlopoulos, John

arXiv.org Artificial Intelligence | Sep-23-2025

Handwritten text recognition aims to convert visual input into machine-readable text, and it remains challenging due to the evolving and context-dependent nature of handwriting. Character sets change over time, and character frequency distributions shift across historical periods or regions, often causing models trained on broad, heterogeneous corpora to underperform on specific subsets. To tackle this, we propose a novel loss function that incorporates the Wasserstein distance between the character frequency distribution of the predicted text and a target distribution empirically derived from training data. By penalizing divergence from expected distributions, our approach enhances both accuracy and robustness under temporal and contextual intra-dataset shifts. Furthermore, we demonstrate that character distribution alignment can also improve existing models at inference time without requiring retraining by integrating it as a scoring function in a guided decoding scheme. Experimental results across multiple datasets and architectures confirm the effectiveness of our method in boosting generalization and performance. We open source our code at https://github.com/pkaliosis/fada.
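
To make the distribution-alignment idea concrete, the sketch below computes a Wasserstein-1 distance between the character frequency distribution of a decoded string and a target distribution. It is an illustration only: it assumes characters are placed on a line in a fixed alphabet order with unit spacing (the abstract does not state the ground metric), it works on decoded text rather than on differentiable model outputs, and the function names are hypothetical rather than taken from the linked repository.

```python
from collections import Counter
from itertools import accumulate


def char_distribution(text, alphabet):
    """Relative frequency of each alphabet character in text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no characters from the alphabet")
    return [counts[ch] / total for ch in alphabet]


def wasserstein_1d(p, q):
    """W1 on positions 0..n-1 with unit spacing: sum of |CDF_p - CDF_q|."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))


alphabet = "abc"  # toy alphabet; the ordering is an assumption
predicted = char_distribution("aab", alphabet)
target = char_distribution("abc", alphabet)
print(wasserstein_1d(predicted, target))  # about 0.667 (exactly 2/3)
```

A distance of zero means the predicted text uses characters in the same proportions as the target. Used as a scoring term at inference time, a candidate transcription would be penalized in proportion to this distance.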

large language model, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2506.09846

Country:

North America > United States (0.28)
Europe > France (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)
(2 more...)
