AITopics

2508.13319

Country: Asia > India (0.15)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.52)

Mitra, Sirshapan, Rawat, Yogesh S.

GaitCrafter: Diffusion Model for Biometric Preserving Gait Synthesis

arXiv.org Artificial Intelligence, Aug-20-2025

Gait recognition is a valuable biometric task that enables the identification of individuals from a distance based on their walking patterns. However, it remains limited by the lack of large-scale labeled datasets and the difficulty of collecting diverse gait samples for each individual while preserving privacy. To address these challenges, we propose GaitCrafter, a diffusion-based framework for synthesizing realistic gait sequences in the silhouette domain. Unlike prior works that rely on simulated environments or alternative generative models, GaitCrafter trains a video diffusion model from scratch, exclusively on gait silhouette data. Our approach enables the generation of temporally consistent and identity-preserving gait sequences. Moreover, the generation process is controllable, allowing conditioning on various covariates such as clothing, carried objects, and view angle. We show that incorporating synthetic samples generated by GaitCrafter into the gait recognition pipeline leads to improved performance, especially under challenging conditions. Additionally, we introduce a mechanism to generate novel identities (synthetic individuals not present in the original dataset) by interpolating identity embeddings. These novel identities exhibit unique, consistent gait patterns and are useful for training models while maintaining the privacy of real subjects. Overall, our work takes an important step toward leveraging diffusion models for high-quality, controllable, and privacy-aware gait data generation.
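The abstract says novel identities are produced by interpolating identity embeddings. A minimal sketch of that idea, assuming each identity is a fixed-length vector and that plain linear interpolation is used; the helper name `interpolate_identity` and the toy values are hypothetical illustrations, not GaitCrafter's published method.

```python
from typing import List, Sequence


def interpolate_identity(e_a: Sequence[float], e_b: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical helper: blend two identity embeddings as (1 - alpha) * e_a + alpha * e_b.

    Assumes plain linear interpolation; GaitCrafter's actual scheme may differ.
    """
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


if __name__ == "__main__":
    # Two toy "real" identity embeddings (made-up values for illustration only).
    subject_a = [1.0, 0.0, 0.5]
    subject_b = [0.0, 1.0, 0.5]
    # Sweep alpha to obtain several in-between (synthetic) identities.
    for alpha in (0.25, 0.5, 0.75):
        print(alpha, interpolate_identity(subject_a, subject_b, alpha))
```

The endpoints alpha = 0 and alpha = 1 reproduce the real identities exactly, so a privacy-oriented pipeline would presumably sample alpha away from them.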

machine learning, pattern recognition, recognition

2508.133

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.76)

Neural Information Processing Systems, Aug-18-2025, 16:50:12 GMT

Efficient Graph Similarity Computation with Alignment Regularization

We consider the graph similarity computation (GSC) task based on graph edit distance (GED) estimation. State-of-the-art methods treat GSC as a learning-based prediction task using Graph Neural Networks (GNNs).

artificial intelligence, machine learning, pattern recognition

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

arXiv.org Artificial Intelligence, Aug-18-2025

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models

Meoded, Erez

Historical handwritten text recognition (HTR) is essential for unlocking the cultural and scholarly value of archival documents, yet digitization is often hindered by scarce transcriptions, linguistic variation, and highly diverse handwriting styles. In this study, we apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We investigate targeted image preprocessing and a broad suite of data augmentation techniques, introducing four novel augmentation methods designed specifically for historical handwriting characteristics. We also evaluate ensemble learning approaches to leverage the complementary strengths of augmentation-trained models. On the Gwalther dataset, our best single-model augmentation (Elastic) achieves a Character Error Rate (CER) of 1.86, while a top-5 voting ensemble achieves a CER of 1.60, representing a 50% relative improvement over the best reported TrOCR_BASE result and a 42% improvement over the previous state of the art. These results highlight the impact of domain-specific augmentations and ensemble strategies in advancing HTR performance for historical manuscripts.
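CER is conventionally the character-level Levenshtein (edit) distance between the reference transcription and the model output, divided by the reference length. A minimal sketch under that standard definition, assuming the reported figures are percentages (the abstract does not state the unit); the helper names `levenshtein`, `cer`, and `relative_improvement` are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate in percent: edit distance / reference length * 100."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative error reduction in percent (positive means 'new' is better)."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    print(cer("Gwalther", "Gwalter"))  # one deleted character out of 8 -> 12.5
```

Under these definitions, a 50% relative improvement that lands at a CER of 1.60 implies a baseline of about 3.2, since (3.2 - 1.6) / 3.2 = 0.5.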

machine learning, natural language, pattern recognition

2508.11499

Country:

Asia (0.93)
North America > United States (0.16)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Aug-17-2025, 16:05:10 GMT

CEDe: A collection of expert-curated datasets with atom-level entity annotations for Optical Chemical Structure Recognition

Chemical information in scientific literature is commonly presented in various ways, such as text, tables, charts, and images.

machine learning, natural language, pattern recognition

Country:

North America > United States > California > Orange County > Irvine (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Law > Intellectual Property & Technology Law (0.69)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.93)

Neural Information Processing Systems, Aug-16-2025, 16:00:23 GMT

CoMIR: Contrastive Multimodal Image Representation for Registration

We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations).
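The snippet names contrastive coding as the training signal. The sketch below shows a generic InfoNCE-style loss on paired embedding vectors, assuming `za[i]` (from modality A) and `zb[i]` (from modality B) describe the same location and every other `zb[j]` in the batch acts as a negative. Cosine similarity, the temperature `tau`, and the function names are assumptions; CoMIR's dense, per-pixel formulation is more involved than this per-vector version.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(za: List[Sequence[float]], zb: List[Sequence[float]], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: za[i] and zb[i] are a positive pair; zb[j], j != i, are negatives."""
    n = len(za)
    total = 0.0
    for i in range(n):
        logits = [cosine(za[i], zb[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


if __name__ == "__main__":
    modality_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned_b = [[1.0, 0.0], [0.0, 1.0]]   # positives match
    swapped_b = [[0.0, 1.0], [1.0, 0.0]]   # positives mismatched
    print(info_nce(modality_a, aligned_b), info_nce(modality_a, swapped_b))
```

With `tau = 0.1`, the aligned batch gives a loss near 0 while swapping the positives raises it to about 10, which is the pressure that pulls corresponding representations from the two modalities together.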

comir, registration, representation

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)
Government (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)

Neural Information Processing Systems, Aug-15-2025, 19:19:59 GMT

8bf1211fd4b7b94528899de0a43b9fb3-Paper.pdf

data mining, machine learning, pattern recognition

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > China > Jiangsu Province > Yancheng (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Information Management > Search (0.71)

Neural Information Processing Systems, Aug-15-2025, 12:22:31 GMT

8171ac2c5544a5cb54ac0f38bf477af4-Paper.pdf

circle intersection, operation, sdm

Country:

North America (0.14)
South America > Uruguay > Maldonado > Maldonado (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Neural Information Processing Systems, Aug-14-2025, 20:52:42 GMT

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Vision Transformers (ViTs) split every 2D image into a fixed number of patches, each of which is treated as a token.
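The snippet describes standard ViT tokenization. A minimal pure-Python sketch of that step, assuming a grayscale image stored as a list of rows whose sides are divisible by the patch size; the names `patchify` and `Image` are illustrative. It also makes concrete the trade-off the title points at: coarser patches produce far fewer tokens.

```python
from typing import List

Image = List[List[float]]  # grayscale image as a list of pixel rows


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tokens (row-major order),
    flattening each patch into one vector, as in a standard ViT tokenizer."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


if __name__ == "__main__":
    img = [[0.0] * 224 for _ in range(224)]
    print(len(patchify(img, 16)))  # 14 x 14 grid -> 196 tokens
    print(len(patchify(img, 32)))  # 7 x 7 grid -> 49 tokens
```

Because self-attention cost grows roughly quadratically with the token count, moving from 196 to 49 tokens is a large saving, which is why adapting the token count per image is attractive.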

arxiv preprint arxiv, dvt, transformer

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence, Aug-13-2025

Fuzzy-Pattern Tsetlin Machine

Hnilov, Artem

The "all-or-nothing" clause evaluation strategy is a core mechanism in the Tsetlin Machine (TM) family of algorithms. In this approach, each clause (a logical pattern composed of binary literals mapped to input data) is disqualified from voting if even a single literal fails. Due to this strict requirement, standard TMs must employ thousands of clauses to achieve competitive accuracy. This paper introduces the Fuzzy-Pattern Tsetlin Machine (FPTM), a novel variant where clause evaluation is fuzzy rather than strict. If some literals in a clause fail, the remaining ones can still contribute to the overall vote with a proportionally reduced score. As a result, each clause effectively consists of sub-patterns that adapt individually to the input, enabling more flexible, efficient, and robust pattern matching. The proposed fuzzy mechanism significantly reduces the required number of clauses, memory footprint, and training time, while simultaneously improving accuracy. On the IMDb dataset, FPTM achieves 90.15% accuracy with only one clause per class, a 50x reduction in clauses and memory over the Coalesced Tsetlin Machine. FPTM trains up to 316x faster (45 seconds vs. 4 hours) and fits within 50 KB, enabling online learning on microcontrollers. Inference throughput reaches 34.5 million predictions/second (51.4 GB/s). On Fashion-MNIST, accuracy reaches 92.18% (2 clauses), 93.19% (20 clauses) and 94.68% (8000 clauses); the 20-clause result is a ~400x clause reduction compared to the Composite TM's 93.00% (8000 clauses). On the Amazon Sales dataset with 20% noise, FPTM achieves 85.22% accuracy, significantly outperforming the Graph Tsetlin Machine (78.17%) and a Graph Convolutional Neural Network (66.23%).
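The contrast between strict and fuzzy clause evaluation can be sketched as follows. Literals use the usual TM convention of a feature or its negation; the fuzzy rule (score = fraction of literals that hold) is an illustrative reading of "proportionally reduced score", not FPTM's published formula, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A literal is (feature index, negated?): it holds when x[idx] == 1,
# or, if negated, when x[idx] == 0.
Literal = Tuple[int, bool]


def literal_holds(x: Sequence[int], lit: Literal) -> bool:
    idx, negated = lit
    return x[idx] == 0 if negated else x[idx] == 1


def strict_clause(x: Sequence[int], clause: List[Literal]) -> float:
    """Standard TM ("all-or-nothing"): 1 if every literal holds, else 0."""
    if not clause:
        raise ValueError("clause must contain at least one literal")
    return 1.0 if all(literal_holds(x, lit) for lit in clause) else 0.0


def fuzzy_clause(x: Sequence[int], clause: List[Literal]) -> float:
    """Illustrative fuzzy rule: score = fraction of literals that hold."""
    if not clause:
        raise ValueError("clause must contain at least one literal")
    return sum(literal_holds(x, lit) for lit in clause) / len(clause)


def class_vote(x: Sequence[int], clauses: List[List[Literal]],
               polarities: List[int], fuzzy: bool = True) -> float:
    """Sum clause outputs, each weighted by its polarity (+1 or -1)."""
    evaluate = fuzzy_clause if fuzzy else strict_clause
    return sum(p * evaluate(x, c) for c, p in zip(clauses, polarities))


if __name__ == "__main__":
    clause = [(0, False), (1, True), (2, False)]  # x0 AND NOT x1 AND x2
    x = [1, 1, 1]                                 # the NOT x1 literal fails
    print(strict_clause(x, clause))  # 0.0: one failure disqualifies the clause
    print(fuzzy_clause(x, clause))   # 2 of 3 literals hold, so about 0.667
```

Because a partially matching clause still contributes, a handful of fuzzy clauses can cover input variations that a strict TM would need many separate clauses to capture.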

artificial intelligence, machine learning, pattern recognition

2508.0835

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)