AITopics | Hradiš, Michal

Collaborating Authors

Hradiš, Michal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization

Kišš, Martin, Hradiš, Michal, Dvořáková, Martina, Jiroušek, Václav, Kersch, Filip

arXiv.org Artificial IntelligenceMar-28-2025

We introduce the AnnoPage Dataset, a novel collection of 7 550 pages from historical documents, primarily in Czech and German, spanning from 1485 to the present, focusing on the late 19th and early 20th centuries. The dataset is designed to support research in document layout analysis and object detection. Each page is annotated with axis-aligned bounding boxes (AABB) representing elements of 25 categories of non-textual elements, such as images, maps, decorative elements, or charts, following the Czech Methodology of image document processing. The annotations were created by expert librarians to ensure accuracy and consistency. The dataset also incorporates pages from multiple, mainly historical, document datasets to enhance variability and maintain continuity. The dataset is divided into development and test subsets, with the test set carefully selected to maintain the category distribution. We provide baseline results using YOLO and DETR object detectors, offering a reference point for future research.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.22526

Country: Europe > Czechia (0.29)

Genre: Research Report (0.50)

Industry:

Media (1.00)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets

Kišš, Martin, Hradiš, Michal

arXiv.org Artificial IntelligenceMar-28-2025

Self-supervised learning has emerged as a powerful approach for leveraging large-scale unlabeled data to improve model performance in various domains. In this paper, we explore masked self-supervised pre-training for text recognition transformers. Specifically, we propose two modifications to the pre-training phase: progressively increasing the masking probability, and modifying the loss function to incorporate both masked and non-masked patches. We conduct extensive experiments using a dataset of 50M unlabeled text lines for pre-training and four differently sized annotated datasets for fine-tuning. Furthermore, we compare our pre-trained models against those trained with transfer learning, demonstrating the effectiveness of the self-supervised pre-training. In particular, pre-training consistently improves the character error rate of models, in some cases up to 30 % relatively. It is also on par with transfer learning but without relying on extra annotated text lines.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.22513

Country: Europe (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Self-supervised Pre-training of Text Recognizers

Kišš, Martin, Hradiš, Michal

arXiv.org Artificial IntelligenceMay-1-2024

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different approaches -- Feature Quantization, VQ-VAE, and Post-Quantized AE. We also investigate joint-embedding approaches with VICReg and NT-Xent objectives, for which we propose an image shifting technique to prevent model collapse where it relies solely on positional encoding while completely ignoring the input image. We perform our experiments on historical handwritten (Bentham) and historical printed datasets mainly to investigate the benefits of the self-supervised pre-training techniques with different amounts of annotated target domain data. We use transfer learning as strong baselines. The evaluation shows that the self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains. This paper is one of the first researches exploring self-supervised pre-training in document text recognition, and we believe that it will become a cornerstone for future research in this area. We made our implementation of the investigated methods publicly available at https://github.com/DCGM/pero-pretraining.

artificial intelligence, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2405.0042

Country: Europe > Czechia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

Kišš, Martin, Hradiš, Michal, Beneš, Karel, Buchal, Petr, Kula, Michal

arXiv.org Artificial IntelligenceSep-19-2023

This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function $\unicode{x2013}$ SoftCTC $\unicode{x2013}$ which is an extension of CTC allowing to consider multiple transcription variants at the same time. This allows to omit the confidence based filtering step which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely-tuned filtering based pipeline. We also evaluated SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a na\"ive CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.

artificial intelligence, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2212.02135

Country:

Europe > Czechia (0.14)
North America > United States (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.91)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.86)
(3 more...)

Add feedback