Digital Peter: Dataset, Competition and Handwriting Recognition Methods

Mark Potanin, Denis Dimitrov, Alex Shonenkov, Vladimir Bataev, Denis Karachev, Maxim Novopoltsev

Mar-16-2021–arXiv.org Artificial Intelligence

This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts the original document images into individual lines. The dataset may be useful to researchers for training handwritten text recognition models and as a benchmark for comparing different models. It consists of 9,694 images and corresponding text files, each representing a line of a historical document. The open machine learning competition Digital Peter was held using this dataset. The article describes the baseline solution for this competition as well as more advanced handwritten text recognition methods. The full dataset and all code are publicly available.

dataset, digital peter, recognition

Country:
- Asia > Russia (0.69)
- North America > United States
  - California > Los Angeles County > Long Beach (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)

Genre:
- Research Report (0.40)

Industry:
- Government > Regional Government
  - Europe Government > Russia Government (0.36)
  - Asia Government > Russia Government (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Handwriting Recognition (0.88)
  - Machine Learning
    - Neural Networks > Deep Learning (0.94)
    - Pattern Recognition (0.71)
    - Statistical Learning (0.68)