Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates
Natalia Bottaioli, Solène Tarride, Jérémy Anger, Seginus Mowlavi, Marina Gardella, Antoine Tadros, Gabriele Facciolo, Rafael Grompone von Gioi, Christopher Kermorvant, Jean-Michel Morel, Javier Preciozzi
arXiv.org Artificial Intelligence
This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from Uruguayan birth certificates handwritten in Spanish. We investigate two annotation strategies for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets containing the same images (201 scans of birth certificates written by more than 15 different writers) but annotated with different methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.
Jul-14-2025