 icdar 2021


Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models

arXiv.org Artificial Intelligence

As rich sources of history, maps provide crucial insights into historical changes, yet their diverse visual representations and limited annotated data pose significant challenges for automated processing. We propose a simple yet effective approach for few-shot segmentation of historical maps, leveraging the rich semantic embeddings of large vision foundation models combined with parameter-efficient fine-tuning. Our method outperforms the state-of-the-art on the Siegfried benchmark dataset in vineyard and railway segmentation, achieving +5% and +13% relative improvements in mIoU in 10-shot scenarios and around +20% in the more challenging 5-shot setting. Additionally, it demonstrates strong performance on the ICDAR 2021 competition dataset, attaining a mean PQ of 67.3% for building block segmentation, despite not being optimized for this shape-sensitive metric, underscoring its generalizability. Notably, our approach maintains high performance even in extremely low-data regimes (10- & 5-shot), while requiring only 689k trainable parameters - just 0.21% of the total model size. Our approach enables precise segmentation of diverse historical maps while drastically reducing the need for manual annotations, advancing automated processing and analysis in the field.
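For readers unfamiliar with the metric quoted above, the following is a minimal sketch of mean intersection-over-union (mIoU) on flat label sequences. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the abstract does not describe the paper's exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flat sequences of integer class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both sequences.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labeled c in at least one of the two sequences.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

A relative improvement, as reported in the abstract, would then be computed as (new - old) / old rather than as a difference in percentage points.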


1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection

arXiv.org Artificial Intelligence

In this technical report, we present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD). The MFD task poses three key challenges: a large scale span, large variation in the height-to-width ratio, and a rich set of characters and mathematical expressions. Considering these challenges, we used Generalized Focal Loss (GFL), an anchor-free method, instead of an anchor-based method, and show that Adaptive Training Sample Selection (ATSS) and a proper Feature Pyramid Network (FPN) can effectively address the important issue of scale variation. We also found that several tricks, e.g., Deformable Convolution Network (DCN), SyncBN, and Weighted Boxes Fusion (WBF), were effective in the MFD task. Our proposed method ranked 1st among the final 15 teams.


ICDAR 2021 Competition on Components Segmentation Task of Document Photos

arXiv.org Artificial Intelligence

This paper describes the short-term competition on "Components Segmentation Task of Document Photos" that was prepared in the context of the "16th International Conference on Document Analysis and Recognition" (ICDAR 2021). This competition aims to bring together researchers working in the field of identification document image processing and to provide them with a suitable benchmark for comparing their techniques on the component segmentation task of document images. Three challenge tasks were proposed, entailing different segmentation assignments to be performed on a provided dataset. The collected data come from several types of Brazilian ID documents, whose personal information was conveniently replaced. There were 16 participants, whose results on some or all of the three tasks vary widely across the adopted metrics, with the "Dice Similarity Coefficient", for example, ranging from 0.06 to 0.99. The entrants applied different Deep Learning models with diverse strategies to achieve the best results in each of the tasks. The results show that current methods for one of the proposed tasks (document boundary detection) are already well established. However, for the other two challenge tasks (text zone and handwritten sign detection), research and development of more robust approaches are still required to achieve acceptable results.
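As a point of reference for the metric named above, here is a minimal sketch of the Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks given as flat 0/1 sequences. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details taken from the competition's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Foreground pixels shared by both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# One shared foreground pixel, three foreground pixels in total: 2*1/3.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

A score of 0.0 means the masks share no foreground pixel, and 1.0 means they coincide, which is the scale on which the reported 0.06 to 0.99 range should be read.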


ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX

arXiv.org Artificial Intelligence

Tables present important information concisely in many scientific documents. Visual features like mathematical symbols, equations, and spanning cells make structure and content extraction from tables embedded in research documents difficult. This paper discusses the dataset, tasks, participants' methods, and results of the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX. Specifically, the task of the competition is to convert a tabular image to its corresponding LaTeX source code. We proposed two subtasks. In Subtask 1, we ask the participants to reconstruct the LaTeX structure code from an image. In Subtask 2, we ask the participants to reconstruct the LaTeX content code from an image. This report describes the datasets and ground truth specification, details the performance evaluation metrics used, presents the final results, and summarizes the participating methods. The submission by team VCGroup achieved the highest Exact Match accuracy scores, 74% for Subtask 1 and 55% for Subtask 2, beating previous baselines by 5% and 12%, respectively. Although the recognition capabilities of models can still be improved, this competition contributes to the development of fully automated table recognition systems by challenging practitioners to solve problems under specific constraints and to share their approaches; the platform will remain available for post-challenge submissions at https://competitions.codalab.org/competitions/26979.


PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

arXiv.org Artificial Intelligence

Task B of the ICDAR 2021 competition on scientific literature parsing is to convert a table image into HTML code. In this competition, the PubTabNet dataset (v2.0.0) [3] is provided as the official evaluation data, and the Tree-Edit-Distance-based similarity (TEDS) metric is used for evaluation. The PubTabNet dataset consists of 500,777 training samples, 9,115 validation samples, 9,138 samples for the development stage, and 9,064 samples for the final evaluation stage. For the training and validation data, the ground truth HTML code and the positions of non-empty table cells are provided to the participants. Participants need to develop a model that can convert images of tabular data into the corresponding HTML code, which should correctly represent the structure of the table and the content of each cell. The labels of the samples for the development and final evaluation stages are withheld by the organizers. We divide this task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. We also try several tricks to improve the model. The details of each sub-task will be discussed in the following section.