table detection and extraction
Table Detection and Extraction -- TableNet, Deep Learning model with PyTorch from images
The loss function used for this model is torch.nn.BCEWithLogitsLoss(), which combines a sigmoid layer and binary cross-entropy loss in a single function. The train function returns a metric dictionary containing the F1 score, accuracy, precision, recall, and loss for the current epoch. The F1 score already takes precision and recall into account, but I report them separately because I wanted to know which of the two is better or worse. The test function is very similar to the train function and returns the same metrics for the current epoch. The model is trained for about 100 epochs with early stopping. In each epoch, I run both the train_on_epoch and test_on_epoch functions, display their results, and compare them against the scores from the previous epoch.
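To make the loss and the metric dictionary concrete, here is a minimal sketch in plain Python, written so that it runs without PyTorch. It shows roughly what a sigmoid-plus-binary-cross-entropy loss computes on raw logits, how the five metrics could be derived from thresholded predictions, and one possible early-stopping check. The names bce_with_logits, binary_metrics, and should_stop, the 0.5 threshold, and the patience rule are assumptions made for illustration; they are not taken from the article's code, where torch.nn.BCEWithLogitsLoss() does the loss computation on tensors.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def binary_metrics(logits, targets, threshold=0.5):
    """Return a metric dictionary for one epoch (hypothetical helper)."""
    # Comparing the logit with logit(threshold) avoids evaluating the sigmoid.
    cutoff = math.log(threshold / (1.0 - threshold))
    tp = fp = fn = tn = 0
    for x, y in zip(logits, targets):
        pred = 1 if x >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "f1": f1,
        "accuracy": (tp + tn) / len(logits),
        "precision": precision,
        "recall": recall,
        "loss": bce_with_logits(logits, targets),
    }


def should_stop(f1_history, patience=5):
    """Assumed early-stopping rule: stop once the best F1 score
    is at least `patience` epochs old."""
    if not f1_history:
        return False
    best_epoch = max(range(len(f1_history)), key=f1_history.__getitem__)
    return len(f1_history) - 1 - best_epoch >= patience


if __name__ == "__main__":
    # Toy per-pixel logits and ground-truth mask values.
    print(binary_metrics([2.0, -1.0, 0.5, -3.0], [1, 0, 0, 1]))
```

Keeping precision and recall next to the F1 score in the same dictionary makes it easy to see which of the two is pulling the score down in a given epoch.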
TAO: System for Table Detection and Extraction from PDF Documents
Perez-Arriaga, Martha O. (University of New Mexico) | Estrada, Trilce (University of New Mexico) | Abad-Mota, Soraya (University of New Mexico)
Digital documents present knowledge in most areas of study, exchanging and communicating information in a portable way. To better use the knowledge embedded in an ever-growing information source, effective tools for automatic information extraction are needed. Tables are crucial information elements in documents of a scientific nature. Most publications use tables to represent and report concrete research findings. Current methods used to extract table data from PDF documents lack precision in detecting, extracting, and representing data from diverse layouts. We present the system TAble Organization (TAO) to automatically detect, extract, and organize information from tables in PDF documents. TAO uses a processing approach based on the k-nearest neighbor method and layout heuristics to detect tables within a document and to extract table information. This system generates an enriched representation of the data extracted from tables in the PDF documents. TAO’s performance is comparable to other table extraction methods, but it overcomes some limitations of related work and proves to be more robust in experiments with diverse document layouts.