Goto

Collaborating Authors

 Verma, Kunal


A Gaussian Process Upsampling Model for Improvements in Optical Character Recognition

arXiv.org Machine Learning

Optical Character Recognition and extraction is a key tool in the automatic evaluation of documents in a financial context. However, the image data provided to automated systems can have unreliable quality, and can be inherently low-resolution or downsampled and compressed by a transmitting program. In this paper, we illustrate the efficacy of a Gaussian Process upsampling model for the purposes of improving OCR and extraction through upsampling low resolution documents.


Joint Extraction and Labeling via Graph Propagation for Dictionary Construction

AAAI Conferences

In this paper, we present an approach that jointly infers the boundaries of tokens and their labels to construct dictionaries for Information Extraction. Our approach for joint-inference is based on graph propagation, and extends it in two novel ways. First, we extend the graph representation to capture ambiguities that occur during the token extraction phase. Second, we modify the labeling phase (i.e., label propagation) to utilize this new representation, allowing evidence from labeling to be used for token extraction. Our evaluation shows these extensions (and hence our approach) significantly improve the performance of the outcome dictionaries over pipeline-based approaches by preventing aggressive commitment. Our evaluation also shows that our extensions over a base graph-propagation framework improve the precision without hurting the recall.


Linked Data Is Merely More Data

AAAI Conferences

In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked triple collection, it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.