Optical Character Recognition
Boosting OCR Accuracy Using Crowdsourcing
Wang, Shuo-Yang (Academia Sinica) | Wang, Ming-Hung (National Taiwan University) | Chen, Kuan-Ta (Academia Sinica)
Book digitizing is an important work in preserving ancient heritages. However, digitizing books contains a series of labor-intensive works, and one of them is to verify optical character recognition (OCR) outcomes. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders are able to leverage the power of crowds to complete verification tasks and avoid content leakage. From the experiment results, our method is more efficient and reliable than the traditional method.
Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems
Karger, David R., Oh, Sewoong, Shah, Devavrat
Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask to the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and can not be exploited. Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.
Examples of Artificial Perceptions in Optical Character Recognition and Iris Recognition
Noaica, Cristina M., Badea, Robert, Motoc, Iulia M., Ghica, Claudiu G., Rosoiu, Alin C., Popescu-Bodorin, Nicolaie
This paper assumes the hypothesis that human learning is perception based, and consequently, the learning process and perceptions should not be represented and investigated independently or modeled in different simulation spaces. In order to keep the analogy between the artificial and human learning, the former is assumed here as being based on the artificial perception. Hence, instead of choosing to apply or develop a Computational Theory of (human) Perceptions, we choose to mirror the human perceptions in a numeric (computational) space as artificial perceptions and to analyze the interdependence between artificial learning and artificial perception in the same numeric space, using one of the simplest tools of Artificial Intelligence and Soft Computing, namely the perceptrons. As practical applications, we choose to work around two examples: Optical Character Recognition and Iris Recognition. In both cases a simple Turing test shows that artificial perceptions of the difference between two characters and between two irides are fuzzy, whereas the corresponding human perceptions are, in fact, crisp.
Sequence Labeling with Non-Negative Weighted Higher Order Features
Qian, Xian (University of Texas at Dallas) | Liu, Yang (University of Texas at Dallas)
In sequence labeling, using higher order features leads to high inference complexity. A lot of studies have been conducted to address this problem. In this paper, we propose a new exact decoding algorithm under the assumption that weights of all higher order features are non-negative. In the worst case, the time complexity of our algorithm is quadratic on the number of higher order features. Comparing with existing algorithms, our method is more efficient and easier to implement. We evaluate our method on two sequence labeling tasks: Optical Character Recognition and Chinese part-of-speech tagging. Our experimental results demonstrate that adding higher order features significantly improves the performance while requiring only 30% additional inference time.
Online Semi-Supervised Learning on Quantized Graphs
Valko, Michal, Kveton, Branislav, Huang, Ling, Ting, Daniel
In this paper, we tackle the problem of online semi-supervised learning (SSL). When data arrive in a stream, the dual problems of computation and data storage arise for any SSL method. We propose a fast approximate online SSL algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local "representative points" that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We apply our algorithm to face recognition and optical character recognition applications to show that we can take advantage of the manifold structure to outperform the previous methods. Unlike previous heuristic approaches, we show that our method yields provable performance bounds.
Segmentation of Offline Handwritten Bengali Script
Basu, Subhadip, Chaudhuri, Chitrita, Kundu, Mahantapas, Nasipuri, Mita, Basu, Dipak K.
Character segmentation has long been one of the most critical areas of optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of cursive handwritten script of world's fourth popular language, Bengali, is considered. Unlike English script, Bengali handwritten characters and its components often encircle the main character, making the conventional segmentation methodologies inapplicable. Experimental results, using the proposed segmentation technique, on sample cursive handwritten data containing 218 ideal segmentation points show a success rate of 97.7%. Further feature-analysis on these segments may lead to actual recognition of handwritten cursive Bengali script.
Cognitive Memory Network
James, Alex Pappachen, Dimitrijev, Sima
A resistive memory network that has no crossover wiring is proposed to overcome the hardware limitations to size and functional complexity that is associated with conventional analogue neural networks. The proposed memory network is based on simple network cells that are arranged in a hierarchical modular architecture. Cognitive functionality of this network is demonstrated by an example of character recognition. The network is trained by an evolutionary process to completely recognise characters deformed by random noise, rotation, scaling and shifting
Iterative Learning for Reliable Crowdsourcing Systems
Karger, David R., Oh, Sewoong, Shah, Devavrat
Crowdsourcing systems, in which tasks are electronically distributed to numerous ``information piece-workers'', have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such rowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers’ answers. We show that our algorithm significantly outperforms majority voting and, in fact, are asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
CrowdSight: Rapidly Prototyping Intelligent Visual Processing Apps
Rodriguez, Mario (University of California, Santa Cruz) | Davis, James (University of California, Santa Cruz)
We describe a framework for rapidly prototyping applications which require intelligent visual processing, but for which reliable algorithms do not yet exist, or for which engineering those algorithms is too costly. The framework, CrowdSight, leverages the power of crowdsourcing to offload intelligent processing to humans, and enables new applications to be built quickly and cheaply, affording system builders the opportunity to validate a concept before committing significant time or capital. Our service accepts requests from users either via email or simple mobile applications, and handles all the communication with a backend human computation platform. We build redundant requests and data aggregation into the system freeing the user from managing these requirements. We validate our framework by building several test applications and verifying that prototypes can be built more easily and quickly than would be the case without the framework.
MobileWorks: A Mobile Crowdsourcing Platform for Workers at the Bottom of the Pyramid
Narula, Prayag (University of California, Berkeley and MobileWorks, Inc.) | Gutheim, Philipp (University of California, Berkeley) | Rolnitzky, David (University of California, Berkeley) | Kulkarni, Anand (University of California, Berkeley) | Hartmann, Björn (University of California, Berkeley)
Existing crowdsourcing markets are often inaccessible to workers living at the bottom of the economic pyramid. We present MobileWorks, a mobile phone-based crowdsourcing platform intended to provide employment to developing world users. MobileWorks provides human optical character recognition (OCR) tasks that can be completed by workers on low-end mobile phones through a web browser. To address the limited screen resolution available on low-end phones, MobileWorks divides documents into many small pieces and sends each piece to a different worker. An initial pilot study with 10 users over a two month period revealed that it is feasible to do basic OCR tasks using a simple mobile web-based application. We find that workers using MobileWorks average 120 tasks per hour at an accuracy rate of 99% using a multiple entry solution. In addition, users had a positive experience with MobileWorks: all study participants would recommend MobileWorks to friends and family.