Optical Character Recognition
MobileWorks: A Mobile Crowdsourcing Platform for Workers at the Bottom of the Pyramid
Narula, Prayag (University of California, Berkeley and MobileWorks, Inc.) | Gutheim, Philipp (University of California, Berkeley) | Rolnitzky, David (University of California, Berkeley) | Kulkarni, Anand (University of California, Berkeley) | Hartmann, Björn (University of California, Berkeley)
Existing crowdsourcing markets are often inaccessible to workers living at the bottom of the economic pyramid. We present MobileWorks, a mobile phone-based crowdsourcing platform intended to provide employment to developing-world users. MobileWorks provides human optical character recognition (OCR) tasks that can be completed by workers on low-end mobile phones through a web browser. To address the limited screen resolution available on low-end phones, MobileWorks divides documents into many small pieces and sends each piece to a different worker. An initial pilot study with 10 users over a two-month period revealed that it is feasible to do basic OCR tasks using a simple mobile web-based application. We find that workers using MobileWorks average 120 tasks per hour at an accuracy rate of 99% using a multiple-entry solution. In addition, users had a positive experience with MobileWorks: all study participants would recommend MobileWorks to friends and family.
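The two mechanisms named in this abstract, cutting a document into small pieces and accepting text through multiple entry, can be illustrated with a minimal Python sketch. It is not the MobileWorks implementation: the function names, the tile sizes, and the agreement rule (accept a piece once at least two workers typed the same string) are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_tiles(width: int, height: int, tile_w: int, tile_h: int) -> List[Box]:
    """Cover a width x height page with tiles small enough for a phone screen."""
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            # Tiles on the right and bottom edges are clipped to the page.
            boxes.append((left, top, min(left + tile_w, width), min(top + tile_h, height)))
    return boxes


def accept_transcription(entries: List[str], min_agreement: int = 2) -> Optional[str]:
    """Hypothetical multiple-entry rule: accept the most common transcription
    of one tile if at least min_agreement workers typed it, else return None."""
    # Collapse runs of whitespace so trivial spacing differences do not block agreement.
    normalized = [" ".join(entry.split()) for entry in entries]
    if not normalized:
        return None
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= min_agreement else None


tiles = split_into_tiles(250, 100, tile_w=100, tile_h=50)
agreed = accept_transcription(["hello  world", "hello world", "helo world"])
```

A tile for which `accept_transcription` returns `None` would, under this assumed rule, simply be sent to further workers until enough entries agree.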
CrowdSight: Rapidly Prototyping Intelligent Visual Processing Apps
Rodriguez, Mario (University of California, Santa Cruz) | Davis, James (University of California, Santa Cruz)
We describe a framework for rapidly prototyping applications that require intelligent visual processing, but for which reliable algorithms do not yet exist, or for which engineering those algorithms is too costly. The framework, CrowdSight, leverages the power of crowdsourcing to offload intelligent processing to humans, and enables new applications to be built quickly and cheaply, affording system builders the opportunity to validate a concept before committing significant time or capital. Our service accepts requests from users either via email or simple mobile applications, and handles all the communication with a backend human computation platform. We build redundant requests and data aggregation into the system, freeing the user from managing these requirements. We validate our framework by building several test applications and verifying that prototypes can be built more easily and quickly than would be the case without the framework.
Human Computation
Human computation is a new and evolving research area that centers around harnessing human intelligence to solve computational problems that are beyond the scope of existing Artificial Intelligence (AI) algorithms. With the growth of the Web, human computation systems can now leverage the abilities of an unprecedented number of people via the Web to perform complex computation. There are various genres of human computation applications that exist today. Games with a purpose (e.g., the ESP Game) specifically target online gamers who generate useful data (e.g., image tags) while playing an enjoyable game. Crowdsourcing marketplaces (e.g., Amazon Mechanical Turk) are human computation systems that coordinate workers to perform tasks in exchange for monetary rewards.
Analogical Dissimilarity: Definition, Algorithms and Two Experiments in Machine Learning
Miclet, L., Bayoudh, S., Delhay, A.
This paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It also gives learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data; the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to be rapidly adapted to a new writer.
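For binary data, the three ideas in this abstract (proportion, solving for an unknown fourth object, and searching a learning sample for the least dissimilar triple) can be sketched in a few lines of Python. The abstract does not state the definition, so the sketch assumes one simple choice: each bit contributes |(a - b) - (c - d)|, which is zero exactly when the four bits are in Boolean proportion (a = b and c = d, or a = c and b = d). The function names are hypothetical, and the exhaustive search over triples only illustrates the problem statement; it is not the paper's algorithm, and the sequence case based on alignment is not covered.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

Bits = Tuple[int, ...]


def analogical_dissimilarity(a: Bits, b: Bits, c: Bits, d: Bits) -> int:
    """Assumed binary definition: sum over positions of |(a - b) - (c - d)|.
    The result is 0 exactly when a : b :: c : d holds at every position."""
    return sum(abs((ai - bi) - (ci - di)) for ai, bi, ci, di in zip(a, b, c, d))


def solve_analogy(a: Bits, b: Bits, c: Bits) -> Optional[Bits]:
    """Solve a : b :: c : x bit by bit; return None if some position has no solution."""
    solution = []
    for ai, bi, ci in zip(a, b, c):
        xi = ci + bi - ai  # the only value that makes (a - b) equal to (c - x)
        if xi not in (0, 1):
            return None  # e.g. 0 : 1 :: 1 : ? has no binary solution
        solution.append(xi)
    return tuple(solution)


def least_dissimilar_triple(sample: Sequence[Bits], target: Bits):
    """Brute-force search (cubic in the sample size) for the ordered triple
    (a, b, c) of distinct sample items minimising the dissimilarity to target."""
    return min(
        permutations(sample, 3),
        key=lambda t: analogical_dissimilarity(t[0], t[1], t[2], target),
    )


x = solve_analogy((0, 1, 0), (0, 1, 1), (1, 1, 0))
```

In a classification setting, the labels of the selected triple would then be combined by solving the same kind of equation on the class labels; that step is left out here because the abstract gives no detail on it.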
Comparison of Human and Machine Word Recognition
Schenkel, Markus, Latimer, Cyril, Jabri, Marwan A.
We present a study which is concerned with word recognition rates for heavily degraded documents. We compare human with machine reading capabilities in a series of experiments, which explores the interaction of word/non-word recognition, word frequency and legality of non-words with degradation level. We also study the influence of character segmentation, and compare human performance with that of our artificial neural network model for reading. We found that the proposed computer model uses word context as efficiently as humans, but performs slightly worse on the pure character recognition task.
1 Introduction
Optical Character Recognition (OCR) of machine-print document images has matured considerably during the last decade. Recognition rates as high as 99.5% have been reported on good quality documents. However, for lower image resolutions (200 dpi and below), noisy images, images with blur or skew, the recognition rate declines considerably. In bad quality documents, character segmentation is as big a problem as the actual character recognition.
Human Reading and the Curse of Dimensionality
Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).
Transformation Invariant Autoassociation with Application to Handwritten Character Recognition
Schwenk, Holger, Milgram, Maurice
When training neural networks by the classical backpropagation algorithm, the whole problem to be learned must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper case handwritten letters and compared to other approaches to the invariance problem.
1 INCORPORATION OF EXPLICIT KNOWLEDGE
The aim of supervised learning is to learn a mapping between the input and the output space from a set of example pairs (input, desired output). The classical implementation in the domain of neural networks is the backpropagation algorithm. If this learning set is sufficiently representative of the underlying data distributions, one hopes that after learning, the system is able to generalize correctly to other inputs of the same distribution.