 Optical Character Recognition


Supreme Court rejects challenge to Google book-scanning project

PCWorld

Fair use also allows for people to transform the original content into a new type of work, and that transformation of the printed books was part of Google's argument in this case. The Authors Guild wanted the Supreme Court to "recognize Google's seizure of property as a serious threat to writers and their livelihoods, one which will affect the depth, resilience, and vitality of our intellectual culture," the group said on a webpage detailing the case. The Supreme Court decision gave authors a "colossal loss," Authors Guild President Roxana Robinson said in a statement. The Google Books project may lead to a short-term public benefit, but it will come at the expense of the future vitality of U.S. culture, Robinson added. "The denial of review is further proof that we're witnessing a vast redistribution of wealth from the creative sector to the tech sector, not only with books, but across the spectrum of the arts," she said.


Blind man 'reads' for the first time in 20 YEARS using hi-tech OrCam MyReader glasses

Daily Mail - Science & tech

A lot of people would struggle to get through their daily lives without the help of glasses, but for most, glasses simply make everything they are looking at slightly clearer. For one man, a new pair of glasses is doing much more than that, allowing him to read for the first time in 20 years. Luke Hines was left blind in one eye and with only three per cent vision in the other after an operation to remove a childhood brain tumour in 1997. He was unable to attend school, has not found work because of his condition and has spent years feeling isolated.


The Panama Papers- It's all about the data!

#artificialintelligence

The latest buzz about the Panama Papers has shaken the world. As we all know, the Panama Papers are a set of 2.6 TB of data that includes 11.5 million confidential documents with detailed information about more than 214,000 offshore companies listed by the Panamanian corporate service provider Mossack Fonseca. The Panama Papers have set an excellent example for the world of the importance of data science when it comes to analyzing big data. This leak makes us realize that appropriate approaches are needed to handle the challenges of data management for the present and the future. Let's take a deep dive into the Panama Papers and dig into the secret behind the biggest leak ever. The leak contains 4.8 million emails, 3 million database entries, 2.15 million PDFs, around one million images and 320,000 text documents.


Robots Might Be The Future Of Mail Delivery

Huffington Post - Tech news and opinion

TROISDORF, Germany (Reuters) - Germany's Deutsche Post is testing robots that could help postal workers cope with increasing numbers of parcels on their delivery rounds, a company manager said on Thursday. The volume of parcels being delivered by Deutsche Post in Germany is rising steadily as more and more Germans buy goods online from retailers such as Amazon.com. That is making up for declining letter volumes, but posing problems due to the larger size of the items involved. "Robots could be used in deliveries in three to five years' time," Clemens Beckmann, head of innovation at the group's parcel and letter division, said in an interview with Reuters. The robots, which look like a table on wheels on which goods can be placed, would follow delivery workers, helping them to transport and carry heavy parcels. If the postie stops walking, the robot stops too, and it only starts again when they move on.


Google Opens Cloud Vision API Beta to Entire Developer Community

#artificialintelligence

Today, Google announced the beta release of its Google Cloud Vision API. The API was designed to empower applications to both see and understand images submitted to the API. With powerful features such as label/entity detection, optical character recognition, safe search detection, facial detection, landmark detection, and logo detection, the Cloud Vision API gives applications unprecedented ability to comprehend the situation within an image. With the new API, Google enters a rapidly developing market where both startups and major enterprises are producing cutting-edge technology. From Microsoft, with its Project Oxford, to niche startups like Cognitec and Lambda Labs, image analysis is proving to be an attractive space as it appeals across industries from marketing to security.


Font Identification in Historical Documents Using Active Learning

arXiv.org Machine Learning

Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-word representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents.
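The uncertainty criterion mentioned in the abstract above can be illustrated with a minimal sketch of entropy-based uncertainty sampling. This is a generic textbook technique and is not claimed to be the paper's exact strategy; the diversity and dissimilarity criteria are not shown. The function names and the example probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    """Return the indices of the k predictions with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical classifier outputs over (Blackletter, Roman, Mixed) for four pages.
page_probs = [
    [0.98, 0.01, 0.01],  # confident: little to gain from labeling
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # nearly uniform: most uncertain
]

# Pages whose labels would be requested from the annotator next.
query = select_most_uncertain(page_probs, k=2)
print(query)
```

In an active-learning loop, the selected pages would be labeled, added to the training set, and the classifier retrained before the next round of selection.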


Calibrated Structured Prediction

Neural Information Processing Systems

In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequency) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
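To make the idea of probabilities that correspond to true frequency concrete, here is a minimal sketch that measures calibration by binning confidences and comparing each bin's mean confidence with its observed accuracy. It is a generic diagnostic (often called expected calibration error), not the paper's recalibration method; the function names, bin count, and example data are assumptions.

```python
def calibration_table(confidences, outcomes, n_bins=5):
    """Group predictions into equal-width confidence bins.

    Returns one (mean confidence, observed frequency, count) row per
    non-empty bin. `outcomes` holds 1 for a correct prediction, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, correct))
    table = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            frequency = sum(y for _, y in members) / len(members)
            table.append((mean_conf, frequency, len(members)))
    return table

def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Count-weighted average gap between confidence and observed frequency."""
    total = len(confidences)
    return sum(
        count / total * abs(mean_conf - frequency)
        for mean_conf, frequency, count in calibration_table(confidences, outcomes, n_bins)
    )

# Hypothetical example: ten predictions, each reported with confidence 0.9.
confs = [0.9] * 10
well_calibrated = [1] * 9 + [0]      # 9 of 10 correct, matching the 0.9 claim
overconfident = [1] * 5 + [0] * 5    # only 5 of 10 correct

print(expected_calibration_error(confs, well_calibrated))
print(expected_calibration_error(confs, overconfident))
```

A score near zero means the reported confidences match how often the predictions are actually right; a large score indicates over- or under-confidence.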


Moral Reminder as a Way to Improve Worker Performance on Amazon Mechanical Turk

AAAI Conferences

The present study explores a method to reduce abusive worker behavior on Amazon Mechanical Turk (AMT), namely reminding workers of moral standards. We manipulated workers’ awareness of moral standards via the presence or the absence of an honesty statement in a survey. The results showed that the honesty statement significantly improved workers’ performance during the first half of the survey. This suggests that a moral reminder is a simple and efficient way to reduce abusive worker behavior in a relatively short survey on AMT.


Automatic Assessment of OCR Quality in Historical Documents

AAAI Conferences

Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspond to words in the document. This paper presents an iterative classification algorithm to automatically label BBs (i.e., as text or noise) based on their spatial distribution and geometry. The approach uses a rule-based classifier to generate initial text/noise labels for each BB, followed by an iterative classifier that refines the initial labels by incorporating local information about each BB: its spatial location, shape and size. When evaluated on a dataset containing over 72,000 manually-labeled BBs from 159 historical documents, the algorithm can classify BBs with 0.95 precision and 0.96 recall. Further evaluation on a collection of 6,775 documents with ground-truth transcriptions shows that the algorithm can also be used to predict document quality (0.7 correlation) and improve OCR transcriptions in 85% of the cases.
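The two-stage idea described above (rule-based initial labels, then iterative refinement from local context) can be sketched roughly as follows. This is an illustrative toy, not the paper's algorithm: the `Box` type, the size and aspect-ratio thresholds, the neighbourhood radius, and the majority-vote update are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

def initial_label(box, min_h=8, max_h=60, max_aspect=15):
    """Rule-based first guess: True means text, False means noise.
    The thresholds are hypothetical."""
    if box.h < min_h or box.h > max_h:
        return False
    return box.w / box.h <= max_aspect

def centre(box):
    return (box.x + box.w / 2, box.y + box.h / 2)

def refine(boxes, labels, radius=100, max_iters=10):
    """Relabel each box to the strict majority label of its neighbours
    (centres within `radius`), repeating until nothing changes."""
    labels = list(labels)
    for _ in range(max_iters):
        updated = []
        for i, box in enumerate(boxes):
            cx, cy = centre(box)
            votes = []
            for j, other in enumerate(boxes):
                ox, oy = centre(other)
                if j != i and (ox - cx) ** 2 + (oy - cy) ** 2 <= radius ** 2:
                    votes.append(labels[j])
            text_votes = sum(votes)
            if votes and 2 * text_votes != len(votes):
                updated.append(2 * text_votes > len(votes))
            else:
                updated.append(labels[i])  # no neighbours or a tie: keep label
        if updated == labels:
            break
        labels = updated
    return labels

# Hypothetical page: three words on a line, a slightly oversized box on the
# same line, and an isolated speck far away.
boxes = [
    Box(10, 100, 80, 20),
    Box(100, 100, 90, 22),
    Box(200, 100, 70, 21),
    Box(280, 95, 60, 65),
    Box(900, 900, 3, 3),
]
initial = [initial_label(b) for b in boxes]
refined = refine(boxes, initial)
print(initial, refined)
```

In this toy example the oversized box is first rejected by the height rule and then recovered because its neighbour is text, while the isolated speck stays labeled as noise.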


Analogical Dissimilarity: Definition, Algorithms and Two Experiments in Machine Learning

arXiv.org Artificial Intelligence

This paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It also gives learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data, the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to be rapidly adapted to a new writer.
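For intuition, the binary-data case can be sketched in a few lines. The sketch assumes objects are equal-length vectors of 0/1 features and measures, per feature, how far "a is to b as c is to d" is from holding; the sequence case based on optimal alignment is not shown. The function names and this particular formulation are assumptions for illustration, not the paper's definitions.

```python
def analogical_dissimilarity(a, b, c, d):
    """Sum over features of |(a_i - b_i) - (c_i - d_i)| for 0/1 vectors.

    The result is 0 exactly when every feature satisfies a:b :: c:d.
    """
    return sum(
        abs((ai - bi) - (ci - di))
        for ai, bi, ci, di in zip(a, b, c, d)
    )

def solve_analogical_equation(a, b, c):
    """Find d such that a:b :: c:d holds on every feature, or None.

    Per feature, d_i = c_i - a_i + b_i must be 0 or 1; otherwise the
    equation has no solution (e.g. 0:1 :: 1:?).
    """
    d = []
    for ai, bi, ci in zip(a, b, c):
        di = ci - ai + bi
        if di not in (0, 1):
            return None
        d.append(di)
    return d

# Hypothetical three-feature objects.
a, b, c = [0, 1, 0], [1, 1, 0], [0, 0, 1]
d = solve_analogical_equation(a, b, c)
print(d, analogical_dissimilarity(a, b, c, d))
```

When the equation has a solution, the four objects have zero dissimilarity; a value above zero counts how many feature changes separate the four objects from an exact proportion.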