
OCR tool


How to Compare OCR Tools: Tesseract OCR vs Amazon Textract vs Azure OCR vs Google OCR

#artificialintelligence

Optical Character Recognition (OCR) tools are software that detects and extracts text from images. They are used in the early stages of scanned-document analysis to recognize and automatically process the information the documents contain. Depending on the complexity of the documents, OCR tools can be used both to detect and to extract text or, in some pipelines, only to extract text from previously identified regions of interest, e.g. paragraphs, tables, titles, and so on. The latter case is of particular interest to me, so this article focuses on the extraction task.
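When tools are compared on extraction alone, one common yardstick is the character error rate (CER). CER is the Levenshtein edit distance between a tool's output and a hand-typed ground truth for the same region, divided by the ground-truth length. The sketch below is a minimal, dependency-free version of that metric. It is an illustration, not the article's own evaluation code. The tool names and output strings are made-up placeholders, not real results from Tesseract, Textract, Azure, or Google.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical outputs of two tools on the same cropped region.
ground_truth = "Total: 42.50"
outputs = {
    "tool_a": "Total: 42.50",
    "tool_b": "Tota1: 42.5O",  # 'l' read as '1', '0' read as 'O'
}
for name, text in outputs.items():
    print(name, round(character_error_rate(ground_truth, text), 3))
# tool_a 0.0
# tool_b 0.167
```

Lower is better. Because every tool sees the same pre-cropped region, the score isolates extraction quality from detection quality.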


How Machine Learning and AI Will Help You Acquire a Mortgage

#artificialintelligence

AI is about as big a buzzword as has ever existed in the mortgage industry, on par with automated underwriting, cloud technology, and digital mortgages. Indeed, AI is intrinsically tied to these innovations: AI tools enhance automation, can be delivered through the cloud, and would significantly improve the production of digital mortgages. At the same time, AI is also one of the least understood terms in the mortgage industry, and that lack of understanding keeps most industry participants from realizing its full benefits.


An Actual Application for the MNIST Digits Classifier

#artificialintelligence

Have you ever thought to yourself, "I just made a great MNIST classifier!" and then wondered what to do with it? While the handwritten digits dataset is a great, clean way to get into machine learning (on the classification side, anyway), it is rightly dubbed the "Hello World" of the field. You can use it to build a sensible ML pipeline and learn how to implement different kinds of models, but it doesn't have much use past that… until now. One of my first posts here used some basic Python data structures and logic to solve Sudoku puzzles about twice as fast as you could blink, but I had to enter the numbers into the arrays manually to prepare the solver. In this post, I'd like to show how to use some image processing tools and a convolutional neural net to perform optical character recognition (OCR).
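One way to glue a digit classifier to a Sudoku solver is to slice the puzzle image into its 81 cells and treat nearly empty cells as blanks. Only the remaining cells go to the classifier. The sketch below shows that step in plain Python lists so it has no dependencies. It is not necessarily the post's own method. It assumes the grid has already been located, cropped to a square, perspective-corrected, and inverted to MNIST-style white digits on a black background. The `classify` argument is a hypothetical stand-in for the trained CNN, and the `blank_threshold` value is an illustrative guess.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def split_into_cells(grid: Image, n: int = 9) -> List[List[Image]]:
    """Slice a square, pre-cropped puzzle image into an n x n grid of cell images."""
    size = len(grid)
    if size < n or any(len(row) != size for row in grid):
        raise ValueError("expected a square image at least n pixels wide")
    step = size // n  # any leftover pixels at the right/bottom edge are dropped
    return [
        [[row[c * step:(c + 1) * step] for row in grid[r * step:(r + 1) * step]]
         for c in range(n)]
        for r in range(n)
    ]


def read_board(grid: Image, classify: Callable[[Image], int],
               blank_threshold: float = 0.03) -> List[List[int]]:
    """Build a solver-ready board: 0 for blank cells, classifier output otherwise."""
    board = []
    for row_cells in split_into_cells(grid):
        board_row = []
        for cell in row_cells:
            pixels = [p for line in cell for p in line]
            ink = sum(1 for p in pixels if p > 128) / len(pixels)  # fraction of bright pixels
            board_row.append(0 if ink < blank_threshold else classify(cell))
        board.append(board_row)
    return board


# Usage with a synthetic 90x90 image: one bright blob in the top-left cell,
# and a dummy classifier that always answers 5 (a placeholder for the CNN).
image = [[0] * 90 for _ in range(90)]
for y in range(2, 8):
    for x in range(2, 8):
        image[y][x] = 255
board = read_board(image, classify=lambda cell: 5)
print(board[0][:3])  # [5, 0, 0]
```

In a real pipeline, each non-blank cell would typically be resized to 28x28 before reaching an MNIST-trained network. Here that step is left to whatever `classify` does.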


How machine learning is revolutionizing journalism - ICIJ

#artificialintelligence

The rise of the machine has freed ICIJ members globally to pore over millions of documents in a custom-built search engine. But even this next-level research has posed substantial challenges: for example, what to do when certain phrases return an indigestible 150,000 results? Clearly, the next step to speeding up our research was to intelligently filter information relevant to each investigation. Here's how we streamlined the previously daunting process, giving us both unprecedented flexibility and the required search success rate. In leaks like the Paradise Papers, we dealt with millions of documents (including PDFs, photos, and emails) that traditional platforms like Excel can't process.