Book Metadata and Cover Retrieval Using OCR and Google Books API - KDnuggets


The raw data we need for a data science project is often not organized in a neat, well-structured table. Instead, it is sometimes stored as text in a scanned document, and the words in that document must be extracted one by one to populate text-formatted data cells. This is the task performed by Optical Character Recognition (OCR). As you read this article, whether the content is words or numbers, your eyes process it by recognizing the light and dark patterns that make up characters (e.g., letters, numbers, and punctuation marks).

