A Better Way to Process Images for OCR
The previous code split the image into single columns of text well when the image was relatively clean, but it performed poorly when the image was badly distorted or 'dirty'. The algorithm relied on finding a clean band of white space between the columns, and the large black areas on all sides of the page could not be cut out easily. The new approach used OpenCV's EAST text detector to find the bounding boxes of all the text. "OpenCV's EAST text detector is a deep learning model, based on a novel architecture and training pattern. It is capable of (1) running at near real-time at 13 FPS on 720p images and (2) obtains state-of-the-art text detection accuracy." Then I created a histogram of the leftmost vertical edges of all the bounding boxes. Lines in the same column start at roughly the same x position, so the left edges cluster where each column begins.
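The histogram step can be sketched roughly as follows. This is a minimal illustration rather than the exact code I ran: it assumes the detector's output has already been converted to `(x, y, w, h)` boxes, and the function name `column_starts`, the 20-pixel bin width, and the 50% peak threshold are all hypothetical choices made for the example.

```python
import numpy as np


def column_starts(boxes, image_width, bin_width=20, min_fraction=0.5):
    """Estimate the x positions where text columns begin.

    boxes: iterable of (x, y, w, h) text bounding boxes (assumed format).
    bin_width and min_fraction are illustrative values, not tuned ones.
    """
    left_edges = np.array([box[0] for box in boxes], dtype=float)
    if left_edges.size == 0:
        return []

    # Histogram of left edges across the full width of the image.
    n_bins = max(1, int(np.ceil(image_width / bin_width)))
    counts, edges = np.histogram(left_edges, bins=n_bins, range=(0, image_width))

    # Keep bins that are local peaks and hold a large share of the edges.
    threshold = min_fraction * counts.max()
    starts = []
    for i, count in enumerate(counts):
        if count < threshold:
            continue
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < n_bins - 1 else 0
        if count >= left and count > right:
            starts.append(int(edges[i]))  # left boundary of the peak bin
    return starts


# Two columns starting near x=50 and x=430, plus a few stray mid-line boxes.
boxes = [(x, 0, 80, 20) for x in (50, 52, 51, 53, 50)]
boxes += [(x, 0, 80, 20) for x in (430, 432, 431, 433)]
boxes += [(x, 0, 80, 20) for x in (120, 200, 300, 520, 610, 700)]
starts = column_starts(boxes, image_width=800)
```

Each returned value is the left boundary of a peak bin, so the image can be cut slightly to the left of it. A cluster of edges that straddles a bin boundary gets split between two bins and may fall under the threshold, so the bin width is worth adjusting to the scan resolution.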