Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. Pattern Recognition by Machine. In Computers and Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.
OCR software has been critical to businesses looking to grow quickly by leveraging digital workflows and automated processes. It automates data capture from scanned documents and images and digitizes the data in convenient, editable formats that fit into organizational workflows. Scanning and processing documents such as invoices, receipts, and images for valuable data has traditionally been a manual process fraught with errors and delays. OCR solutions help businesses save time and resources that would otherwise be spent on data entry and manual validation or verification. Modern OCR software is fast and accurate, and it can handle common document processing constraints such as poorly formatted scans, handwritten documents, low-quality images, and blemishes that would traditionally have required extended manual intervention. More and more organizations are automating document processing workflows to go paperless and to leverage cloud-based digital solutions that improve bottom lines.
For years the finance industry, which encompasses organizations in financial services, insurance, and banking, has been a strong adopter of artificial intelligence (AI). Financial organizations are using artificial intelligence in multiple ways, including to improve service, better understand customers, gauge risk and predict market movements, and speed claims processing. For example, chatbots and natural language processing (NLP) assist with customer support, Optical Character Recognition (OCR) helps with the ingestion of information from documents, computer vision analyzes images and videos to speed claims processing, and machine learning models assess risk, detect fraud, and help determine rating and pricing. Results from GlobalData's 2021 ICT Customer Insight survey reveal that between 25% and 27% of digital spending by companies in finance will go towards artificial intelligence and machine learning. Interestingly, GlobalData's survey indicated that the portion of budget allocated to disruptive technologies is slightly higher for small financial organizations than for the largest businesses, as shown in Figure 2.
I never really identified with Tim Allen's "more power" mantra from Home Improvement until I became a homeowner. When you live in an apartment, you don't need to have an extra-powerful chainsaw to cut through tough-as-steel tree limbs, or need an excavator to clear out dump truck after dump truck of dirt to make sure water drains away from your house. But when you own a home, many homeowner projects call out for "more power." Likewise, if you're not dealing with a tremendous number of paper documents, you might not understand why you'd need a scanner with More Power. In fact, you might not even identify with needing a scanner.
Yann Andre LeCun, a French computer scientist who focuses on machine learning, computer vision, mobile robotics, and computational neuroscience, recently tweeted that one of his articles had been rejected by NeurIPS 2021. LeCun is a Silver Professor at New York University's Courant Institute of Mathematical Sciences and Vice President, Chief AI Scientist at Facebook. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs) and is often regarded as the inventor of convolutional nets. He is also a co-creator of the DjVu image compression technology. He has academic and industrial experience in artificial intelligence, machine learning, deep learning, computer vision, intelligent data analysis, data mining, data compression, digital library systems, and robotics.
Yeelen Knegtering, CEO & Co-founder of Klippa, is passionate about developing digital products that help people save time on administrative hassle and spend it on the things they love. With a degree in Information Technology from the University of Groningen, he started Klippa with the idea that there had to be a better way to organize and manage receipts. Now, Klippa is a document digitization company with a focus on digitizing and automating document streams for companies.
Recognizing handwritten text is a problem that can be traced back to the first automatic machines that needed to recognize individual characters in handwritten documents. Think about, for example, the ZIP codes on letters at the post office and the automation needed to recognize these five digits. Perfect recognition of these codes is necessary in order to sort mail automatically and efficiently. Another application that may come to mind is OCR (Optical Character Recognition) software. OCR software must read handwritten text, or pages of printed books, and convert them into general electronic documents in which each character is well defined.
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript. Existing methods adopt a two-stage approach: synthesize the input text using a generic text-to-speech (TTS) engine and then transform the voice to the desired voice using voice conversion (VC). A major problem of this framework is that VC is a challenging problem which usually needs a moderate amount of parallel training data to work satisfactorily. In this paper, we propose a one-stage context-aware framework to generate natural and coherent target speech without any training data of the target speaker. In particular, we manage to perform accurate zero-shot duration prediction for the inserted text. The predicted duration is used to regulate both text embedding and speech embedding. Then, based on the aligned cross-modality input, we directly generate the mel-spectrogram of the edited speech with a transformer-based decoder. Subjective listening tests show that despite the lack of training data for the speaker, our method has achieved satisfactory results. It outperforms a recent zero-shot TTS engine by a large margin.
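The role of the predicted duration in the abstract above can be illustrated with a small sketch. It assumes a simple length regulator in which each text-token embedding is repeated for its predicted number of frames, so that the text sequence lines up frame by frame with the speech sequence. The function name `length_regulate`, the toy embeddings, and the integer durations are all hypothetical; the abstract does not specify how the regulation is actually implemented.

```python
from typing import List, Sequence


def length_regulate(
    token_embeddings: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token embedding for its predicted number of frames.

    Hypothetical helper: a generic length regulator, not the paper's code.
    """
    if len(token_embeddings) != len(durations):
        raise ValueError("need exactly one duration per token embedding")

    frames: List[List[float]] = []
    for embedding, duration in zip(token_embeddings, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the embedding once per frame so frames are independent lists.
        frames.extend(list(embedding) for _ in range(duration))
    return frames


# Toy example: three tokens with predicted durations of 2, 0, and 3 frames.
toy_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_durations = [2, 0, 3]
aligned = length_regulate(toy_embeddings, toy_durations)
# The aligned sequence has 2 + 0 + 3 = 5 frames.
```

In this sketch, a token with a predicted duration of zero contributes no frames, and the output length always equals the sum of the durations, which is what lets a frame-level decoder consume text and speech features of matching length.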
Scanning is one of those technologies that seem like if you need it, you know you need it. But as it turns out, scanning is also a technology where you may not know you need it until an urgency hits, and you need it right this minute. In addition to the urgent personal or business need that comes from moving a paper document from one location to another, scanning can be very helpful for document archiving and search, reducing clutter and physical space, reducing the time it takes to file documents, faster document preparation, and disaster management. I've personally benefited from using scanning during disaster management. The classic case was when my family had to evacuate during a hurricane and yet I had incredibly critical family business that required accessing certain years-old documents.