Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser, "Pattern Recognition by Machine," in Computers and Thought, Edward A. Feigenbaum and Julian Feldman (Eds.), MIT Press, Cambridge, MA, USA, 1963, pp. 8-30.
Anyone who doubts the breadth of enterprise interest in AI need only look at the Intelligent Document Processing (IDP) market and the verticals investing in it. According to the Everest Group's recently published Intelligent Document Processing (IDP) State of the Market Report 2021 (purchase required), the market for this segment alone was estimated at $700-750 million in 2020 and is expected to grow at a rate of 55-65% over the next year. Cost impact is now the key driver for intelligent document processing adoption, closely followed by improving operational efficiency and productivity. These solutions blend AI technologies to efficiently process all types of documents and feed the output into downstream applications. Optical character recognition (OCR), computer vision, machine learning (ML) and deep learning models, and natural language processing (NLP) are the core technologies powering IDP capabilities.
TikTok's text-to-speech feature allows creators to put text over their videos and have a Siri-like voice read it out loud. It's a helpful way to annotate your videos, whether to describe what's happening, add context, or serve whatever other purpose you see fit. There's also no rule saying you can't use it just to make the text-to-speech voice say silly things. Here's how you can easily add text-to-speech to your TikTok videos. Once added, you can cancel it, edit the text, or adjust its duration just by tapping the text again. Once you're happy with your video, just click "Next," apply whatever hashtags you want, and post!
We're a couple of decades into the 21st century, cars are literally starting to fly, a vacation to space is just around the corner ... and yet somehow, computers still sound like parodies of confused robots whenever asked to convert text-to-speech (TTS). Come on, devs, there has to be a better solution. A firm called WellSaid Labs believes it has one, and it's getting a boost thanks to an oversubscribed Series A. "Plain and simple, WellSaid is the future of content creation for voice. This is why thousands of customers love using the product daily with off-the-charts bottom-up adoption. Matt and Michael have assembled a world-class team, and we couldn't be more thrilled to be a part of the WellSaid journey," says Cameron Borumand, General Partner at FUSE, which led the round.
Where does your enterprise stand on the AI adoption curve? Take our AI survey to find out. Anyline, a company that builds mobile data capture and scanning technologies for multiple industries, has raised $20 million. Founded out of Vienna, Austria, in 2013, Anyline has developed a range of data capture products such as barcode scanning, optical character recognition (OCR)-powered document scanning, biometric face authentication, serial number scanning, and even driving license scanning, which enables retailers to easily verify a person's age and identity at the point of sale or curbside pickup. Elsewhere, police forces can integrate Anyline's technology to scan all manner of IDs and vehicle license plates to verify drivers instantly, which not only speeds things up but also reduces the chance of errors from traditional manual processes such as typing or relaying data over the radio. This, according to Anyline CEO and cofounder Lukas Kinigadner, is perhaps the number one benefit Anyline brings to organizations across the spectrum.
AI is getting better at supporting multiple modalities within a single ML model, such as text, vision, speech and IoT sensor data. Developers are starting to find innovative ways to combine modalities to improve common tasks like document understanding, said David Talby, founder and CTO of John Snow Labs, an NLP tools provider. For example, patient data collected and processed by healthcare systems can include visual lab results, genetic sequencing reports, clinical trial forms and other scanned documents. The layout and presentation style of this information, if done right, can help doctors better understand what they're looking at. AI algorithms trained using multimodal techniques such as machine vision and optical character recognition could optimize the presentation of results, improving medical diagnosis.
Synechron, a leading digital transformation consulting firm, launched its annual report, Top Strategic Technology Trends. The report named data science as one of its eight major trends for 2021, and the company's experts put forward three critical trends within it. The first trend covers the business applications of self-supervised models, where AI teaches itself to solve problems without human labeling of data. The second trend refers to the increased adoption of natural language generation, which uses AI to create the many routine documents that would otherwise be produced by hand every day. The third and final trend concerns technologies like ML, optical character recognition, and NLP that will increase efficiency, reduce costs, and detect financial crime during KYC.
Have you ever faced a large corpus of text with missing capitalization, requiring you to recapitalize thousands of words before publishing? In this post, I demonstrate how to repair case information in documents automatically. Truecasing is the natural language processing problem of finding the proper capitalization of words within a text where such information is unavailable. Use cases include transcripts from various audio sources, automatic speech recognition, optical character recognition, medical records, online messaging, and gaming.
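To make the idea concrete, here is a minimal truecasing sketch (an illustration of the general technique, not the approach from the post itself): a unigram model that learns each word's most frequent surface form from properly cased training text, then applies it to lowercase input. Function and variable names are my own.

```python
# Minimal unigram truecaser: learn each word's most common casing from
# cased training sentences, then restore case in uncased text.
from collections import Counter, defaultdict

def train_truecaser(corpus_sentences):
    """Count how often each word appears in each surface form."""
    forms = defaultdict(Counter)
    for sent in corpus_sentences:
        tokens = sent.split()
        # Skip the first token: its capitalization is positional, not lexical.
        for tok in tokens[1:]:
            forms[tok.lower()][tok] += 1
    # Map each lowercase word to its most frequent cased form.
    return {w: counts.most_common(1)[0][0] for w, counts in forms.items()}

def truecase(text, model):
    """Restore casing word by word, then capitalize the sentence start."""
    tokens = text.lower().split()
    out = [model.get(t, t) for t in tokens]
    if out:
        out[0] = out[0][0].upper() + out[0][1:]
    return " ".join(out)
```

A real truecaser would also use sentence-boundary detection and word context (e.g. an n-gram or neural model) rather than unigram counts alone, but the structure is the same: learn surface forms from cased text, then apply them.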
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain text and then utilize token-level entity annotations as supervision to train a sequence tagging model. However, this incurs high annotation costs, is prone to label confusion, and OCR errors can significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder that simultaneously models the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible, switchable decoder with two inference modes: one (Copy or Predict Mode) outputs key information sequences of different categories by either copying a token from the input or predicting one at each time step, while the other (Tag Mode) directly tags the input sequence in a single forward pass. Our method achieves new state-of-the-art performance on several public benchmarks, demonstrating its effectiveness.