Optical Character Recognition
HCR-Net: A deep learning based script independent handwritten character recognition network
Chauhan, Vinod Kumar, Singh, Sukhdeep, Sharma, Anuj
Handwritten character recognition (HCR) is a challenging learning problem in pattern recognition, mainly due to the structural similarity of characters, varied handwriting styles, noisy datasets, and the large variety of languages and scripts. The HCR problem has been studied extensively for a few decades, but there is very limited research on script-independent models. This is because of factors such as the diversity of scripts, the focus of most conventional research on handcrafted feature extraction techniques, which are language- or script-specific and not always available, and the unavailability of public datasets and code to reproduce results. Deep learning, on the other hand, has seen huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning, i.e., automated feature extraction and recognition. In this paper, we propose HCR-Net, a novel deep learning architecture that exploits transfer learning and image augmentation for end-to-end, script-independent handwritten character recognition. The network is based on a novel transfer learning approach for HCR in which some of the lower layers of a pre-trained VGG16 network are reused. Owing to transfer learning and image augmentation, HCR-Net provides faster training, better performance and better generalisation. Experimental results on publicly available datasets of the Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages demonstrate the efficacy of HCR-Net and establish several new benchmarks. For reproducibility of the results and to advance HCR research, the complete code is publicly released on GitHub (https://github.com/jmdvinodjmd/HCR-Net).
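To illustrate the image-augmentation idea mentioned in the abstract, here is a minimal sketch in plain Python. The abstract does not say which augmentations HCR-Net applies, so the choice of small random translations, the function names (shift_image, augment) and the max_shift parameter are all assumptions made for illustration, not the authors' implementation.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Translate a 2D pixel grid by (dx, dy), padding exposed cells with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            # Pixels pushed outside the frame are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, n_copies, max_shift=1, seed=0):
    """Return `n_copies` randomly translated variants of `image` (assumed policy)."""
    rng = random.Random(seed)
    return [
        shift_image(
            image,
            rng.randint(-max_shift, max_shift),
            rng.randint(-max_shift, max_shift),
        )
        for _ in range(n_copies)
    ]


# A tiny 3x3 "character": a short vertical stroke in the middle column.
stroke = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
variants = augment(stroke, n_copies=4)
```

Each variant keeps the label of the original character, so a training set can be enlarged without collecting new handwriting samples.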
Artificial intelligence technology to manage smart contracts
Choosing the right contract management software can increase productivity in any company. The main factors are cloud-based deployment and the use of artificial intelligence. Contracts have a direct impact on the success of the company. To maintain an overview of the contract portfolio and the resulting rights and obligations, automated and clearly defined processes as well as clear lists and dashboards are required. This is especially true when the creation, conclusion and storage of contract documents are decentralized.
Cortical.io's AI makes bulk contract analysis faster and more accurate
In the past, reviewing large stacks of documents was a mind-numbing chore for junior attorneys -- a process that could literally consume months of multiple employees' lives. But innovations in artificial intelligence have enabled Cortical.io Using large quantities of documents as inputs and a semantic folding theory-based natural language understanding system to parse content, Contract Intelligence can transform structured agreements and unstructured documents into comprehensible data. The software is able to search, extract, classify, and compare data from contracts, policies, financial reports, and other documents, including the ability to understand the meanings of concepts and whole sentences -- more than just keywords, which might previously have been extracted and searchable using basic optical character recognition.
Searching for ROI in Artificial Intelligence Deployments
Anyone with doubts about the interest in AI and its use across enterprise technologies need only look at the Intelligent Document Processing (IDP) market, and the kinds of verticals investing in it, to quash them. According to the Everest Group's recently published Intelligent Document Processing (IDP) State of the Market Report 2021, the market for this segment alone was estimated at $700-750 million in 2020 and is expected to grow at a rate of 55-65% over the next year. Cost impact is now the key driver for intelligent document processing adoption, closely followed by improving operational efficiency and productivity. These solutions blend AI technologies to efficiently process all types of documents and feed the output into downstream applications. Optical character recognition (OCR), computer vision, machine learning (ML) and deep learning models, and natural language processing (NLP) are the core technologies powering IDP capabilities.
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
Rownicka, Joanna, Sprenkamp, Kilian, Tripiana, Antonio, Gromoglasov, Volodymyr, Kunz, Timo P
We describe our approach to creating and delivering a custom voice for a conversational AI use case. More specifically, we provide a voice for a Digital Einstein character to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 to predict log-scaled mel-spectrograms from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and outputs a speech waveform. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
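The custom-dictionary step described above can be sketched as a lookup that runs before a general grapheme-to-phoneme (G2P) converter. This is only an illustration under stated assumptions: the lexicon entries, the ARPAbet-style phoneme symbols, and the names CUSTOM_LEXICON, fallback_g2p and text_to_phonemes are hypothetical, and the letter-by-letter fallback is a stand-in for whatever G2P model the authors actually use.

```python
import re

# Hypothetical pronunciation overrides for selected words (ARPAbet-style symbols).
CUSTOM_LEXICON = {
    "einstein": ["AY1", "N", "S", "T", "AY2", "N"],
    "quantum": ["K", "W", "AA1", "N", "T", "AH0", "M"],
}


def fallback_g2p(word):
    """Placeholder G2P: one symbol per letter. A real system would use a trained model."""
    return [ch.upper() for ch in word if ch.isalpha()]


def text_to_phonemes(text, lexicon=CUSTOM_LEXICON):
    """Convert character input to a phoneme sequence, preferring lexicon entries."""
    phonemes = []
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        # Words in the custom dictionary bypass the general-purpose converter.
        phonemes.extend(lexicon.get(word) or fallback_g2p(word))
    return phonemes


sequence = text_to_phonemes("Hi Einstein")
```

In a pipeline like the one in the abstract, the resulting phoneme sequence would be the input to the acoustic model, which predicts the mel-spectrogram that the vocoder turns into a waveform.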
Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining
Xu, Guowei, Ding, Wenbiao, Fu, Weiping, Wu, Zhongqin, Liu, Zitao
Many real-world applications use Optical Character Recognition (OCR) engines to transform handwritten images into transcripts, on which downstream Natural Language Processing (NLP) models are then applied. In this process, OCR engines may introduce errors, so the inputs to downstream NLP models become noisy. Although pre-trained models achieve state-of-the-art performance on many NLP benchmarks, we show that they are not robust to noisy texts generated by real OCR engines. This greatly limits the application of NLP models in real-world scenarios. To improve model performance on noisy OCR transcripts, it is natural to train the NLP model on labelled noisy texts. However, in most cases only labelled clean texts are available. Since there are no handwritten images corresponding to these texts, it is impossible to use the recognition model directly to obtain noisy labelled data. People can be employed to copy texts by hand and photograph them, but this is extremely expensive given the amount of data needed for model training. Consequently, we are interested in making NLP models intrinsically robust to OCR errors in a low-resource manner. We propose a novel robust training framework which 1) employs simple but effective methods to directly simulate natural OCR noise from clean texts, 2) iteratively mines hard examples from a large number of simulated samples for optimal performance, and 3) employs a stability loss so that the model learns noise-invariant representations. Experiments on three real-world datasets show that the proposed framework boosts the robustness of pre-trained models by a large margin. We believe that this work can greatly promote the application of NLP models in real-world scenarios, although the algorithm we use is simple and straightforward. We make our code and three datasets publicly available (https://github.com/tal-ai/Robust-learning-MSSHEM).
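The first two components of the framework, simulating OCR noise from clean text and mining hard examples, can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: the confusion table, the three edit operations, the single mining round (the abstract describes an iterative process) and the names simulate_ocr_noise, mine_hard_examples and score_fn are all assumptions. In practice score_fn would be the model's loss on the noisy sample; a trivial stand-in is used here.

```python
import random

# Assumed character confusions; a real table would be estimated from OCR output.
CONFUSIONS = {"l": "1", "o": "0", "e": "c", "m": "rn", "s": "5"}


def simulate_ocr_noise(text, rate, rng):
    """Corrupt each character with probability `rate` using simple edit operations."""
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # character survives unchanged
            continue
        op = rng.choice(("substitute", "delete", "duplicate"))
        if op == "substitute":
            out.append(CONFUSIONS.get(ch, ch))
        elif op == "duplicate":
            out.append(ch + ch)
        # "delete" appends nothing
    return "".join(out)


def mine_hard_examples(clean_text, score_fn, n_candidates, keep, rate=0.1, seed=0):
    """Generate noisy candidates and keep the `keep` highest-scoring (hardest) ones."""
    rng = random.Random(seed)
    candidates = [simulate_ocr_noise(clean_text, rate, rng) for _ in range(n_candidates)]
    return sorted(candidates, key=score_fn, reverse=True)[:keep]


clean = "hello world"
# Stand-in difficulty score: how far the noisy length drifts from the clean length.
hard = mine_hard_examples(
    clean, score_fn=lambda s: abs(len(s) - len(clean)), n_candidates=20, keep=3
)
```

Each mined sample inherits the label of its clean source text, which is what lets the approach produce labelled noisy training data without any handwritten images.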
Anyline nabs $20M to automate mobile data capture for enterprises
Anyline, a company that builds mobile data capture and scanning technologies for multiple industries, has raised $20 million. Founded out of Vienna, Austria, in 2013, Anyline has developed a range of data capture products such as barcode scanning, optical character recognition (OCR)-powered document scanning, biometric face authentication, serial number scanning, and even driver's license scanning, which enables retailers to easily verify a person's age and identity at the point of sale or at curbside pickup. Elsewhere, police forces can integrate Anyline's technology to scan all manner of IDs and vehicle license plates to verify drivers instantly, which not only speeds things up but also reduces the chance of errors from traditional manual processes such as typing or relaying data over radio. This, according to Anyline CEO and cofounder Lukas Kinigadner, is perhaps the number one benefit Anyline brings to organizations across the spectrum.
9 Top AI and Machine Learning Trends for 2021
AI is getting better at supporting multiple modalities within a single ML model, such as text, vision, speech and IoT sensor data. Developers are starting to find innovative ways to combine modalities to improve common tasks like document understanding, said David Talby, founder and CTO of John Snow Labs, an NLP tools provider. For example, patient data collected and processed by healthcare systems can include visual lab results, genetic sequencing reports, clinical trial forms and other scanned documents. The layout and presentation style of this information, if done right, can help doctors better understand what they're looking at. AI algorithms trained using multi-modal techniques such as machine vision and optical character recognition could optimize the presentation of results, improving medical diagnosis.
Top Data Science News of the Week
Synechron, a leading digital transformation consulting firm, launched an annual report, Top Strategic Technology Trends. The report named data science as one of its eight major trends for 2021, and the company's experts identified three critical trends within it. The first concerns the business applications of self-supervised models, where AI teaches itself to solve problems without human classification of data. The second is the increased adoption of Natural Language Generation, which uses AI to create many of the hand-produced documents that are needed every day. The third and final trend concerns technologies such as ML, Optical Character Recognition, and NLP that will increase efficiency, reduce costs, and detect financial crimes during KYC.