Optical Character Recognition
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
In cross-lingual speech synthesis, the speech in various languages can be synthesized for a monoglot speaker. Normally, only the data of monoglot speakers are available for model training, thus the speaker similarity is relatively low between the synthesized cross-lingual speech and the native language recordings. Based on the multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve the cross-lingual speaker similarity. To further improve the speaker similarity, joint training with a speaker classifier is proposed. Here, a scheme similar to parallel scheduled sampling is proposed to train the transformer model efficiently to avoid breaking the parallel training mechanism when introducing joint training. By using multi-task learning and speaker classifier joint training, in subjective and objective evaluations, the cross-lingual speaker similarity can be consistently improved for both the seen and unseen speakers in the training set.
A guide to text detection and recognition using MMOCR
Optical character recognition (OCR) is a form of image conversion that extracts text from an image, such as a photographed document. Various applications and tools, such as Adobe Acrobat and the ML-based Tesseract OCR, have been developed to aid with this process. In this article, we will go over the tasks performed in an OCR pipeline. Thereafter, we will look into MMOCR, a Python-based toolbox that centralizes all OCR-related operations. Below are the major points to be discussed in this article. Let's first discuss text detection.
Solving CAPTCHAs With Machine Learning to Enable Dark Web Research
A joint academic research project from the United States has developed a method to foil CAPTCHA tests, reportedly outperforming similar state-of-the-art machine learning solutions by using Generative Adversarial Networks (GANs) to decode the visually complex challenges. Testing the new system against the best current frameworks, the researchers found that their method achieves more than 94.4% success on a carefully curated real-world benchmark dataset, and has proved capable of 'eliminating human involvement' when navigating a highly CAPTCHA-protected emerging Dark Net Marketplace, automatically resolving CAPTCHA challenges in a maximum of three attempts. The authors contend that their approach represents a breakthrough for cybersecurity researchers, who traditionally have had to bear the costs of supplying humans-in-the-loop to manually solve CAPTCHAs, usually via crowdsourcing platforms such as Amazon Mechanical Turk (AMT). If the system can prove adaptable and resilient, it may further pave the way for more automated oversight systems, and for the indexing and web-scraping of TOR networks. This could enable scalable and high-volume analyses, as well as the development of new cybersecurity approaches and techniques, which have been hamstrung, to date, by CAPTCHA firewalls.
Artificial Intelligence: FinTech's innovation driver - BusinessWorld Online
FinTech refers to any idea or innovation that improves or optimizes the way individuals or companies conduct financial activities. Early FinTech concentrated on developing add-on products to complement existing financial services. This combination of finance and technology has spawned a slew of valuable goods and services that redefine financial services and make them more accessible to the general public. Some of these products and services include insurance aggregators, mobile wallets, AI investment management advisers, peer-to-peer (P2P) lending and crowdfunding tools, and platforms for trading financial assets. The cutting-edge solutions that contributed to such technologies include Blockchain, Deep Learning, and Artificial Intelligence (AI).
Innovation Award Honorees - CES 2022
OrCam MyEye PRO is a wearable assistive technology device for people who are blind, visually impaired or have reading challenges. It's lightweight, finger-size and magnetically mounts on eyeglass frames. The device instantly reads aloud any printed text (books, menus, signs) and digital screens (computer, smartphone), recognizes faces, and identifies products/bar codes, money notes and colors, all in real time and offline. The interactive Smart Reading feature enables users to tailor their assistive reading experience, and Orientation assists with guidance and identification of objects. Newly released "Hey OrCam" enables control of all device features and settings hands-free, using voice commands.
Guided-TTS: Text-to-Speech with Untranscribed Speech
Kim, Heeseung, Kim, Sungwon, Yoon, Sungroh
Most neural text-to-speech (TTS) models require
An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images
Li, Zekun, Chiang, Yao-Yi, Tavakkol, Sasan, Shbita, Basel, Uhl, Johannes H., Leyk, Stefan, Knoblock, Craig A.
Historical maps contain detailed geographic information that is difficult to find elsewhere and covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
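To make the word-versus-phrase problem concrete, here is a minimal sketch of one way to merge individual OCR words into location phrases. It is an illustrative heuristic, not the method used by mapKurator: the `Word` type, the `group_words` function, and the `max_gap_ratio` and `min_overlap` parameters are all hypothetical, and the sketch assumes each word comes with an axis-aligned bounding box and that labels run left to right on a roughly horizontal line.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result: recognized text plus an axis-aligned bounding box."""
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def group_words(words, max_gap_ratio=1.0, min_overlap=0.5):
    """Greedily merge words that sit on the same line and close together.

    A word joins an existing phrase when it overlaps the phrase's last word
    vertically by at least `min_overlap` of the smaller box height, and the
    horizontal gap is at most `max_gap_ratio` times that height.
    """
    phrases = []  # each phrase is a list of Word objects, left to right
    for word in sorted(words, key=lambda w: w.x0):
        for phrase in phrases:
            last = phrase[-1]
            height = min(last.y1 - last.y0, word.y1 - word.y0)
            overlap = min(last.y1, word.y1) - max(last.y0, word.y0)
            gap = word.x0 - last.x1
            if overlap >= min_overlap * height and gap <= max_gap_ratio * height:
                phrase.append(word)
                break
        else:
            # No nearby phrase on the same line: start a new one.
            phrases.append([word])
    return [" ".join(w.text for w in phrase) for phrase in phrases]


words = [
    Word("Black", 10, 10, 50, 20),
    Word("Mountain", 55, 11, 120, 21),
    Word("Creek", 300, 200, 340, 210),
]
print(group_words(words))  # ['Black Mountain', 'Creek']
```

Real map labels are often curved, rotated, or widely letter-spaced, so a production system would need more than this box-distance rule; the sketch only shows why word-level output must be post-processed before it can be matched against place names.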
Here's how AI can transform the lives of disabled
Many believe that artificial intelligence is a futuristic concept that we only see in sci-fi movies with humanoid robots and holograms. However, it is becoming rooted in our reality, affecting various fields and groups, including persons with disabilities. Accessibility and inclusivity are genuinely revolutionized, thanks to artificial intelligence! People with disabilities can substantially enhance their daily life thanks to AI technology solutions. We've already shown how smartphones can be tools for people with vision impairments.
Book Metadata and Cover Retrieval Using OCR and Google Books API - KDnuggets
Most of the time, the raw data that we need for our data science project is not organized in a neat, well-structured, and insightful table. Rather, it is sometimes stored as text in a scanned document. Words in the document must then be extracted one by one to form a text-formatted data cell. This is the task performed by Optical Character Recognition (OCR). As you read the words of this article, be they text or numbers, your eyes are able to process them by recognizing light and dark patterns that make up characters (e.g., letters, numbers, punctuation marks, etc.).
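The idea of recognizing characters from light and dark patterns can be illustrated with a toy template matcher. This is a deliberately simplified sketch, not how the article's OCR tool or any modern OCR engine works: the 3x5 `TEMPLATES`, the `binarize`, `hamming`, and `recognize` functions, and the threshold of 128 are all assumptions chosen for illustration.

```python
# Toy 3x5 glyph templates: '1' marks a dark pixel, '0' a light one.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def binarize(gray, threshold=128):
    """Turn grayscale rows (0 = black, 255 = white) into '1'/'0' strings."""
    return ["".join("1" if px < threshold else "0" for px in row) for row in gray]


def hamming(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(gray):
    """Return the template character closest to the binarized image."""
    bits = binarize(gray)
    return min(TEMPLATES, key=lambda ch: hamming(bits, TEMPLATES[ch]))


# A slightly noisy "T": the pixel at row 1, column 0 is darker than it should be.
noisy_t = [
    [0, 0, 0],
    [90, 0, 255],
    [255, 0, 255],
    [255, 40, 255],
    [255, 0, 200],
]
print(recognize(noisy_t))  # T
```

The matcher still picks "T" despite the stray dark pixel because it chooses the template with the fewest differing pixels rather than requiring an exact match.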