AITopics

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)

arXiv.org Artificial Intelligence Aug-26-2022

An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics

Mostafa, Aly, Mohamed, Omar, Ashraf, Ali, Elbehery, Ahmed, Jamal, Salma, Salah, Anas, Ghoneim, Amr S.

This research is the second phase in a series of investigations on developing Optical Character Recognition (OCR) for Arabic historical documents and examining how different modeling procedures interact with the problem. The first study examined the effect of Transformers on our custom-built Arabic dataset. One of its shortcomings was the size of the training data: a mere 15,000 images out of our 30 million, due to a lack of resources. In this phase, we also add an image enhancement layer, time and space optimization, and a post-correction layer to help the model predict the correct word for the context. Notably, we propose an end-to-end text recognition approach that uses a Vision Transformer, namely BEiT, as the encoder and a vanilla Transformer as the decoder, eliminating CNNs for feature extraction and reducing the model's complexity. The experiments show that our end-to-end model outperforms convolutional backbones, attaining a character error rate (CER) of 4.46%.
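The abstract reports its result as a character error rate (CER). The usual definition is the character-level edit distance between prediction and reference divided by the number of reference characters. The sketch below assumes that standard definition with corpus-level aggregation, since the abstract does not say exactly how the 4.46% figure is normalized; the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # reference char deleted
                          curr[j - 1] + 1,     # extra hypothesis char inserted
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references are empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Because Arabic diacritics are separate Unicode code points, a prediction that drops every diacritic still incurs errors under this definition: for the reference "كَتَبَ" (6 code points) and the undiacritized hypothesis "كتب" (3 code points), `cer` returns 0.5. Whether the paper counts diacritics this way is an assumption.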

dataset, neural network, segmentation

2208.11484

Country:

Africa > Middle East > Egypt (0.04)
Europe > Switzerland > Fribourg > Fribourg (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.89)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.89)

Maniati, Georgia, Vioni, Alexandra, Ellinas, Nikolaos, Nikitaras, Karolos, Klapsas, Konstantinos, Sung, June Sig, Jho, Gunu, Chalamandaris, Aimilios, Tsiakoulis, Pirros

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

arXiv.org Artificial Intelligence Aug-24-2022

In this work, we present the SOMOS dataset, the first large-scale mean opinion score (MOS) dataset consisting solely of neural text-to-speech (TTS) samples. It can be used to train automatic MOS prediction systems focused on assessing modern synthesizers, and can stimulate advances in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, from a public domain speech dataset that is a common benchmark for building neural acoustic models and vocoders. Utterances are generated by 200 TTS systems, including vanilla neural acoustic models as well as models that allow prosodic variation. An LPCNet vocoder is used for all systems, so that variation among the samples depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices that lead to reliable crowdsourced annotations for this task. We provide baseline results of state-of-the-art MOS prediction models on the SOMOS dataset and show the limitations these models face when evaluating TTS utterances.
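A mean opinion score is the average of listener ratings, typically on a 1 to 5 scale. The sketch below shows plain averaging at two levels: per utterance (over its crowdsourced ratings) and per TTS system (over its utterance-level scores). Averaging utterance scores within a system is one common choice, not necessarily the one SOMOS uses, and any rater filtering the dataset applies is not modeled; the input tuple format and function name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def mos_scores(ratings):
    """ratings: iterable of (system_id, utterance_id, score), score in 1..5.

    Returns (utterance_mos, system_mos): the mean rating of each utterance,
    and the mean of the utterance-level MOS values within each system."""
    per_utt = defaultdict(list)
    for system, utt, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        per_utt[(system, utt)].append(score)
    utterance_mos = {key: mean(vals) for key, vals in per_utt.items()}

    per_sys = defaultdict(list)
    for (system, _), value in utterance_mos.items():
        per_sys[system].append(value)
    system_mos = {system: mean(vals) for system, vals in per_sys.items()}
    return utterance_mos, system_mos
```

An automatic MOS predictor trained on such data would regress `utterance_mos` from audio and is usually judged by how well its system-level averages track `system_mos`.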

artificial intelligence, machine learning, optical character recognition

doi: 10.21437/Interspeech.2022-10922

2204.0304

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States (0.04)
North America > Canada (0.04)

Genre: Research Report (0.50)

Industry: Semiconductors & Electronics (0.51)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.91)

Mukhopadhyay, Abhishek, Agarwal, Shubham, Zwick, Patrick Dylan, Biswas, Pradipta

To show or not to show: Redacting sensitive text from videos of electronic displays

arXiv.org Artificial Intelligence Aug-19-2022

This, combined with major developments in computer vision and machine learning technology, has created enormous opportunities to make life better through the collection and utilization of this video data. Potential applications here range from improved security to interactive entertainment. However, the collection and utilization of this data also entails ethical privacy concerns and the potential for unwanted intrusion into people's lives without their permission. One way to attempt to achieve the benefits of more omnipresent video collection while mitigating the intrusion on privacy is through the automatic redaction of personally identifiable information (PII). This means automatically removing or obscuring content from video data that can be used to identify an individual while maintaining as much other video data as possible. A relatively new context generating a significant amount of video data is the cabins of automobiles.
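The excerpt describes redaction as removing or obscuring identifying content while keeping the rest of the frame. A minimal sketch of that idea for on-screen text follows, assuming an OCR engine (not specified here) has already produced `(text, box)` pairs for a frame; boxes whose text matches a phone-number pattern are blacked out. The regular expression, the input format, and `redact_frame` are hypothetical and are not the paper's pipeline.

```python
import re

import numpy as np

# Hypothetical North American phone-number pattern (e.g. 555-123-4567 or
# (555) 123 4567); real deployments would need locale-specific patterns.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")


def redact_frame(frame, ocr_results, pattern=PHONE_RE):
    """Black out every OCR box whose text matches `pattern`.

    frame: H x W x C uint8 array (one video frame); it is not modified.
    ocr_results: list of (text, (x0, y0, x1, y1)) pairs in pixel coordinates.
    Returns the redacted copy and the number of boxes redacted."""
    out = frame.copy()
    h, w = out.shape[:2]
    redacted = 0
    for text, (x0, y0, x1, y1) in ocr_results:
        if pattern.search(text):
            # Clip the box to the frame, then fill it with black pixels.
            x0, x1 = max(0, x0), min(w, x1)
            y0, y1 = max(0, y0), min(h, y1)
            out[y0:y1, x0:x1] = 0
            redacted += 1
    return out, redacted
```

Filling the whole text box rather than individual digits keeps the logic independent of character-level OCR positions, at the cost of hiding some surrounding text.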

phone number, redacting sensitive text, video

2208.1027

Country:

Asia > South Korea > Seoul > Seoul (0.07)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.05)
Asia > India > Karnataka > Bengaluru (0.05)

Genre: Research Report (0.83)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.31)

PCWorld Aug-18-2022, 14:33:20 GMT

The next PowerToy will give your PC easy OCR powers

We love us some PowerToys here at PCWorld, and it seems the semi-official add-on for Windows power users is only getting better. A recent update to the GitHub project for PowerToys indicates that "PowerOCR" is in the latter stages of approval and should land in the official app before too long. The tool will add an easy way for Windows users to activate Optical Character Recognition (OCR) via a quick screenshot interface. Neowin spotted the changes to the PowerToys GitHub, with extensive documentation of the new PowerOCR tool and an apparent approval nod from a Microsoft manager. The tool is mostly the work of independent developer Joseph Finney, who contributed code that works similarly to his paid Text Grab app.

documentation, pc easy ocr power, powertoy


Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.64)

Gaman, Mihaela, Ghadamiyan, Lida, Ionescu, Radu Tudor, Popescu, Marius

Self-paced learning to improve text row detection in historical documents with missing labels

arXiv.org Artificial Intelligence Aug-15-2022

An important preliminary step in optical character recognition systems is the detection of text rows. To address this task for historical data with missing labels, we propose a self-paced learning algorithm capable of improving row detection performance. We conjecture that pages with more ground-truth bounding boxes are less likely to have missing annotations. Based on this hypothesis, we sort the training examples in descending order of the number of ground-truth bounding boxes and organize them into k batches. Using our self-paced learning method, we train a row detector over k iterations, progressively adding batches with fewer ground-truth annotations. At each iteration, we combine the ground-truth bounding boxes with pseudo-bounding boxes (bounding boxes predicted by the model itself) using non-maximum suppression, and we include the resulting annotations in the next training iteration. We demonstrate that our self-paced learning strategy brings significant performance gains on two data sets of historical documents, improving the average precision of YOLOv4 by more than 12% on one data set and 39% on the other.
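The training schedule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `train` and `predict` are hypothetical callables standing in for a detector such as YOLOv4, the merge is an NMS variant in which ground-truth boxes are never suppressed, the IoU threshold of 0.5 is arbitrary, and each iteration re-merges the original ground truth with fresh predictions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_with_pseudo(gt_boxes, pseudo_boxes, pseudo_scores, iou_thr=0.5):
    """NMS-style merge: all ground-truth boxes are kept (treated as maximal
    confidence), then pseudo-boxes are added in descending score order unless
    they overlap an already kept box by at least iou_thr."""
    kept = list(gt_boxes)
    order = sorted(range(len(pseudo_boxes)),
                   key=lambda i: pseudo_scores[i], reverse=True)
    for i in order:
        if all(iou(pseudo_boxes[i], k) < iou_thr for k in kept):
            kept.append(pseudo_boxes[i])
    return kept


def split_into_batches(pages, k):
    """Sort pages by ground-truth box count (descending) into k batches."""
    ordered = sorted(pages, key=lambda p: len(p["boxes"]), reverse=True)
    q, r = divmod(len(ordered), k)
    batches, start = [], 0
    for b in range(k):
        end = start + q + (1 if b < r else 0)
        batches.append(ordered[start:end])
        start = end
    return batches


def self_paced_training(pages, k, train, predict, iou_thr=0.5):
    """pages: dicts with 'image' and ground-truth 'boxes'.
    train(examples) -> model and predict(model, image) -> (boxes, scores)
    are hypothetical detector hooks."""
    pool, labels, model = [], [], None
    for batch in split_into_batches(pages, k):
        # Add the next, more sparsely annotated batch with its ground truth only.
        for p in batch:
            pool.append((p["image"], list(p["boxes"])))
            labels.append(list(p["boxes"]))
        model = train([(image, lab) for (image, _), lab in zip(pool, labels)])
        # Annotations for the next iteration: ground truth plus pseudo-boxes.
        labels = []
        for image, gt in pool:
            boxes, scores = predict(model, image)
            labels.append(merge_with_pseudo(gt, boxes, scores, iou_thr))
    return model
```

Protecting ground-truth boxes during the merge reflects the premise that existing annotations are trustworthy and only missing ones need filling; a plain NMS over both sets with a high score for ground truth would behave similarly unless ground-truth rows overlap each other.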

detection, historical document, text row detection

2201.12216

Country: Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)

#artificialintelligence Aug-10-2022, 02:42:17 GMT

Level Up Your AI Skillset and Dive Into The Deep End Of TinyML

Machine learning (ML) is a growing field, gaining popularity in academia, industry, and among makers. We will take a look at some of the available tools to help make machine learning easier, but first, let's review some of the terms commonly used in machine learning. John McCarthy provides a definition of artificial intelligence (AI) in his 2007 Stanford paper, "What is Artificial Intelligence?" In it, he says AI "is the science and engineering of making intelligent machines, especially intelligent computer programs." This definition is extremely broad, as McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world." As a result, any program that achieves some goal can easily be classified as artificial intelligence. In her article "Machine Learning on Microcontrollers" (Make: Vol.

application, dataset, machine learning

Country: Europe > Czechia > Prague (0.05)

Industry: Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.47)

#artificialintelligence Aug-9-2022, 09:00:51 GMT

Leaks - chessie rae

Official - OCR - Convert image to text Multi Language: Marks-Man submitted a new resource: [TheJavaSea] OCR - Convert image to text - Perform basic OCR (Optical Character Recognition) in English and 100 other languages.

chessie rae, ocr, optical character recognition

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)

arXiv.org Artificial Intelligence Aug-7-2022

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Ren, Yi, Hu, Chenxu, Tan, Xu, Qin, Tao, Zhao, Sheng, Zhao, Zhou, Liu, Tie-Yan

Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of the FastSpeech model relies on an autoregressive teacher model for duration prediction (to provide more information as input) and knowledge distillation (to simplify the data distribution in output), which can ease the one-to-many mapping problem (i.e., multiple speech variations correspond to the same text) in TTS. However, FastSpeech has several disadvantages: 1) the teacher-student distillation pipeline is complicated and time-consuming; 2) the duration extracted from the teacher model is not accurate enough, and the target mel-spectrograms distilled from the teacher model suffer from information loss due to data simplification, both of which limit the voice quality. In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from the teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs. Specifically, we extract duration, pitch and energy from the speech waveform, directly take them as conditional inputs in training, and use predicted values in inference. We further design FastSpeech 2s, which is the first attempt to directly generate speech waveform from text in parallel, enjoying the benefit of fully end-to-end inference. Experimental results show that 1) FastSpeech 2 achieves a 3x training speed-up over FastSpeech, and FastSpeech 2s enjoys even faster inference speed; 2) FastSpeech 2 and 2s outperform FastSpeech in voice quality, and FastSpeech 2 can even surpass autoregressive models. Audio samples are available at https://speechresearch.github.io/fastspeech2/.
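The abstract describes taking duration, pitch and energy as conditional inputs. A minimal NumPy sketch of that conditioning follows, assuming a length regulator in the style of FastSpeech (each phoneme's hidden state is repeated for its duration in frames) and uniform quantization of frame-level pitch and energy into embedding lookups that are added to the frame features. The bin counts, value ranges, random embedding tables, and function names are illustrative assumptions; the paper's predictor networks and exact quantization scheme are not reproduced.

```python
import numpy as np


def length_regulate(hidden, durations):
    """Expand phoneme-level hidden states (N x d) to frame level by
    repeating row i durations[i] times."""
    durations = np.asarray(durations, dtype=int)
    if (durations < 0).any():
        raise ValueError("durations must be non-negative")
    return np.repeat(hidden, durations, axis=0)


def variance_embed(values, low, high, table):
    """Quantize per-frame values into len(table) uniform bins over
    [low, high] (out-of-range values go to the edge bins) and look up
    one embedding row per frame."""
    n_bins = table.shape[0]
    edges = np.linspace(low, high, n_bins + 1)[1:-1]  # interior bin edges
    ids = np.digitize(values, edges)                  # indices 0 .. n_bins - 1
    return table[ids]


def variance_adaptor(hidden, durations, pitch, energy, pitch_table,
                     energy_table, pitch_range=(80.0, 400.0),
                     energy_range=(0.0, 1.0)):
    """Frame-level features conditioned on duration, pitch and energy."""
    frames = length_regulate(hidden, durations)
    if len(pitch) != len(frames) or len(energy) != len(frames):
        raise ValueError("pitch and energy need one value per frame")
    return (frames
            + variance_embed(pitch, pitch_range[0], pitch_range[1], pitch_table)
            + variance_embed(energy, energy_range[0], energy_range[1],
                             energy_table))
```

In training, `durations`, `pitch` and `energy` would be the values extracted from the recording; at inference they would come from predictor networks, which are omitted here.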

artificial intelligence, machine learning, optical character recognition

2006.04558

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Beijing > Beijing (0.04)
Africa > Angola > Namibe Province > South Atlantic Ocean (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.69)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence Aug-2-2022, 03:10:49 GMT

Machine Learning Engineer, Text-To-Speech (TTS) Research

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.

machine learning engineer, text-to-speech, tts

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.63)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)