Optical Character Recognition
Apple Books quietly launches AI-narrated audiobooks - The Verge
Apple's website says the feature is initially only available for romance and fiction books, where it lists two available digital voices: Madison and Jackson. The service is only available in English at present, and Apple is oddly specific about the genres of books its digital narrators are able to tackle. "Primary category must be romance or fiction (literary, historical, and women's fiction are eligible; mysteries and thrillers, and science fiction and fantasy are not currently supported)," its website reads.
Convert Text to Speech in Python - DataFlair
Text-to-speech is the process of converting written text into spoken audio. A text-to-speech project takes words on a digital device and converts them into audio with a button click or a finger tap. A text-to-speech Python project can be very helpful for people who struggle with reading. To implement this project, we will use basic Python concepts together with the Tkinter, gTTS, and playsound libraries. The objective of this project is to convert text into speech with the click of a button.
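A minimal sketch of the button-click flow described above, assuming the third-party gTTS and playsound packages are installed (gTTS also needs network access to synthesize speech). The helper names (gtts_synthesize, playsound_play, convert_and_play, build_gui) and the temp-file location are hypothetical, not DataFlair's code. Synthesis and playback are passed in as functions so the core logic can be exercised without audio hardware.

```python
import os
import tempfile
from typing import Callable, Optional


def gtts_synthesize(text: str, path: str) -> None:
    """Write `text` as spoken English to an MP3 file at `path` via gTTS (needs network)."""
    from gtts import gTTS  # imported lazily so the core logic runs without gTTS installed
    gTTS(text=text, lang="en").save(path)


def playsound_play(path: str) -> None:
    """Play an audio file with the playsound package (blocks until playback ends)."""
    from playsound import playsound
    playsound(path)


def convert_and_play(
    text: str,
    synthesize: Callable[[str, str], None] = gtts_synthesize,
    play: Callable[[str], None] = playsound_play,
    path: Optional[str] = None,
) -> str:
    """Validate the input, synthesize it to a file, play it, and return the file path."""
    text = text.strip()
    if not text:
        raise ValueError("nothing to convert: text is empty")
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "tts_output.mp3")  # hypothetical location
    synthesize(text, path)
    play(path)
    return path


def build_gui():
    """A one-field Tkinter window whose button speaks the entered text."""
    import tkinter as tk

    root = tk.Tk()
    root.title("Text to Speech")
    entry = tk.Entry(root, width=50)
    entry.pack(padx=10, pady=10)
    status = tk.Label(root, text="")
    status.pack()

    def on_click() -> None:
        try:
            convert_and_play(entry.get())
            status.config(text="Done")
        except Exception as err:  # e.g. empty input or a network failure inside gTTS
            status.config(text=str(err))

    tk.Button(root, text="Speak", command=on_click).pack(pady=10)
    return root


if __name__ == "__main__":
    build_gui().mainloop()
```

Keeping the GUI thin and the conversion logic in a plain function makes it easy to swap gTTS for an offline engine later without touching the Tkinter code.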
GitHub - jaketae/storyteller: Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals.
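A minimal sketch of the three-stage flow described above, assuming the opening line is narrated together with GPT's continuation. The write, draw, and narrate callables are hypothetical stand-ins for the GPT, Stable Diffusion, and TTS models, and the regex sentence splitter is a simplification. This is not the storyteller repository's code, and the final step of stitching images and audio into a video is omitted.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Scene:
    sentence: str  # one line of the story
    image: Any     # whatever the image model returns for that line
    audio: Any     # whatever the TTS model returns for that line


def split_sentences(text: str) -> List[str]:
    # naive splitter: break after ., ! or ? when followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tell_story(
    prompt: str,
    write: Callable[[str], str],    # hypothetical: language model continues the plot
    draw: Callable[[str], Any],     # hypothetical: image model illustrates one sentence
    narrate: Callable[[str], Any],  # hypothetical: TTS model reads one sentence aloud
) -> List[Scene]:
    story = prompt + " " + write(prompt)
    # one image and one audio clip per sentence, in story order
    return [Scene(s, draw(s), narrate(s)) for s in split_sentences(story)]
```

Each Scene pairs a frame with its narration, so a video tool could show each image for the duration of its audio clip.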
Convert text to speech quickly with this intuitive platform
Some operations and tasks don't require painstaking attention to detail. With sensitive salary and wage information, bank and direct deposit accounts, social security numbers, and other personal information in play, the stakes are high. When preparing a payroll run or supporting payroll operations, it's important to follow a ...
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Chen, Zehua, Wu, Yihan, Leng, Yichong, Chen, Jiawei, Liu, Haohe, Tan, Xu, Cui, Yang, Wang, Ke, He, Lei, Zhao, Sheng, Bian, Jiang, Mandic, Danilo
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference, which restricts their application in real-time systems. Previous works have explored speeding up inference by minimizing the number of inference steps, but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model that learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPMs, which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training that model. We verify ResGrad on the single-speaker dataset LJSpeech and on two more challenging datasets, one with multiple speakers (LibriTTS) and one with a high sampling rate (VCTK). Experimental results show that, in comparison with other speed-up methods for DDPMs: 1) ResGrad achieves better sample quality at the same inference speed, measured by real-time factor; 2) at similar speech quality, ResGrad synthesizes speech more than 10 times faster than baseline methods. Audio samples are available at https://resgrad1.github.io/.
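A minimal NumPy sketch of the residual idea, assuming a generic DDPM with a linear noise schedule. The function names, the schedule values, and the eps_model(x, t, cond) interface (a denoiser conditioned on the base model's mel-spectrogram) are hypothetical stand-ins; ResGrad's actual diffusion formulation, schedule, and network are not given in this abstract. What it shows: the training target is mel_gt - mel_base, and at inference the sampled residual is simply added to the frozen base model's output, which is why the base model needs no retraining.

```python
import numpy as np


def residual_target(mel_gt: np.ndarray, mel_base: np.ndarray) -> np.ndarray:
    # generation target: what the frozen base TTS model got wrong
    return mel_gt - mel_base


def linear_betas(T: int = 50, beta_1: float = 1e-4, beta_T: float = 0.05) -> np.ndarray:
    # hypothetical noise schedule, not the paper's settings
    return np.linspace(beta_1, beta_T, T)


def q_sample(x0, t, alpha_bar, noise):
    # closed-form DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def training_loss(eps_model, mel_gt, mel_base, betas, rng) -> float:
    # standard noise-prediction loss, applied to the residual instead of the full mel
    alpha_bar = np.cumprod(1.0 - betas)
    x0 = residual_target(mel_gt, mel_base)
    t = int(rng.integers(len(betas)))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    return float(np.mean((eps_model(x_t, t, mel_base) - noise) ** 2))


def refine(mel_base, eps_model, betas, rng) -> np.ndarray:
    # ancestral DDPM sampling of the residual, conditioned on the base output,
    # followed by the plug-and-play correction: refined = base + residual
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(mel_base.shape)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, mel_base)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape) if t > 0 else mean
    return mel_base + x
```

Because the residual is typically small and less structured than a full spectrogram, a lighter denoiser can model it, which is the source of the speed-up the abstract describes.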
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition
Soykan, Gürkan, Yuret, Deniz, Sezgin, Tevfik Metin
This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvements in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in the comics processing model from resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
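A minimal pure-Python sketch of the two recognition metrics mentioned above. Conventions vary between papers, so this assumes exact, case-sensitive matching for word accuracy and defines normalized edit distance as the Levenshtein distance divided by the length of the longer string (some works divide by the ground-truth length or report 1 minus this value). These are not necessarily the exact definitions used for COMICS Text+.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    # classic dynamic program over insertions, deletions and substitutions,
    # keeping only the previous row of the table
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    # 0.0 means identical strings; 1.0 means nothing in common (lower is better)
    denom = max(len(pred), len(gt))
    return 0.0 if denom == 0 else levenshtein(pred, gt) / denom


def word_accuracy(preds: Sequence[str], gts: Sequence[str]) -> float:
    # fraction of word crops whose recognized string exactly matches the label
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equally long, non-empty prediction and label lists")
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)
```

Word accuracy rewards only perfect reads, while normalized edit distance gives partial credit, which is why OCR benchmarks usually report both.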
FedEx, UPS warn mail delivery could be interrupted by winter storm as driver safety takes priority
Fox News correspondent Mike Tobin reports that severe weather disrupts travel plans ahead of the holidays on 'Special Report.' FedEx and UPS announced that mail delivery could be interrupted by the massive winter storm moving across the U.S. after key distribution hubs were hit by severe weather. On Friday, FedEx posted a statement to its website warning those who used its Express service that the guaranteed delivery date of Dec. 26 may not be met after the Memphis and Indianapolis hubs experienced "substantial" weather disruptions. The shipping company said actions have been taken to lessen any impact on delivery, but the safety of its team members is the "number one priority." "We recognize the importance of deliveries this holiday weekend and are committed to providing service to the best of our ability by implementing contingency measures where it is safe and possible to do so," the statement read.
Verbyl - Text-to-Speech Converter
Since the dawn of humanity, people have gathered around the fire to listen to stories... Only in the last 100 years have we grown used to watching stories at the cinema, on TV, and later on YouTube. VIDEOS without a good VOICEOVER will not convert: they will not get you clicks, leads, traffic, or any sales! That's why a VIDEO is not effective without A GOOD VOICEOVER that tells the actual story!
Bengali Handwritten Digit Recognition using CNN with Explainable AI
Shawon, Md Tanvir Rouf, Tanvir, Raihan, Alam, Md. Golam Rabiul
Handwritten character recognition is a hot research topic. If we can convert a handwritten piece of paper into a text-searchable document using Optical Character Recognition (OCR), we can easily understand its content without having to read the handwritten document. OCR for English is very common, but for Bengali it is very hard to find a good-quality OCR application. Merging machine learning and deep learning with OCR could be a major contribution to this field. Researchers have proposed a number of strategies for recognizing Bengali handwritten characters using many ML algorithms and deep neural networks, but explanations of their models are not available. In our work, we used various machine learning algorithms and a CNN to recognize handwritten Bengali digits. Some ML models achieved acceptable accuracy, and the CNN achieved high test accuracy. We applied Grad-CAM as an XAI method to our CNN model, which gave us insight into the model and helped us identify the regions of interest it uses to recognize a digit in an image.
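A minimal NumPy sketch of the Grad-CAM computation, assuming you have already captured (through your framework's hooks) the activations of the last convolutional layer and the gradients of the target digit's score with respect to them, both shaped (K, H, W). It follows the standard Grad-CAM recipe: channel weights from globally averaged gradients, a weighted sum of the feature maps, then a ReLU. The function name and the max-normalization are illustrative choices, and upsampling the map to the input image size is left out.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Class-activation heatmap of shape (H, W) from (K, H, W) activations and gradients."""
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected matching (K, H, W) arrays")
    # alpha_k: importance of channel k = global average of its gradient map
    weights = gradients.mean(axis=(1, 2))
    # weighted sum over channels: sum_k alpha_k * A^k, giving an (H, W) map
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that increase the target class score
    cam = np.maximum(cam, 0.0)
    # scale to [0, 1] for display; an all-zero map is returned unchanged
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Overlaying the upsampled map on the digit image shows which strokes the CNN relied on for its prediction.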
[2212.08610v1] Huruf: An Application for Arabic Handwritten Character Recognition Using Deep Learning
Handwriting Recognition has been a field of great interest in the Artificial Intelligence domain. Due to its broad real-life use cases, it has been researched widely. Prominent work has been done in this field, focusing mainly on Latin characters. However, the domain of Arabic handwritten character recognition is still relatively unexplored. The inherently cursive nature of Arabic characters and variations in writing styles across individuals make the task even more challenging. We identified some probable reasons behind this and proposed a lightweight Convolutional Neural Network-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of a total of 18 layers: four layers each for convolution, pooling, batch normalization, and dropout, followed by one global average pooling layer and one dense layer. Furthermore, we thoroughly investigated different hyperparameter choices, such as the optimizer, kernel initializer, and activation function. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieved accuracies of 96.93% and 99.35%, respectively, which is comparable to the state of the art and makes it a suitable solution for real-life end-level applications.
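A framework-agnostic sketch of how the 18 layers add up (4 blocks × 4 layers, plus global average pooling and a dense layer) and how the feature map shrinks through the network. The order of layers inside each block, the filter counts, 'same'-padded convolutions, 2x2 pooling, and the 32x32 input size (the AHCD image size) are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# order of layers within a block is an assumption; the abstract gives only the counts
BLOCK = ["conv", "batch_norm", "max_pool", "dropout"]


def huruf_like_layers(n_blocks: int = 4) -> List[str]:
    # four identical feature-extraction blocks, then the classification head
    return BLOCK * n_blocks + ["global_avg_pool", "dense"]


def trace_shapes(input_hw: Tuple[int, int], filters: List[int],
                 pool: int = 2) -> List[Tuple[int, int, int]]:
    # 'same' padding keeps H and W through each conv; each pooling layer divides them by `pool`
    h, w = input_hw
    shapes = []
    for f in filters:
        h, w = h // pool, w // pool
        shapes.append((h, w, f))  # (height, width, channels) after the block
    return shapes


if __name__ == "__main__":
    print(len(huruf_like_layers()))
    for shape in trace_shapes((32, 32), [32, 64, 128, 256]):  # hypothetical filter counts
        print(shape)
```

With these assumptions the last block outputs a 2x2x256 map; global average pooling collapses it to a 256-dimensional vector, and the dense layer maps that to the class scores (28 letters for AHCD, 10 digits for MadBase). Global average pooling instead of flattening keeps the head small, which fits the "lightweight" goal.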