AITopics

doi: 10.1109/TASLP.2022.3167258

2111.0404

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.91)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.61)

#artificialintelligence Jul-28-2022, 20:17:02 GMT

EasyOCR: A Free Open-source OCR That Supports 80+ Languages

EasyOCR is a free, developer-friendly OCR (optical character recognition) library that supports 80+ languages across scripts including Latin, Chinese, Arabic, and Cyrillic. EasyOCR is written in Python. It can be installed as a Python package and integrates well with Python frameworks such as Django and Flask. An online demo lets you upload images in different formats and test several languages. It comes with trainer models that can be used to train for new languages, dozens of example datasets for model training, and user-friendly instructions on how to train custom recognition models. It also supports vertical text and PIL images.
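As a rough illustration of what using the package looks like, the sketch below wraps EasyOCR's `Reader` and `readtext` calls and filters the results by confidence. The helper names `read_image` and `confident_text`, the 0.5 threshold, and the sample detections are assumptions made for this example; they are not part of EasyOCR itself.

```python
from typing import List, Sequence, Tuple

# One detection in the shape readtext() returns by default:
# (bounding box corner points, recognized text, confidence score).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def confident_text(detections: Sequence[Detection], min_conf: float = 0.5) -> List[str]:
    """Keep only the text of detections at or above a confidence threshold.

    Hypothetical helper; the threshold value is an arbitrary assumption.
    """
    return [text for _bbox, text, conf in detections if conf >= min_conf]


def read_image(path: str, languages: Sequence[str] = ("en",)) -> List[Detection]:
    """Run EasyOCR on an image file (hypothetical wrapper).

    The import is done lazily so the filtering helper above can be used
    without the package installed. Creating a Reader loads the recognition
    models, which can be slow the first time.
    """
    import easyocr

    reader = easyocr.Reader(list(languages))
    return reader.readtext(path)


if __name__ == "__main__":
    # Made-up detections standing in for real OCR output.
    sample = [
        ([[0, 0], [10, 0], [10, 5], [0, 5]], "Hello", 0.93),
        ([[0, 6], [10, 6], [10, 11], [0, 11]], "w0rld", 0.21),
    ]
    print(confident_text(sample))
```

In a Django or Flask view, the `Reader` would normally be created once at startup and reused across requests rather than rebuilt for each image.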

easyocr, optical character recognition, programming language, (2 more...)

Technology:

Information Technology > Software > Programming Languages (0.69)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)

#artificialintelligence Jul-27-2022, 05:39:03 GMT

Machine Learning is the Wrong Way to Extract Data From Most Documents

Documents have spent decades stubbornly guarding their contents against software. In the late 1960s, the first OCR (optical character recognition) techniques turned scanned documents into raw text. By indexing and searching the text from these digitized documents, software sped up formerly laborious legal discovery and research projects. Today, Google, Microsoft, and Amazon provide high-quality OCR as part of their cloud services offerings. But documents remain underused in software toolchains, and valuable data languish in trillions of PDFs.

document layout, representational mode, template, (12 more...)

Country: North America > United States (0.05)

Industry: Banking & Finance > Insurance (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.55)

#artificialintelligence Jul-19-2022, 17:13:30 GMT

Automation Driven by Artificial Intelligence Booms in Uncertain Economic Times

Veryfi, using artificial intelligence (AI) technology to transform documents into structured data in just seconds, has announced continued strong business momentum and growth in the second quarter. As economic concerns increase, many companies begin to reduce their staff to control costs; 88 percent of job loss in routine occupations occurs within 12 months of a recession. While economic uncertainty continues, Veryfi has emerged as a trusted, reliable partner for companies seeking greater efficiency and stronger customer relationships, continuing its strong annual recurring revenue (ARR) growth. In the second quarter, Veryfi added over a dozen new logos and major accounts including a top supplier of enterprise resource planning software and one of the world's largest CRM/Direct Marketing Network companies. "As companies seek new ways to increase efficiency and manage costs to position themselves for a challenging economy, Veryfi is leading the way, applying AI to automate routine data entry and streamline business processes," said Ernest Semerda, co-founder and CEO of Veryfi.

artificial intelligence boom, uncertain economic time, veryfi, (11 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > California > San Mateo County > San Mateo (0.05)

Genre: Press Release (1.00)

Industry:

Banking & Finance (0.38)
Information Technology (0.33)

Technology:

Information Technology > Artificial Intelligence > Applied AI (0.72)
Information Technology > Enterprise Applications > Enterprise Resource Planning (0.56)
Information Technology > Artificial Intelligence > Machine Learning (0.51)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.32)

#artificialintelligence Jul-19-2022, 05:20:49 GMT

Amazon Mechanical Turk - Wikipedia

Amazon Mechanical Turk (MTurk) is a crowdsourcing website for businesses (known as Requesters) to hire remotely located "crowdworkers" to perform discrete on-demand tasks that computers are currently unable to do. It is operated under Amazon Web Services, and is owned by Amazon.[1] Employers post jobs known as Human Intelligence Tasks (HITs), such as identifying specific content in an image or video, writing product descriptions, or answering questions, among others. Workers, colloquially known as Turkers or crowdworkers, browse among existing jobs and complete them in exchange for a rate set by the employer. To place jobs, the requesting programs use an open application programming interface (API), or the more limited MTurk Requester site.[2] As of April 2019, Requesters could register from only 49 approved countries.[3]

amazon, mechanical turk, requester, (14 more...)

Country:

Asia > India (0.05)
North America > United States > Texas (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Law (1.00)
Banking & Finance (0.95)
Information Technology > Services (0.67)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.71)

arXiv.org Artificial Intelligence Jul-19-2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Zhang, Guangyan, Song, Kaitao, Tan, Xu, Tan, Daxin, Yan, Yuzi, Liu, Yanqing, Wang, Gang, Zhou, Wei, Qin, Tao, Lee, Tan, Zhao, Sheng

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, existing works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning, where phonemes are the input. Pre-training with only phonemes as input can alleviate the input mismatch but lacks the ability to model rich representations and semantic information because the phoneme vocabulary is limited. In this paper, we propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance the learning capability. Specifically, we merge adjacent phonemes into sup-phonemes and combine the phoneme sequence and the merged sup-phoneme sequence as the model input, which enhances the model's capacity to learn rich contextual representations. Experimental results demonstrate that our proposed Mixed-Phoneme BERT significantly improves TTS performance, with a 0.30 CMOS gain over the FastSpeech 2 baseline. Mixed-Phoneme BERT achieves a 3x inference speedup and similar voice quality compared with the previous TTS pre-trained model PnG BERT.
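The merging step described in the abstract can be pictured with a small sketch. The abstract does not say how sup-phonemes are chosen, so the version below assumes a fixed, hypothetical sup-phoneme vocabulary and a greedy longest-match merge; the function names and the example vocabulary are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

SupPhoneme = Tuple[str, ...]


def merge_sup_phonemes(
    phonemes: Sequence[str], vocab: Set[SupPhoneme], max_len: int = 3
) -> List[SupPhoneme]:
    """Greedily merge adjacent phonemes into the longest known sup-phoneme.

    Assumption: `vocab` is a hand-made set of phoneme tuples. A phoneme
    that matches nothing in the vocabulary stays as a one-element unit.
    """
    sups: List[SupPhoneme] = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, falling back to a single phoneme.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i : i + n])
            if n == 1 or candidate in vocab:
                sups.append(candidate)
                i += n
                break
    return sups


def mixed_input(phonemes: Sequence[str], vocab: Set[SupPhoneme]) -> List[Tuple[str, str]]:
    """Pair every phoneme with the sup-phoneme that covers it.

    The two aligned sequences stand in for the combined model input; how
    they are embedded and combined inside the model is not shown here.
    """
    pairs: List[Tuple[str, str]] = []
    for sup in merge_sup_phonemes(phonemes, vocab):
        label = "+".join(sup)
        pairs.extend((p, label) for p in sup)
    return pairs


if __name__ == "__main__":
    vocab = {("HH", "AH"), ("L", "OW")}
    print(mixed_input(["HH", "AH", "L", "OW"], vocab))
```

Keeping one sup-phoneme label per phoneme position means both sequences have the same length, which is one simple way to make them combinable position by position.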

machine learning, mixed-phoneme bert, natural language, (17 more...)

2203.1719

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

#artificialintelligence Jul-14-2022, 20:38:01 GMT

Automation Artificial Intelligence Booms in Uncertain Economic

As economic concerns increase, many companies begin to reduce their staff to control costs; 88 percent of job loss in routine occupations occurs within 12 months of a recession. While economic uncertainty continues, Veryfi has emerged as a trusted, reliable partner for companies seeking greater efficiency and stronger customer relationships, continuing its strong annual recurring revenue (ARR) growth. In the second quarter, Veryfi added over a dozen new logos and major accounts including a top supplier of enterprise resource planning software and one of the world's largest CRM/Direct Marketing Network companies. "As companies seek new ways to increase efficiency and manage costs to position themselves for a challenging economy, Veryfi is leading the way, applying AI to automate routine data entry and streamline business processes," said Ernest Semerda, co-founder and CEO of Veryfi. "In Q2, we welcomed over a dozen new customers and multiple strategic accounts spanning key use cases from loyalty marketing to intelligent automation for accounts payable. We are seeing cross-market demand for our Veryfi technology."

automation artificial intelligence boom, machine learning, optical character recognition, (12 more...)

Country: North America > United States (0.06)

Industry: Banking & Finance (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.33)

arXiv.org Artificial Intelligence Jul-13-2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Huang, Rongjie, Zhao, Zhou, Liu, Huadai, Liu, Jinglin, Cui, Chenye, Ren, Yi

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the cost of their inherent iterative sampling process hinders their application to text-to-speech deployment. Through a preliminary study on diffusion model parameterization, we find that previous gradient-based TTS models require hundreds or thousands of iterations to guarantee high sample quality, which poses a challenge for accelerating sampling. In this work, we propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech. Unlike previous work estimating the gradient for data density, ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling. To tackle the model convergence challenge with decreased diffusion iterations, ProDiff reduces the data variance in the target site via knowledge distillation. Specifically, the denoising model uses the generated mel-spectrogram from an N-step DDIM teacher as the training target and distills the behavior into a new model with N/2 steps. As such, it allows the TTS model to make sharp predictions and further reduces the sampling time by orders of magnitude. Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms, while it maintains sample quality and diversity competitive with state-of-the-art models using hundreds of steps. ProDiff enables a sampling speed 24x faster than real-time on a single NVIDIA 2080Ti GPU, making diffusion models practically applicable to text-to-speech synthesis deployment for the first time. Our extensive ablation studies demonstrate that each design in ProDiff is effective, and we further show that ProDiff can be easily extended to the multi-speaker setting. Audio samples are available at https://ProDiff.github.io/.
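To make the step-halving idea concrete, the sketch below lists the (teacher steps, student steps) pairs that would result if the N to N/2 distillation were repeated until the 2-iteration sampler mentioned in the abstract is reached. Repeating the halving more than once is an assumption of this example, as are the function name `halving_schedule` and the starting value of 8 steps; no model training is shown.

```python
from typing import List, Tuple


def halving_schedule(teacher_steps: int, target_steps: int = 2) -> List[Tuple[int, int]]:
    """Return (teacher, student) step counts for repeated N -> N/2 distillation.

    Hypothetical helper: it only does the bookkeeping of step counts.
    The teacher step count must equal the target multiplied by a power of two.
    """
    if target_steps < 1 or teacher_steps < target_steps:
        raise ValueError("teacher_steps must be >= target_steps >= 1")

    rounds: List[Tuple[int, int]] = []
    n = teacher_steps
    while n > target_steps:
        if n % 2 != 0:
            raise ValueError("teacher_steps must be target_steps times a power of two")
        # Each round trains a student that samples in half the teacher's steps.
        rounds.append((n, n // 2))
        n //= 2
    return rounds


if __name__ == "__main__":
    # Assumed 8-step starting teacher, halved down to 2 sampling iterations.
    for teacher, student in halving_schedule(8):
        print(f"distill {teacher}-step teacher into {student}-step student")
```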

artificial intelligence, machine learning, natural language, (15 more...)

2207.06389

Country: Europe > Portugal (0.16)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.61)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Jul-12-2022, 01:03:48 GMT

AWS Amazon Polly – Text to Speech Converter

Detailed and Comprehensive Documentation
Cloud Vendor Text to Speech Prices
Notes: Please note, for the script to work correctly, you need to have a valid AWS account.
Latest Changes: 22.04.2022 - 2.0
New: Full redesign with Laravel Framework
New: Powerful integrated Sound Studio
New: Mixing up to 20 voices in a single synthesize task

conveniently share synthesize result, real-time text synthesize customize, synthesize result, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.64)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.39)

arXiv.org Artificial Intelligence Jul-11-2022

LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech

Anand, Harshvardhan, Begam, Nansi, Verma, Richa, Ghosh, Sourav, S, Harichandana B. S., Kumar, Sumit

Existing Text-to-Speech (TTS) systems need to read messages ranging from emails, which may contain Personally Identifiable Information (PII), to text messages, which can contain a streak of emojis and punctuation. 92% of the world's online population uses emoji, with more than 10 billion emojis sent every day. The lack of a preprocessor leads to messages being read as-is, including punctuation and infographics like emoticons. This problem worsens when there is a continuous sequence of punctuation or emojis, which is quite common in real-world communications such as messaging and Social Networking Site (SNS) interactions. In this work, we aim to introduce a lightweight intelligent preprocessor (LIP) that can enhance the readability of a message before it is passed downstream to existing TTS systems. We propose multiple sub-modules, including expanding contractions, censoring swear words, and masking PII, as part of our preprocessor to enhance the readability of text. With a memory footprint of only 3.55 MB and an inference time of 4 ms for up to 50-character text, our solution is suitable for real-time deployment. As this work is the first of its kind, we benchmark it with an open independent survey, the result of which shows a 76.5% preference for the LIP-enabled TTS engine over standard TTS.
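A rough picture of the kind of cleanup these sub-modules perform is sketched below with plain regular expressions. This is not the authors' method: the word lists, the patterns, the replacement phrases, and the function name `preprocess` are all assumptions chosen for illustration.

```python
import re

# Tiny illustrative tables; a real system would need far larger ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "it's": "it is"}
SWEAR_WORDS = {"darn"}  # placeholder entry

CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b", re.IGNORECASE
)
SWEAR_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, SWEAR_WORDS)) + r")\b", re.IGNORECASE
)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")
# Any symbol (punctuation or a single-code-point emoji) repeated back to back.
REPEAT_RE = re.compile(r"([^\w\s])\1+")


def preprocess(text: str) -> str:
    """Make a message easier to read aloud before it reaches a TTS engine."""
    # Expand contractions; the expansion is lowercase regardless of input case.
    text = CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)
    # Censor listed swear words with a spoken placeholder.
    text = SWEAR_RE.sub("beep", text)
    # Mask simple PII patterns with a neutral phrase.
    text = EMAIL_RE.sub("email address", text)
    text = PHONE_RE.sub("phone number", text)
    # Collapse runs such as "!!!" or repeated emoji to a single symbol.
    return REPEAT_RE.sub(r"\1", text)


if __name__ == "__main__":
    print(preprocess("I can't talk, call 555-123-4567!!!"))
```

Masking runs before the repeat-collapsing step so that separators inside a phone number or email address are matched in their original form.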

emoji, phone number, punctuation, (14 more...)

doi: 10.1109/CONECCT55679.2022.9865708

2207.07118

Country:

Asia > Middle East > UAE (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.50)

Industry: Telecommunications (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)