AITopics

2205.07211

Country:

Asia > China (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.96)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.74)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

#artificialintelligence Oct-11-2022, 16:12:36 GMT

AI Voice Generator & Realistic Text to Speech Online

Instantly convert text into natural-sounding speech and download it as MP3 and WAV audio files.

ai voice generator, speech online, voice generator & realistic text

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.70)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.70)
Information Technology > Artificial Intelligence > Assistive Technologies (0.70)

arXiv.org Artificial Intelligence Oct-6-2022

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

Kim, Minchan, Jeong, Myeonghun, Choi, Byoung Jin, Ahn, Sunghwan, Lee, Joun Yeop, Kim, Nam Soo

Training a text-to-speech (TTS) model requires a large-scale text-labeled speech corpus, which is troublesome to collect. In this paper, we propose a transfer learning framework for TTS that utilizes a large unlabeled speech dataset for pre-training. By leveraging wav2vec2.0 representations, unlabeled speech can greatly improve performance, especially when labeled speech is scarce. We also extend the proposed method to zero-shot multi-speaker TTS (ZS-TTS). The experimental results verify the effectiveness of the proposed method in terms of naturalness, intelligibility, and speaker generalization. We highlight that the single-speaker TTS model fine-tuned on only 10 minutes of labeled data outperforms the other baselines, and that the ZS-TTS model fine-tuned on only 30 minutes of single-speaker data can generate the voice of an arbitrary speaker, thanks to pre-training on an unlabeled multi-speaker speech corpus.

artificial intelligence, machine learning, preprint arxiv, (16 more...)

doi: 10.21437/Interspeech.2022-225

2203.15447

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)

#artificialintelligence Sep-30-2022, 15:15:15 GMT

More businesses need to use AI

As a startup that has been operational for five years and specializes in conversational artificial intelligence (AI), Vbee is a pioneer in providing services such as artificial voice (vbee.vn). However, the path to bringing AI to reality is still tough. First, businesses must be persuaded to apply new technological solutions to improve productivity and reduce costs. Vbee has many solutions, such as KYC (Know Your Customer), artificial switchboard, artificial voice, artificial MC, OCR (optical character recognition), voice biometrics, chatbot, call bot and artificial virtual assistant, packaged and ready to be used. But businesses are hesitant to use them.

ai solution, artificial voice, use ai, (4 more...)

Country: Asia > Vietnam (0.13)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.59)

Terdalkar, Hrishikesh, Bhattacharya, Arnab

Chandojnanam: A Sanskrit Meter Identification and Utilization System

arXiv.org Artificial Intelligence Sep-29-2022

We present Chandojñānam, a web-based Sanskrit meter (Chanda) identification and utilization system. In addition to the core functionality of identifying meters, it sports a friendly user interface to display the scansion, which is a graphical representation of the metrical pattern. The system supports identification of meters from uploaded images by using optical character recognition (OCR) engines in the backend. It is also able to process entire text files at a time. The text can be processed in two modes, either by treating it as a list of individual lines, or as a collection of verses. When a line or a verse does not correspond exactly to a known meter, Chandojñānam is capable of finding fuzzy (i.e., approximate and close) matches based on sequence matching. This opens up the scope of a meter-based correction of erroneous digital corpora. The system is available for use at https://sanskrit.iitk.ac.in/jnanasangraha/chanda/, and the source code in the form of a Python library is made available at https://github.com/hrishikeshrt/chanda/.
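The fuzzy matching idea in the abstract above can be illustrated with a minimal sketch. It assumes a scanned line has already been reduced to a string of light (`L`) and heavy (`G`) syllable marks, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever sequence matching the system actually uses. The meter names, the patterns in `KNOWN_METERS`, the `fuzzy_matches` function, and the 0.8 threshold are all hypothetical and are not taken from the chanda library.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: meter name -> pattern of light (L) and
# heavy (G) syllables. Names and patterns are illustrative only.
KNOWN_METERS = {
    "meter_a": "GGLGGLLGLGG",
    "meter_b": "LGLGGLLGLGG",
    "meter_c": "LLLLLLGGGGG",
}


def fuzzy_matches(pattern, known=KNOWN_METERS, threshold=0.8):
    """Return (name, similarity) pairs at or above threshold, best first.

    An exact match scores 1.0; a line with a few wrong syllables still
    scores close to its intended meter.
    """
    scored = [
        (name, SequenceMatcher(None, pattern, reference).ratio())
        for name, reference in known.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A line whose final syllable was scanned as light instead of heavy
# still ranks meter_a first.
candidates = fuzzy_matches("GGLGGLLGLGL")
```

Ranking candidates by similarity rather than demanding an exact match is what makes meter-based correction possible: the top candidate indicates which syllables of a corrupted line are most likely wrong.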

chandojnanam, meter identification and utilization system, sanskrit meter identification

2209.14924

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.53)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

arXiv.org Artificial Intelligence Sep-22-2022

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

Lam, Perry, Zhang, Huayun, Chen, Nancy F., Sisman, Berrak

Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: how do selected sparse techniques affect performance and model complexity? We compare a Tacotron2 baseline and the results of applying five techniques. We then evaluate performance in terms of naturalness, intelligibility and prosody, while reporting model size and training time. Complementary to prior research, we find that pruning before or during training can achieve similar performance to pruning after training and can be trained much faster, while removing entire neurons degrades performance much more than removing parameters. To the best of our knowledge, this is the first work that compares sparsity paradigms in text-to-speech synthesis.
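The distinction the abstract draws between removing individual parameters and removing entire neurons can be sketched on a toy weight matrix. This is a generic magnitude-based illustration in plain Python, not any of the five techniques evaluated in the paper; the function names `magnitude_prune` and `neuron_prune` and the example matrix are assumptions made for the example.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude individual weights.

    weights is a list of rows (one row per neuron); sparsity is the target
    fraction of weights to zero. Ties at the threshold are pruned too, so
    slightly more than the target fraction can be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def neuron_prune(weights, n_remove):
    """Structured pruning: drop the n_remove rows with the smallest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weights]
    order = sorted(range(len(weights)), key=lambda i: norms[i])
    drop = set(order[:n_remove])
    return [row[:] for i, row in enumerate(weights) if i not in drop]


# Toy 3-neuron layer (illustrative values only).
W = [
    [0.9, -0.1, 0.05],
    [0.2, -0.8, 0.01],
    [0.03, 0.02, -0.04],
]
sparse_W = magnitude_prune(W, 0.5)   # same shape, small weights zeroed
smaller_W = neuron_prune(W, 1)       # one whole neuron removed
```

Unstructured pruning keeps the layer shape and only introduces zeros, whereas structured pruning shrinks the layer itself, which is a coarser intervention.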

artificial intelligence, machine learning, sparsity, (18 more...)

doi: 10.21437/Interspeech.2022-10626

2209.1089

Country:

Asia > Singapore (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

PCWorld Sep-7-2022, 14:34:28 GMT

PowerToys update adds OCR and two more free tools

If you use Windows, you want PowerToys. This collection of open-source goodies, guided and published by Microsoft itself, is one of the best free software packages out there, and we can't recommend it enough. That only becomes more true today, as the company publishes an updated version with three brand new tools: the previously spotted Text Extractor (an Optical Character Recognition tool), a ruler for measuring pixels on your screen, and a tool for quickly inserting little-used accents into text. Text Extractor is probably the most universally applicable addition here. It's an open-source version of Joseph Finney's paid Text Grab app, now integrated into PowerToys and free for Windows users.

free tool, powertoy, text extractor, (1 more...)

PCWorld

Technology:

Information Technology > Software (0.81)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.58)

#artificialintelligence Sep-2-2022, 11:50:16 GMT

Rabobank Australia and New Zealand Inks Deal with nCino

This partnership will benefit the bank's Australian and New Zealand employees and customers, representing a multi-currency, cross-country commitment to provide a better banking experience. "By partnering with nCino, we will optimise our financial spreading analysis," said Alexa Glynn, Chief Operating Officer at RANZ. "This relationship will provide an excellent opportunity for RANZ to support our growing customer base and modernise our systems. We're delighted that nCino's technology will enable us to offer our customers and employees a better banking experience." The world's leading specialist food and agribusiness bank, Rabobank is one of Australia and New Zealand's largest agricultural lenders and a major provider of business and corporate banking services to the country's food and agribusiness sector. By adopting the nCino Bank Operating System, RANZ gains a digital solution that intelligently transforms the process of spreading financials by leveraging machine learning and optical character recognition (OCR).

australia and new zealand, ncino, rabobank australia, (8 more...)

Country:

Oceania > New Zealand (0.91)
Oceania > Australia (0.69)

Industry: Banking & Finance > Financial Services (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.76)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)

Bayram, Samet, Barner, Kenneth

A Black-Box Attack on Optical Character Recognition Systems

arXiv.org Artificial Intelligence Aug-30-2022

Adversarial machine learning is an emerging area showing the vulnerability of deep learning models. Exploring attack methods to challenge state-of-the-art artificial intelligence (A.I.) models is an area of critical concern. With an increasing number of effective adversarial attack methods, the reliability and robustness of such A.I. models have become a major concern. Classification tasks are a major vulnerable area for adversarial attacks. The majority of attack strategies are developed for color or grayscale images. Consequently, adversarial attacks on binary image recognition systems have not been sufficiently studied. Binary images are simple single-channel signals whose pixels take one of two possible values. The simplicity of binary images gives them a significant advantage over color and grayscale images, namely computational efficiency. Moreover, most optical character recognition systems (O.C.R.s), such as handwritten character recognition, plate number identification, and bank check recognition systems, use binary images or binarization in their processing steps. In this paper, we propose a simple yet efficient attack method, Efficient Combinatorial Black-box Adversarial Attack, on binary image classifiers. We validate the efficiency of the attack technique on two different data sets and three classification networks, demonstrating its performance. Furthermore, we compare our proposed method with state-of-the-art methods regarding advantages and disadvantages as well as applicability.
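To make the setting concrete, the sketch below binarizes a grayscale image and then runs a naive greedy pixel-flipping attack that only queries a classifier's score, which is what "black-box" means here. This is not the paper's Efficient Combinatorial Black-box Adversarial Attack; it is a simple baseline written for illustration, and `binarize`, `greedy_flip_attack`, `toy_score`, and `TEMPLATE` are all hypothetical names, with the template matcher standing in for a real classifier.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 0/1 pixels."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def greedy_flip_attack(image, score_fn, budget):
    """Flip up to `budget` pixels, each time choosing the single flip that
    lowers score_fn the most. Only score_fn outputs are used, never
    gradients or model internals."""
    adv = [row[:] for row in image]
    for _ in range(budget):
        best_pos, best_score = None, score_fn(adv)
        for r in range(len(adv)):
            for c in range(len(adv[r])):
                adv[r][c] ^= 1            # try the flip
                s = score_fn(adv)
                adv[r][c] ^= 1            # undo it
                if s < best_score:
                    best_pos, best_score = (r, c), s
        if best_pos is None:              # no flip helps; stop early
            break
        adv[best_pos[0]][best_pos[1]] ^= 1
    return adv


# Toy stand-in for a classifier: fraction of pixels matching a template
# of a vertical stroke.
TEMPLATE = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]


def toy_score(img):
    matches = sum(
        img[r][c] == TEMPLATE[r][c] for r in range(3) for c in range(3)
    )
    return matches / 9
```

Because each pixel has only two possible values, the perturbation is a set of flips rather than a small continuous change, which is why the budget is expressed as a number of pixels.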

adversarial example, classifier, perturbation, (15 more...)

2208.14302

Country:

North America > United States > Delaware > New Castle County > Newark (0.15)
North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

#artificialintelligence Aug-29-2022, 06:55:06 GMT

OCR is getting super cool for Businesses

A few months back, a student in class captured an image of the notes made by another student in front of him and used iOS 15's recent text-recognition feature to highlight the text, then copy and paste it into his own notes. This instance was tweeted by @juanbuis, who shared the video of a student making the most of iOS 15's Live Text OCR feature. This OCR (Optical Character Recognition) feature is generally used to pull information from images of text or documents and convert it into machine-readable text. Recently, the popular app developer Alessandro Paluzzi also spotted that Twitter is working on an OCR feature for alt-text descriptions. In his tweet, Alessandro Paluzzi shared a short video demonstrating how this Twitter feature will function. At Dwarf AI, we too want to make this super cool technology easily accessible to other businesses.

dwarf ai ocr solution, information, ocr solution, (10 more...)

Industry: Information Technology (0.57)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.78)