Optical Character Recognition

Google launches a cool new document scanner called Stack


Google's Area 120 – an in-house incubator, where Google engineers pursue their pet projects that sometimes turn into actual products – has launched a document scanner called Stack, and it may just be one of the best such apps out there. Like most document scanners, Stack uses your phone's camera to create a scan of a document, such as a receipt, bill, or banking statement. However, the app also reads some of the key data from your document – such as the total amount on a bill – and then organizes your scanned documents into folders. These folders, called "Stacks," are labeled with various categories, such as receipts, bills, vehicle, house, IDs, etc. You can also mark a document as starred, which will put it into a separate Stack.

Text-to-Speech: One Small Step by Mankind to Create Lifelike Robots


While speech synthesis has come a long way since Kratzenstein's vowel organ that could produce the five vowel sounds, it is a whole other level of challenge to transform text into natural-sounding speech. Recent developments in deep learning have given us a new approach to the challenge, and in this article we shall briefly introduce a mainstream text-to-speech method from before the deep learning era, then explore models like WaveNet that Google's text-to-speech API service now uses for lifelike speech synthesis. If you pause and think for a moment about how you would perform text-to-speech, you would probably formulate a method that is very similar to the concatenative approach. In concatenative text-to-speech, text is broken down into smaller units such as phonemes, and the corresponding recordings of those units are then combined to form complete speech.
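To make the concatenative idea concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any real system: the `LEXICON` and `UNIT_RECORDINGS` tables are hypothetical, and the "recordings" are short lists of made-up sample values standing in for real audio.

```python
# Hypothetical unit inventory: each phoneme maps to a short "recording",
# represented here as a list of audio samples. A real system would load
# recorded waveforms instead of these made-up numbers.
UNIT_RECORDINGS = {
    "HH": [0.1, 0.2],
    "AH": [0.3, 0.4, 0.3],
    "L": [0.2, 0.1],
    "OW": [0.5, 0.4, 0.2],
}

# Hypothetical pronunciation lexicon mapping words to phoneme sequences.
LEXICON = {"hello": ["HH", "AH", "L", "OW"]}


def text_to_units(text):
    """Break text into phoneme units using the toy lexicon."""
    units = []
    for word in text.lower().split():
        units.extend(LEXICON[word])
    return units


def concatenate_units(units):
    """Join the stored recording of each unit into one waveform."""
    waveform = []
    for unit in units:
        waveform.extend(UNIT_RECORDINGS[unit])
    return waveform


units = text_to_units("hello")
speech = concatenate_units(units)
```

Simply gluing the units together, as this sketch does, ignores how neighbouring sounds influence each other, which is one reason naive concatenation tends to sound unnatural.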

Tax time cometh: Get your records in order with this document scanner


The approach of tax time for the majority of Americans brings with it one of the more excruciating aspects of this special season: the need to organize your records. And for now, this ritual continues to involve lots and lots of... paper. Over the years, I've found that using a scanner is an essential element of managing my small professional business. Having my records online (and, in particular, in the cloud) has been valuable. Be sure you have the right tool to navigate this annual chore. We pick the very best software.

AI Is Booming: 2 'Strong Buy' Stocks That Stand to Benefit


The COVID pandemic may be receding, but it has left a mark across multiple aspects of our lives. From mask mandates to travel restrictions, we chafe at some of the changes – but in the business world the use of artificial intelligence (AI) systems has dramatically expanded in the past year. This was probably inevitable – but AI brought advantages in coping with the pandemic for companies that could make use of it, and the expansion accelerated. AI has found its place in a huge range of applications, at both the front and back end of businesses. It’s prevalent in software management and data systems, as well as in communications, where AI systems filter emails and conduct robochats. And this has not been ignored by Wall Street. Analysts say that plenty of compelling investments can be found within this space. With this in mind, we’ve opened up TipRanks’ database, and pulled two stocks which stand to benefit from AI technology. Importantly, both have amassed enough bullish calls from analysts to be given “Strong Buy” consensus ratings.

Nuance Communications (NUAN)

We’ll start with Nuance, a company in the communications software niche. This Massachusetts-based company offers solutions for business clients in the healthcare and customer service industries, with products that enhance speech recognition, telephone call steering systems, automated phone directories, medical transcription, and optical character recognition. It’s a full range of AI-powered, cloud communications software, applied in real time. Nuance’s flagship product, the Dragon Ambient eXperience (DAX), is marketed to the healthcare industry, where it uses AI to automate the paperwork burdens on physician practices and hospitals. This streamlining of operations allows doctors more time and resources to spend on patients, and provides greater satisfaction to health care providers and users.
The applications of Nuance’s product and solution lines to the current environment are clear: when the pandemic locked down so many people at home, businesses still had to maintain their customer-facing systems, and software automation, based on AI tech, made that possible with fewer personnel. Since the pandemic started last winter, the company has seen its shares grow tremendously, up 205% in the last 12 months, far outpacing the overall stock market. The most recent quarterly report, for fiscal Q1, showed quarterly revenues above the forecast at $81.4 million. EPS showed a net loss, as expected, but at 27 cents the loss was a 28% sequential improvement from Q3. The company’s balance sheet is strong, with zero debt, $256 million cash on hand, and a credit facility up to $50 million. The company’s most recent quarterly report, for fiscal Q1, beat the forecasts on both the top and bottom lines. Earnings beat expectations by 11%, coming in at 20 cents per share, while revenues of $345.8 million were a modest 2% above the estimates. As a result, operating cash flow grew 22% year-over-year, to $54.6 million for the quarter. Among the bulls is 5-star analyst Daniel Ives, of Wedbush, who rates NUAN shares an Outperform (i.e. Buy), and his $65 price target implies an upside potential of ~44%. "We believe Nuance overall continues to be laser focused on building a global cloud healthcare and AI driven business with growing ARR and a sustainable revenue/earnings stream going forward with larger deals in the field as more hospital-wide deployments shift to the cloud are playing out and gaining further momentum based on our checks," Ives opined.
The analyst added, "From a valuation/SOTP perspective, we believe over time the DAX business alone could be worth between $3 billion to $4 billion to NUAN's stock as this AI next generation platform represents a potential paradigm changer for hospitals/healthcare clinics/specialists over the coming years." Ives is no outlier on Nuance, as shown by the unanimous Strong Buy analyst consensus on the stock. Nuance has received 6 recent reviews, and all are to Buy. The shares are trading for $45.20, and the $59.67 average price target suggests a 32% one-year upside. (See NUAN stock analysis on TipRanks)

Dynatrace, Inc. (DT)

The second AI stock we’ll look at, Dynatrace, is another cloud software company – but Dynatrace’s products are designed to power business data. The company’s AI platform brings intelligent automation to network management and cloud monitoring. DT’s platform allows for cloud automation, business analytics, digital experience, application security, applications and microservices, and infrastructure monitoring. It’s sold as a one-stop-shop for network and system managers seeking an intelligent software agent. Dynatrace’s shares have been showing consistent growth over the long term. The stock is up a robust 133% in the past 12 months, and revenues have also been growing over that period. In the most recent report, for Q3 fiscal year 2021, the company showed $182.9 million in top-line revenue, beating the forecast by ~6% and growing 27% year-over-year. EPS came in at 6 cents, flat from Q2 and far better than the break-even reported for the year-ago quarter. Three key metrics stand out in the quarterly report, and all for the right reasons. Subscription revenue grew 33% year-over-year, to reach $170.3 million, and annual recurring revenue (ARR) – which is an important predictor of future performance – grew 35% yoy and came in at $722 million. At the same time, license revenue dropped by more than 93%, to just $300,000.
Taken all together, these results point toward a strong shift toward recurring cloud customers – a common trend in the software space. Needham’s 5-star analyst Jack Andrews has been closely following Dynatrace, and he believes DT’s AI products may replace incumbent tools as customers expand to additional modules. “Embedded AIOps and automation creates a compelling value proposition… Compared to competitors in the market, DT's AI Engine is embedded within its core platform and can be levered across the portfolio to deliver answers from data. Moreover, its One Agent technology automatically discovers high-fidelity data from applications and thus can map the billions of dependencies in complex environments," Andrews said. The analyst summed up, "In our view, DT is well-positioned to serve as a single source of truth that can help users trace a line between written code and business outcomes (i.e. BizDevSecOps)." Andrews named Dynatrace as a top pick, and in line with this upbeat assessment, the analyst rates the stock a Buy along with a $66 price target. Investors stand to pocket a ~28% gain should the analyst's thesis play out. Once again, we’re looking at a stock whose strong performance has inspired unanimity from the Wall Street analysts. DT shares have 13 Buy reviews, for a Strong Buy consensus rating. The stock sells for $51.76 and its $59.69 average price target suggests ~15% upside from that level. (See DT stock analysis on TipRanks) To find good ideas for AI stocks trading at attractive valuations, visit TipRanks’ Best Stocks to Buy, a newly launched tool that unites all of TipRanks’ equity insights.

Disclaimer: The opinions expressed in this article are solely those of the featured analysts. The content is intended to be used for informational purposes only. It is very important to do your own analysis before making any investment.
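The upside percentages quoted in the article above follow from simple arithmetic: the gap between a price target and the current share price, divided by the current price. The short Python sketch below reproduces the four figures from the prices and targets given in the text; the function name `implied_upside` and the table layout are illustrative choices, not anything TipRanks publishes.

```python
def implied_upside(current_price, price_target):
    """Return the percentage gain implied by moving from price to target."""
    return (price_target - current_price) / current_price * 100


# (label, current share price, price target), all taken from the article text.
cases = [
    ("NUAN, Ives target", 45.20, 65.00),
    ("NUAN, average target", 45.20, 59.67),
    ("DT, Andrews target", 51.76, 66.00),
    ("DT, average target", 51.76, 59.69),
]

for label, price, target in cases:
    print(f"{label}: ~{implied_upside(price, target):.0f}% upside")
```

Rounded to whole percentages, these come out to 44%, 32%, 28%, and 15%, matching the article.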

STYLER: Style Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Artificial Intelligence

Previous works on expressive text-to-speech (TTS) are limited in robustness and speed during training and inference. Such drawbacks mostly come from autoregressive decoding, which makes each succeeding step vulnerable to preceding errors. To overcome this weakness, we propose STYLER, a novel expressive text-to-speech model with a parallelized architecture. Removing autoregressive decoding and introducing speech decomposition for encoding makes speech synthesis more robust even with high style transfer performance. Moreover, our novel approach to modeling noise from audio, using domain adversarial training and Residual Decoding, enables style transfer without transferring noise. Our experiments demonstrate the naturalness and expressiveness of our model in comparison with other parallel TTS models. We also investigate our model's robustness and speed by comparison with an expressive TTS model that uses autoregressive decoding.

Interpretable Distance Metric Learning for Handwritten Chinese Character Recognition Artificial Intelligence

Handwriting recognition is of crucial importance to both Human Computer Interaction (HCI) and paperwork digitization. In the general field of Optical Character Recognition (OCR), handwritten Chinese character recognition faces tremendous challenges due to the enormously large character sets and the amazing diversity of writing styles. Learning an appropriate distance metric to measure the difference between data inputs is the foundation of accurate handwritten character recognition. Existing distance metric learning approaches either produce unacceptable error rates, or provide little interpretability in the results. In this paper, we propose an interpretable distance metric learning approach for handwritten Chinese character recognition. The learned metric is a linear combination of intelligible base metrics, and thus provides meaningful insights to ordinary users. Our experimental results on a benchmark dataset demonstrate the superior efficiency, accuracy and interpretability of our proposed approach.

- Document Intelligence

#artificialintelligence is cloud-native and supports the latest infrastructure technologies, ensuring flexible, cost-efficient and enterprise-grade scalability. With this technology foundation, is able to process large volumes of documents with unparalleled accuracy, regardless of their complexity and variety.

AdaSpeech: Adaptive Text to Speech for Custom Voice Artificial Intelligence

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims to adapt a source TTS model to synthesize personal voice for a target speaker using few speech data. Custom voice presents two unique challenges for TTS adaptation: 1) to support diverse customers, the adaptation model needs to handle diverse acoustic conditions that could be very different from source speech data, and 2) to support a large number of customers, the adaptation parameters need to be small enough for each target speaker to reduce memory usage while maintaining high voice quality. In this work, we propose AdaSpeech, an adaptive TTS system for high-quality and efficient customization of new voices. We design several techniques in AdaSpeech to address the two challenges in custom voice: 1) To handle different acoustic conditions, we use two acoustic encoders to extract an utterance-level vector and a sequence of phoneme-level vectors from the target speech during training; in inference, we extract the utterance-level vector from a reference speech and use an acoustic predictor to predict the phoneme-level vectors. 2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to speaker embedding for adaptation. We pre-train the source TTS model on LibriTTS datasets and fine-tune it on VCTK and LJSpeech datasets (with different acoustic conditions from LibriTTS) with few adaptation data, e.g., 20 sentences, about 1 minute speech. Experiment results show that AdaSpeech achieves much better adaptation quality than baseline methods, with only about 5K specific parameters for each speaker, which demonstrates its effectiveness for custom voice. Audio samples are available at
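As a rough illustration of the conditional layer normalization mentioned in the abstract above, the pure-Python sketch below normalizes one hidden vector and then applies a scale and bias that depend on the speaker. The abstract does not spell out how those two vectors are obtained; the sketch assumes they come from the speaker embedding through two small linear maps, and every name and number here (`w_scale`, `w_bias`, the toy vectors) is hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def conditional_layer_norm(hidden, speaker_embedding, w_scale, w_bias, eps=1e-5):
    """Layer-normalize `hidden`, then apply a speaker-dependent scale and bias.

    Assumption: scale and bias are linear functions of the speaker embedding.
    """
    mean = sum(hidden) / len(hidden)
    var = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    normalized = [(h - mean) / math.sqrt(var + eps) for h in hidden]
    scale = matvec(w_scale, speaker_embedding)
    bias = matvec(w_bias, speaker_embedding)
    return [g * n + b for g, n, b in zip(scale, normalized, bias)]


hidden = [1.0, 2.0, 3.0, 4.0]
speaker_embedding = [1.0, 0.5]

# Toy weights: unit scale for every feature; the bias varies between the two calls.
w_scale = [[1.0, 0.0] for _ in hidden]
no_bias = [[0.0, 0.0] for _ in hidden]
shifted_bias = [[0.0, 2.0] for _ in hidden]

plain = conditional_layer_norm(hidden, speaker_embedding, w_scale, no_bias)
shifted = conditional_layer_norm(hidden, speaker_embedding, w_scale, shifted_bias)
```

In this setup only the two small weight matrices and the speaker embedding would need to be adapted per speaker, which is the kind of parameter saving the abstract describes.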

Optical Character Recognition (OCR) for Text Localization, Detection, and More!


It has been a little while since we sent our last newsletter. In this edition, we are bringing you some exciting goodies we think you will love. To get started, this research paper on Liquid Time-constant Networks by Ramin Hasani et al. from MIT showcases novel recurrent neural network models that can continuously change their underlying equations to adapt to new data inputs while massively reducing complexity. Have you tried out's natural language API demo (no signup needed to try it!)?

The future of USPS trucks is electric: The new fleet will replace, expand more than 230K vehicles

USATODAY - Tech Top Stories

The U.S. Postal Service will finally get new high-tech mail delivery trucks. The agency said Tuesday that it awarded a 10-year, multibillion-dollar contract to Wisconsin-based Oshkosh Defense to replace its aging fleet of vehicles. The new fleet will replace and expand the existing fleet of more than 230,000 vehicles – among them approximately 190,000 delivery trucks – including many that have been in service for 30 years. The deal calls for the postal service to order between 50,000 and 165,000 new delivery trucks featuring 360-degree cameras, advanced braking, and a front- and rear-collision avoidance system that includes visual and audio warnings and automatic braking.