speech recognition

Revising The World of Self-Service With Natural Language Processing


SAN FRANCISCO – The ascension of natural language processing through the ranks of artificial intelligence technologies is fairly evident. Its consumerization is demonstrated in a number of audio-related household gadgets, it's found in the most effective text analytics tools, and it's an integral aspect of speech recognition systems. Still, NLP is arguably producing the greatest impact on the enterprise in furthering the self-service movement, particularly in terms of the various implements required for Business Intelligence. According to MicroStrategy VP of Product Marketing Vijay Anand, however, the business value delivered by NLP hinges on more than simply comprehending the intention of the user: "Even with natural language queries and Alexa and all of these natural language tools, the problem of the deficit of these tools as we examine it [is] while it's easy to ask a question, I, for one, certainly believe that most people don't know what the right question is. You need to have that sort of understanding of the business to ask the correct question to get the right answer, first of all."

Facebook releases low-latency online speech recognition framework


Facebook AI Research (FAIR) today said it's open-sourcing wav2letter@anywhere, a deep learning-based inference framework that achieves fast performance for online automatic speech recognition in cloud or embedded edge environments. Wav2letter@anywhere is based on neural net-based language models wav2letter and wav2letter++, which upon its release in December 2018, FAIR called the fastest open source speech recognition system available. Automatic speech recognition, or ASR, is used to turn audio of spoken words into text, then infer the speaker's intent in order to carry out a task. An API available on GitHub through the wav2letter repository is built to support concurrent audio streams and popular kinds of deep learning speech recognition models like convolutional neural networks (CNN) or recurrent neural networks (RNN) in order to deliver scale necessary for online ASR. Wav2letter@anywhere achieves better word error rate performance than two baseline models made from bidirectional LSTM RNNs, according to a paper released last week by eight FAIR researchers from labs in New York City and at company headquarters in Menlo Park.

CES 2020: The new IoT, or 'intelligence of things', is the major tech trend of the decade


LAS VEGAS - A new idea surrounding IoT will steer how technology will go in the new decade - instead of standing for the Internet of Things, the acronym should stand for the "intelligence of things", said Consumer Technology Association's (CTA) vice president of research Steve Koenig. "This new IoT bears testimony to the extent that artificial intelligence (AI) is permeating every facet of our commerce and our culture." "Now, commerce is pretty well-understood and we endorse that as we want to advance our economies around the world, but culture is really interesting to me as a researcher, because we're talking about technology's influence on human behaviour," he said. He brought up the example of how fast food giant McDonald's is looking at bringing AI-powered voice assistants to its drive-through restaurants in the United States. "People working in fast food - they've got a tough job.

LG to rival Honda in the race to develop an in-car voice assistant that parallels Siri or Alexa

Daily Mail - Science & tech

LG is throwing its resources behind developing a new breed of AI assistants that can be used to control aspects of cars. The Korean tech company said it has partnered with AI company Cerence to make an AI voice-assistant that is capable of being used to control various aspects of a car's entertainment system, navigation, calling and more. That AI assistant, once completed, will eventually be integrated into the company's webOS software that, similarly to Apple CarPlay, powers computers inside vehicles. LG is planning on leasing its AI assistant out to auto manufacturers in search of an added dose of technology in their vehicles. The company's decision to enter the ring on developing an in-car voice assistant comes at a time when other major auto-manufacturers have also announced their intention to create similar products.

How is AI Different From ML?


AI is not a new word. It dates back many decades, and computer scientists in the early 80s developed algorithms that learn and simulate human behavior. On the learning side, the most important algorithm is the neural network, which has not been very successful due to over-fitting (the model is very powerful and there is not enough data). However, in some specific tasks, the idea of using data to fit a function has been a considerable success and is the foundation of machine learning today. On the other hand, AI is very focused on image recognition, speech recognition, and natural language processing.

Artificial intelligence: The good, the bad and the ugly


Welcome to TechTalks' AI book reviews, a series of posts that explore the latest literature on AI. It wouldn't be an overstatement to say that artificial intelligence is one of the most confusing and least understood fields of science. On the one hand, we have headlines that warn of deep learning outperforming medical experts, creating their own language and spinning fake news stories. On the other hand, AI experts point out that artificial neural networks, the key innovation of current AI techniques, fail at some of the most basic tasks that any human child can perform. Artificial intelligence is also marked with some of the most divisive disputes and rivalries.

Six Ways Artificial Intelligence Guarantees Improved Performance of Online Businesses


Artificial intelligence is the concept of making machines learn, behave, teach, and do everything that a human brain can do. Because of the increased pace of development of this technology, it has become easier to put the developed concepts into practice. Further research and development have brought changes in every aspect of the real world, and industries are affected more than any other. With the help of machine learning and big data analytics, it has become easier to put existing data to use and deliver better results, and AI has proved itself an efficient technology. As businesses consider AI one of the most efficient technologies of all, the number of businesses putting AI concepts to use is increasing by leaps and bounds.

Phoneme Recognition with Large Hierarchical Reservoirs

Neural Information Processing Systems

Automatic speech recognition has gradually improved over the years, but the reliable recognition of unconstrained speech is still not within reach. In order to achieve a breakthrough, many research groups are now investigating new methodologies that have potential to outperform the Hidden Markov Model technology that is at the core of all present commercial systems. In this paper, it is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology. In a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built. The system already achieves a state-of-the-art performance, and there is evidence that the margin for further improvements is still significant.

Correlated Bigram LSA for Unsupervised Language Model Adaptation

Neural Information Processing Systems

We propose using correlated bigram LSA for unsupervised LM adaptation for automatic speech recognition. The model is trained using efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing which handles fractional counts. Our approach can be scaled to large training corpora via bootstrapping of bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation respectively. Experimental results show that applying unigram and bigram LSA together yields a 6%-8% relative perplexity reduction and a 0.6% absolute character error rate (CER) reduction compared to applying only unigram LSA on the Mandarin RT04 test set.
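
As a rough illustration of two ideas named in the abstract above, linear interpolation of an adapted model with a background LM and relative perplexity reduction, here is a minimal Python sketch. It is not the authors' method: the probabilities are made-up toy values for four observed tokens, the interpolation weight `lam` is a hypothetical choice, and marginal adaptation, variational EM, and fractional Kneser-Ney smoothing are all left out.

```python
import math

def interpolate(p_background: float, p_adapted: float, lam: float) -> float:
    """Linear interpolation of two model probabilities for the same token."""
    return lam * p_adapted + (1.0 - lam) * p_background

def perplexity(token_probs: list[float]) -> float:
    """Perplexity from the probabilities a model assigned to each observed token."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))

# Toy values (assumptions): probability each model gives to four test tokens.
background = [0.10, 0.05, 0.20, 0.02]
adapted = [0.15, 0.08, 0.18, 0.05]
lam = 0.5  # hypothetical interpolation weight

mixed = [interpolate(b, a, lam) for b, a in zip(background, adapted)]

ppl_base = perplexity(background)
ppl_mixed = perplexity(mixed)
relative_reduction = (ppl_base - ppl_mixed) / ppl_base

print(f"background perplexity:   {ppl_base:.2f}")
print(f"interpolated perplexity: {ppl_mixed:.2f}")
print(f"relative reduction:      {relative_reduction:.1%}")
```

In a real system the interpolation is applied to each conditional distribution P(w | history) rather than to a flat list, and the weight is tuned on held-out data.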

Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

Neural Information Processing Systems

We consider the problem of using nearest neighbor methods to provide a conditional probability estimate, P(y|a), when the number of labels y is large and the labels share some underlying structure. We propose a method for learning error-correcting output codes (ECOCs) to model the similarity between labels within a nearest neighbor framework. The learned ECOCs and nearest neighbor information are used to provide conditional probability estimates. We apply these estimates to the problem of acoustic modeling for speech recognition. We demonstrate an absolute reduction in word error rate (WER) of 0.9% (a 2.5% relative reduction in WER) on a lecture recognition task over a state-of-the-art baseline GMM model.
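
The abstract above reports the same improvement as both an absolute and a relative WER reduction. The short Python sketch below shows how the two figures relate. The abstract does not state the baseline WER; the value computed here is only what the two rounded figures imply, so treat it as approximate. The function name is hypothetical.

```python
def implied_baseline(absolute_reduction: float, relative_reduction: float) -> float:
    """Baseline error rate implied by an absolute and a relative reduction.

    relative_reduction = absolute_reduction / baseline, so
    baseline = absolute_reduction / relative_reduction.
    """
    return absolute_reduction / relative_reduction

absolute = 0.9    # percentage points, from the abstract
relative = 0.025  # 2.5% expressed as a fraction, from the abstract

baseline_wer = implied_baseline(absolute, relative)  # not stated in the abstract
improved_wer = baseline_wer - absolute

print(f"implied baseline WER: about {baseline_wer:.1f}%")
print(f"implied improved WER: about {improved_wer:.1f}%")
```

Because both reported figures are rounded, the true baseline could differ from the implied value by a percentage point or so.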