Top 10 NLP (Natural Language Processing) Startups to Look Out for in 2021


Natural Language Processing (NLP), the ability of a software program to understand human language as it is spoken, has seen major breakthroughs thanks to Artificial Intelligence (AI) and improved access to fast processors and cloud computing. With the introduction of more personal assistants, better smartphone functionality, and the use of Big Data to automate ever more routine human jobs, NLP adoption is projected to gain steam in the coming years.

SoundHound creates voice-enabled AI and conversational intelligence systems. It offers a Speech-to-Meaning engine as well as Deep Meaning Understanding technology, which can be integrated into other services and devices. It also creates music recognition apps and voice search assistants.

2021 NLP Trends


Listen to this episode from Tcast on Spotify. Voice recognition software is getting better and better. Once upon a time it was extremely clunky and unreliable, and even the best systems required you to spend far too much time training them while speaking extremely slowly and enunciating like…well…like a computer. At last, however, the systems have improved to the point where it's possible to accurately convey meaning through talk-to-text features without having to clarify every other word. In fact, I know a trucker who does most of his communication using talk-to-text on his phone. It makes a few mistakes here and there, but its accuracy is still pretty impressive considering he's speaking normally while in a large moving vehicle.

Then there are the voice assistants on our phones. Whether you talk to Siri, Alexa, or Cortana (all four of you, you know who you are), that voice recognition starts out needing a little training, but nothing like it used to. And the more you use it to look up local restaurants, find a factoid to settle an argument, or book a hotel room, the more accurate it gets. Now these assistants are even in the homes of many, listening constantly for you to need their assistance with something – everything from dimming the lights to spinning up your favorite playlist on Spotify.

The improvements in this software hold a lot of potential. It has already been used for years in business to accommodate employees who may not be able to speak clearly or who have lost the use of their arms. It is also a much more efficient way to record information than the increasingly dated keyboard. Typing is inherently inefficient, creating the possibility of misspellings that need to be corrected lest they convey an unintended meaning. It also requires a keyboard, which adds space, weight, and money to your computer. As voice recognition software improves, the keyboard can be replaced with a simple microphone, probably the one on your phone.
Imagine being able to compose business messages, a book, or notes on a law case and have them all transcribed reliably without having to take the time to proofread them. The time savings would be impressive. Or picture a more mundane situation: you're sitting at home with a craving for pizza, but you can't quite remember the name of the place you ordered from last month. You throw the question out into the air, and your device reminds you of the name and the price and asks whether you'd like it to order a pizza for you. If you think about it, Alexa and other smart devices are only a step or two away from that level of functionality.

Another use would be in hospitals. Embedded microphones could record conversations with your doctor, highlighting and capturing all of the important information. This would save time and increase efficiency in a number of ways. No longer would nurses and admins have to spend hours on data entry, with all the potential transcription errors that entails. Incidentally, it would also save you from having to answer the same questions three times every time you go in for a checkup. It also means no one – or at least very few people – has to come in contact with the Petri dishes known as keyboards in an environment that should be kept as sterile as possible. Lectures and presentations could be recorded and transcribed instantly, making information readily available in real time.

The possibilities are enormous. Yet potential problems arise, namely: who owns all the data being generated and recorded? Is it the place where the recording happens? The place where it is stored? Some other party? At TARTLE, we believe all the data you generate is yours. So if it's your information and your data being recorded, then you deserve to be the primary beneficiary of sharing it, or of deciding whether you want to share that data at all.
These are questions that will be addressed sooner or later in the legislative realm, which is why we are encouraging people to sign up to join the TARTLE movement. Together we can help steer that eventual legislation in a direction that benefits not just a few, but every person who works to generate that data in the first place. What's your data worth?

Musicians ask Spotify to publicly abandon controversial speech recognition patent


At the start of the year, Spotify secured a patent for a voice recognition system that could detect the "emotional state," age and gender of a person and use that information to make personalized listening recommendations. As you might imagine, the possibility that the company was working on a technology like that made a lot of people uncomfortable, including digital rights non-profit Access Now. At the start of April, the organization sent Spotify a letter calling on it to abandon the tech. After Spotify privately responded to those concerns, Access Now, along with several other groups and a collection of more than 180 musicians, are asking the company to publicly commit to never using, licensing, selling or monetizing the system it patented. Some of the individuals and bands to sign the letter include Rage Against the Machine guitarist Tom Morello, rapper Talib Kweli and indie group DIIV.

'Hey Spotify, play Up First:' Two weeks with Car Thing


After years of rumors, confirmation and vague descriptions, Spotify has finally made its first piece of hardware available to select users. Even though the company revealed the full details on Car Thing earlier this month, it's only a "limited release" right now. I've spent two weeks with Car Thing in my car (obviously), and can tell you one thing -- this dedicated Spotify player is really more of a controller for the app on your phone. Spotify first tipped its hand on an in-car music player in 2018. It offered a few Reddit users the opportunity to try a compact device that reportedly featured voice control and 4G connectivity.

Spotify launches voice-controlled 'Car Thing'


Whether your road trip soundtracks consist of music, news, entertainment, or talk, Spotify's Car Thing has you covered. The new smart player, currently available to select users in the U.S., puts your audio library just a voice command, tap, turn, or swipe away. "Car Thing enables you to play your favorite audio faster, so you're already listening to that hit song or the latest podcast episode before you've even pulled out of the driveway," according to a Spotify blog announcement. "Switching between your favorite audio is effortless, allowing you to shift gears to something else as soon as the mood strikes." You will need a Spotify Premium account to use Car Thing, but setup is simple: plug the device into a 12-volt power outlet, sync it with your smartphone (iOS 14 and Android 8 or above), and connect your phone to the vehicle's stereo.

Spotify's voice-controlled 'Car Thing' is available for some subscribers


At this point, we've seen rumors, job listings, blog posts, FCC filings and more rumors about Spotify's in-car music player over the span of a few years. In fact, I was convinced it would never become a thing the public could actually use. When the company first revealed a piece of hardware called "Car Thing" in 2019, Spotify was clear the test was meant "to help us learn more about how people listen to music and podcasts." It also explained that there weren't "any current plans" to make that device available to consumers. Now Spotify is ready for select users to get their hands on a refined version of the voice-controlled in-car player.

Spotify rolls out its own hands-free voice assistant on iOS and Android


Spotify users on iOS and Android have another way to quickly play something. The audio streaming service has an in-app voice assistant you can operate hands-free, building on the existing voice search function. After saying the "Hey, Spotify" wake word, you can ask the app to fire up a song or playlist, or play music from a certain artist. You'll need to grant Spotify permission to access your microphone if you want to use the feature, which you can switch on from the voice interactions section of the menu. As GSM Arena notes, Spotify's privacy policy states that the service only stores recordings and transcriptions of your searches after you say the wake word or tap the voice button.

The Use of Voice Source Features for Sung Speech Recognition

In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc.) can improve the performance of automatic sung speech recognition, arguing that conclusions previously drawn from spoken speech studies may not be valid in the sung speech domain. We first use a parallel singing/speaking corpus (NUS-48E) to illustrate differences in sung vs spoken voicing characteristics, including pitch range, syllable duration, vibrato, jitter, and shimmer. We then use this analysis to inform speech recognition experiments on the sung speech DSing corpus, using a state-of-the-art acoustic model and augmenting conventional features with various voice source parameters. Experiments are run with three standard (increasingly large) training sets: DSing1 (15.1 hours), DSing3 (44.7 hours), and DSing30 (149.1 hours). Pitch combined with degree of voicing produces a significant decrease in WER, from 38.1% to 36.7%, when training with DSing1; however, the smaller WER decreases observed when training with the larger, more varied DSing3 and DSing30 sets were not statistically significant. Voicing quality characteristics did not improve recognition performance, although analysis suggests that they do contribute to improved discrimination between voiced/unvoiced phoneme pairs.
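The jitter and shimmer measures discussed in the abstract are commonly defined as the mean cycle-to-cycle variation of glottal period and peak amplitude, normalised by their means. A minimal sketch of that common definition in Python, assuming per-cycle period and amplitude estimates are already available (this is an illustration, not the paper's actual feature extraction pipeline):

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive glottal periods,
    normalised by the mean period (one common jitter definition)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """The same idea applied to per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# A perfectly periodic voice has zero jitter...
steady = [0.005] * 10  # 5 ms periods, i.e. a steady 200 Hz
# ...while cycle-to-cycle variation raises it.
wobbly = [0.005, 0.0052, 0.0049, 0.0053, 0.005]

print(local_jitter(steady))  # → 0.0
print(local_jitter(wobbly))  # small positive value
```

In practice, such values would be computed per frame from a pitch tracker's output and appended to the conventional acoustic feature vectors fed to the acoustic model.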

DeepF0: End-to-End Fundamental Frequency Estimation for Music and Speech Signals

We propose a novel pitch estimation technique called DeepF0, which leverages available annotated data to learn directly from raw audio in a data-driven manner. F0 estimation is important in various speech processing and music information retrieval applications. Existing deep learning models for pitch estimation have relatively limited learning capabilities due to their shallow receptive fields. The proposed model addresses this issue by extending the network's receptive field with dilated convolutional blocks. The dilation factor increases the receptive field exponentially without a corresponding exponential growth in model parameters. To make training more efficient and faster, DeepF0 is augmented with residual connections. Our empirical evaluation demonstrates that the proposed model outperforms the baselines in terms of raw pitch accuracy and raw chroma accuracy, even while using 77.4% fewer network parameters. We also show that our model can estimate pitch reasonably well even under various levels of accompaniment noise.

Exploring the implications of AI with Mastercard's AI Garage - ideaXme


Artificial intelligence has become a technological buzzword, often referred to simply as AI without much attention to the vast range of practical applications it can actually provide, or to the intricacies involved from industry to industry and region to region. To discuss some of the many applications of artificial intelligence, as well as some of the considerations involved in creating more accurate and less biased machine learning systems, I had the pleasure of speaking with Nitendra Rajput, VP and Head of Mastercard's AI Garage. Nitendra Rajput set up the centre to solve problems across various business verticals globally with machine learning, increasing efficiencies across the business as well as mitigating fraud. Nitendra has over 20 years' experience in the fields of artificial intelligence, machine learning, and mobile interactions, having recognised a gap in the market for speech recognition systems in vocally-led countries such as India. Prior to Mastercard's AI Garage, he spent 18 years at IBM Research, working on different aspects of machine learning, human-computer interaction, software engineering, and mobile sensing.