Speech Recognition


Roku might be building its own smart speaker

Engadget

Roku is currently advertising for multiple roles, with an eye on recruiting audio and voice experts. It's also on the hunt for a "voice user interface designer" who will act as its "expert on all things voice related." The company already has at least one voice specialist on staff: senior software engineer Jim Cortez, who is tinkering with "voice interfaces" for his employer. Notably, Cortez co-founded Ivee, a startup that produced "home voice assistants" such as the Ivee Voice and Ivee Sleek.


New AI research makes it easier to create fake footage of someone speaking

#artificialintelligence

An aspect of artificial intelligence that's sometimes overlooked is just how good it is at creating fake audio and video that's difficult to distinguish from reality. The latest example of AI's audiovisual magic comes from the University of Washington, where researchers have created a new tool that takes audio files, converts them into realistic mouth movements, and then grafts those movements onto existing video. Seventeen hours of footage of former president Barack Obama were needed as training data to track and replicate his mouth movements, researcher Ira Kemelmacher told The Verge over email, but in the future this requirement could be reduced to just an hour. The University of Washington team is understandably keen to distance itself from deceptive uses of the technology, and makes clear that it only trained its neural nets on Obama's voice and video.
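The excerpt does not spell out the researchers' architecture, but the core idea, learning a mapping from per-frame audio features to mouth shapes that can then be composited onto target video, can be sketched as a small sequence-regression model. The feature sizes, layer widths, landmark count, and random stand-in data below are illustrative assumptions, not the published method.

```python
# Minimal sketch (not the UW system): regress mouth-landmark positions from
# per-frame audio features with a recurrent network. All shapes, layer sizes,
# and the random stand-in data are illustrative assumptions.
import torch
import torch.nn as nn

N_AUDIO_FEATS = 28   # e.g., MFCCs plus deltas per video frame (assumption)
N_LANDMARKS = 18     # mouth keypoints, each with (x, y) coordinates
SEQ_LEN = 100        # frames per training clip

class AudioToMouth(nn.Module):
    """Map a sequence of audio feature vectors to mouth-landmark coordinates."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_AUDIO_FEATS, 128, num_layers=1, batch_first=True)
        self.head = nn.Linear(128, N_LANDMARKS * 2)  # (x, y) per landmark

    def forward(self, audio_feats):
        h, _ = self.rnn(audio_feats)   # (batch, time, 128)
        return self.head(h)            # (batch, time, landmarks * 2)

# Stand-in data: in a real pipeline these would come from the audio track and
# from mouth landmarks detected on the training footage.
audio = torch.randn(8, SEQ_LEN, N_AUDIO_FEATS)
mouth = torch.randn(8, SEQ_LEN, N_LANDMARKS * 2)

model = AudioToMouth()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5):  # tiny demonstration loop
    pred = model(audio)
    loss = loss_fn(pred, mouth)
    optim.zero_grad()
    loss.backward()
    optim.step()
    print(f"step {step}: loss {loss.item():.4f}")
```

In a real pipeline, the audio features would be extracted from the speaker's recordings and the target landmarks from mouth regions detected in the training footage; the predicted mouth shapes would then drive synthesis of mouth texture blended into the output video.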


What is the future of voice recognition and AI? BankNXT

#artificialintelligence

Mike Townsend started his career as a design engineer at Brookstone in Singapore, then as a mechanical engineer at ITT Corporation in Los Angeles, designing radar systems for military-grade UAVs. He is the founder of ZingCheckout and Flowtab, and remains very active in the payments industry.


Samsung, Kakao to cooperate on AI, voice recognition

ZDNet

Samsung Electronics and chat giant Kakao will cooperate on AI and voice recognition, the companies have announced. Samsung said last month that it wants to put more AI and voice recognition features in its home appliances going forward. Kakao is cooperating with POSCO and GS to put its AI platform in their smart homes. Kakao is preparing to launch its AI-based speaker, the Kakao Mini, while Samsung will launch its own next year.


Why 500 Million People in China Are Talking to This AI

MIT Technology Review

Some also use it to send text messages through voice commands while driving, or to communicate with a speaker of another Chinese dialect. But while some impressive progress in voice recognition and instant translation has enabled Xu to talk with his Canadian tenant, language understanding and translation for machines remains an incredibly challenging task (see "AI's Language Problem"). In August, iFlytek launched a voice assistant for drivers called Xiaofeiyu (Little Flying Fish). Min Chu, the vice president of AISpeech, another Chinese company working on voice-based human-computer interaction technologies, says voice assistants for drivers are in some ways more promising than smart speakers and virtual assistants embedded in smartphones.


The Race for AI-Enabled, Natural-Language and Voice Interface Platforms

@machinelearnbot

In many of the company's recent announcements, Amazon's voice assistant Alexa plays a central role. Complementing Amazon's retail marketplace, AWS, and Amazon Prime, Alexa and its 10,000-plus "skills" could become one of Amazon's most strategic initiatives. Alexa is important not only to Amazon: all the major tech companies are gearing up for a major competitive battle in this evolving platform war. This pending shift will impact UI and product design, required programming skills, development schedules, partnerships, and ultimately the resources and budgets needed to build these new systems.
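To give a concrete sense of what building for such a platform involves, here is a minimal sketch of an AWS Lambda handler answering an Alexa custom-skill request in raw JSON. The intent name and reply text are made up for illustration; a production skill would normally use the Alexa Skills Kit SDK and handle sessions, slots, and error cases.

```python
# Minimal sketch of an AWS Lambda function answering an Alexa custom-skill
# request. The intent name "HelloIntent" and the replies are illustrative.
def lambda_handler(event, context):
    request = event.get("request", {})
    req_type = request.get("type")

    if req_type == "LaunchRequest":
        text = "Welcome. Ask me to say hello."
    elif req_type == "IntentRequest" and request["intent"]["name"] == "HelloIntent":
        text = "Hello from a minimal Alexa skill."
    else:
        text = "Sorry, I didn't understand that."

    # Alexa expects a response envelope in this JSON shape.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```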


Apple's 'Neural Engine' Infuses the iPhone With AI Smarts

#artificialintelligence

Applications such as Siri have gotten much better at recognizing speech in the past few years as Apple, Google, and other tech companies have rebuilt their speech recognition systems around artificial neural networks. Custom circuits like those of Apple's neural engine allow machine-learning algorithms on a phone to analyze data more quickly and reduce how much they sap a device's battery. In June, Apple announced new tools to help developers run machine-learning algorithms inside apps, including a new standard for neural networks called CoreML. Longer term, mobile hardware that can run machine-learning software efficiently will be important to the future of autonomous vehicles and wearable augmented-reality glasses, ideas Apple has recently signaled interest in.
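As a rough illustration of the developer side of this shift, the sketch below converts a toy scikit-learn model into a Core ML .mlmodel file using the coremltools Python package, following that package's well-known house-pricing example; the toy model, feature names, and file name are illustrative and not drawn from the article.

```python
# Minimal sketch: convert a trained scikit-learn model to a Core ML .mlmodel
# that an iOS app can run on-device. The toy data and feature names are
# illustrative; real apps typically convert much larger models.
import coremltools
from sklearn.linear_model import LinearRegression

# Toy training data: predict a price from two numeric features (illustrative).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]]
y = [10.0, 12.0, 20.0, 28.0]

sk_model = LinearRegression()
sk_model.fit(X, y)

# Convert to Core ML and save the model file for bundling into an Xcode project.
mlmodel = coremltools.converters.sklearn.convert(
    sk_model, ["bedrooms", "bathrooms"], "price"
)
mlmodel.save("HousePricer.mlmodel")
```

Once bundled into an Xcode project, the .mlmodel is compiled into a class the app can call for on-device predictions, which is where dedicated hardware such as a neural engine comes into play.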


Microcontrollers Handle Automatic Speech Recognition

#artificialintelligence

One of the two new microcontroller series includes a 2D graphics engine with on-board VRAM, while the other includes voice recognition software and hardware support. The graphics family includes support for Spansion's HyperBus, a fast, off-chip memory access system (see "How HyperBus Delivers 330 Mbyte/s Using A Dozen Signals"). The 160-MHz S6E2DH series has the 2D graphics engine with 384 Kbytes of flash and 36 Kbytes of RAM. The 200-MHz S6E2CCA series forgoes the 2D graphics engine and bulks up on memory to provide automatic speech recognition (ASR) support.


Watson Speech-to-Text is paying attention to what people are saying (even when you are not) - Watson

#artificialintelligence

Many conference calls, call center conversations, and webinars are recorded for replay, but transcription can help listeners get more from those calls. You can use the IBM Watson Speech to Text service to add speech transcription capabilities to your applications. In a new episode of the Building with Watson webinar series, Bhavik Shah, Senior Offering Manager for IBM Watson, talks with Zach Walchuk about some of the newest features of the Speech to Text service, including language model customization and diarization. With the Speech to Text Language Model Customization capability, you can train the service on your own text so it learns domain-specific vocabulary.
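As a rough sketch of how an application might call the service over its REST interface, the example below sends a recorded call for transcription with diarization and a custom language model enabled, using Python's requests library. The endpoint URL, credentials, file name, and customization ID are placeholders rather than values from the webinar; check the service's current documentation for your own instance.

```python
# Minimal sketch: send a recorded call to the Watson Speech to Text REST API
# with diarization (speaker_labels) and a custom language model enabled.
# The endpoint URL, credentials, and customization ID are placeholders.
import json
import requests

URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
USERNAME = "your-service-username"      # placeholder credentials
PASSWORD = "your-service-password"
CUSTOM_LM_ID = "your-customization-id"  # from a trained custom language model

params = {
    "speaker_labels": "true",           # diarization: label who spoke when
    "customization_id": CUSTOM_LM_ID,   # use the domain-adapted language model
    "timestamps": "true",
}

with open("call_recording.wav", "rb") as audio_file:
    response = requests.post(
        URL,
        params=params,
        headers={"Content-Type": "audio/wav"},
        data=audio_file,
        auth=(USERNAME, PASSWORD),
    )

result = response.json()
print(json.dumps(result, indent=2))  # transcript alternatives plus speaker labels
```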