Research Papers based on development in the Speech Recognition Industry


Abstract: Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires solving a separate problem, namely active speaker detection (ASD), which entails selecting at each moment in time which of the visible faces corresponds to the audio. Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap in active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss.

Abstract: Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level).

Sonos is adding Voice Control to its speakers, so pleading for everything to stop will actually do something


Sonos is introducing voice commands for its speakers, finally letting you start your depression playlist by groaning from the couch as God intended. Announced today, Sonos Voice Control will arrive in a free software update for all voice-capable Sonos speakers running the Sonos S2 operating system, including the Roam, Beam, Move, and Arc. This update will let you issue oral commands to find specific songs, ask what's playing, control the sound on your TV, and adjust volume and playback, all without using your hands. You won't be able to set timers or reminders, as Sonos Voice Control isn't a fully fledged voice assistant. However, if you have more than one Sonos speaker, you will be able to use it to change where your audio is playing.

Sonos launches cheaper Ray soundbar and new voice control system

The Guardian

Sonos, the wireless home-audio specialist, is launching a lower-cost model of its popular TV soundbars alongside its own new voice control system for its smart speakers after its public bust-up with Google. The new Ray soundbar is a more compact version of Sonos's popular Arc and Beam models, designed to fit neatly in TV stands without affecting sound quality. It connects to a TV through an optical cable, has wifi for streaming music and can be controlled with the Sonos app or a TV remote. The Ray will cost £279 in the UK or $279 in the US from 7 June, sitting below the £449 Beam as the firm's most affordable model. It has two tweeters and two midwoofer speakers, along with the company's Trueplay smart tuning system, promising balanced sound with solid bass and crisp dialogue.

The new Sonos voice assistant seems faster than the competition


Sonos devices have supported Amazon's Alexa voice assistant for almost five years now. The Sonos One from 2017 was the first speaker the company made with built-in microphones, and almost every speaker it's made since has worked with Alexa, not to mention Google Assistant. Despite supporting those popular services, though, Sonos has decided to build its own voice assistant. Dubbed Sonos Voice Control, the feature is specifically designed to work with music only, so this isn't exactly a competitor to Alexa and Google Assistant. Instead, it's meant to control your music as quickly as possible, and with privacy in mind.

Google Assistant's Future Is Looking Us Right in the Face


For years we've been promised a computing future where our commands aren't tapped, typed, or swiped, but spoken. Embedded in this promise is, of course, convenience; voice computing will not only be hands-free, but totally helpful and rarely ineffective. That hasn't quite panned out. The usage of voice assistants has gone up in recent years as more smartphone and smart home customers opt into (or in some cases, accidentally "wake up") the AI living in their devices. But ask most people what they use these assistants for, and the voice-controlled future sounds almost primitive, filled with weather reports and dinner timers.

AI-driven biometry and the infrastructures of everyday life


Over the past years, we have become witness to the exponentially growing proliferation of biometric technologies: facial recognition technology and fingerprint scanners in our phones, sleep-pattern detection technology on our wrists or speech-recognition software that facilitates auto-dictation such as captioning. What all these technologies do is measure and record some aspect of the human body or its function: facial recognition technology measures facial features, fingerprint scanners measure the distance between the ridges that make up a unique fingerprint, sleep-pattern detection measures movement in our sleep as a proxy for wakefulness, and so on. AI is fundamentally a scaling technology. It is walking in the footsteps of many other technologies that have deployed classification and categorisation in the name of making bureaucratic processes more efficient, from ancient library systems to punch cards, to modern computer-vision technologies that 'know' the difference between a house, a road, a vehicle and a human. The basic idea of these scaling technologies is to minimise situations in which individual judgement is required (see also Lorraine Daston's seminal work on rules).

Create video subtitles with Amazon Transcribe using this no-code workflow


Subtitle creation on video content poses challenges no matter how big or small the organization. To address those challenges, Amazon Transcribe has a helpful feature that enables subtitle creation directly within the service. There is no machine learning (ML) or code writing required to get started. This post walks you through setting up a no-code workflow for creating video subtitles using Amazon Transcribe within your Amazon Web Services account. The terms subtitles and closed captions are commonly used interchangeably, and both refer to spoken text displayed on the screen.

Google now allows virtual food brands to have Google Business Profiles


Google has updated its Google Business Profile guidelines page under the "guidelines for chains, departments & individual practitioners" to allow virtual food brands to be listed with "conditions." One of the more popular virtual food brands is MrBeast Burgers: MrBeast, a popular YouTube creator, has deals with local burger shops to sell his own branded burgers, but he does not have any official burger shop or workers. You buy virtual branded food items made by the local shop. Joy Hawkins said that with these updated guidelines "Mr. Beast would be allowed listings and should set them up as service area listings (without an address)."

Reduce Speech Transcription Costs by up to 90% with CAI (WP030)


Conversational artificial intelligence (CAI) uses deep learning (DL), a subset of machine learning (ML), to automate speech recognition, natural language processing, and text-to-speech.

Sonos may roll out its own voice assistant next month


It seems Sonos is gearing up to roll out its own long-rumored voice assistant in the coming weeks. Sonos Voice is said to offer voice control for music playback on many of the company's devices, offering owners another option if they'd rather not use Amazon Alexa and Google Assistant. Sonos will first roll out Sonos Voice in the US on June 1st as part of a software update, according to The Verge. The feature should arrive in other countries later. A rumored $250 soundbar called Sonos Ray will likely be among those.