Speech Recognition


Amazon Hardware Event 2023: Alexa, Echo Hub, Echo Frames, Eero, Fire TV

WIRED

Every fall, Amazon holds a "Devices and Services" media event where it unleashes a flood of new gadgets and software into the world. At the 2022 edition, Amazon announced a Kindle with a stylus, a robot dog, and a refreshed line of Echoes and Eeros, among other smart home gadgets. This year, the company was eager to prove that it hasn't been left behind by its rivals' recent advances in artificial intelligence and conversational interfaces. Executives showed off a smarter version of Alexa that's been given an AI boost, as well as new smart home products that harness Amazon's computer vision, machine intelligence, and face recognition technologies. There were some stumbling blocks during the presentation, but here are the highlights of what Amazon announced today.


Apple Watch Series 9 review: Freedom from touching your screen

Engadget

Have you seen the meme about people who dangle too many things on their fingers for no reason whatsoever? I'm not proud to admit it, but I'm one of those. No matter how big of a bag I'm carrying, I always find my hands full, making it difficult to interact with my phone or smartwatch on the go. Which is why voice-controlled assistants and hands-free gestures are so appealing. With the Apple Watch Series 9, the company is introducing two new methods of interaction: Double Tap and Raise to Speak (to Siri).


Meta unveils 'Seamless' speech-to-speech translator

ZDNet

Meta, owner of Facebook, Instagram, and WhatsApp, on Tuesday unveiled its latest effort in machine translation, this one geared toward speech translation. The program, SeamlessM4T, surpasses existing models that are trained specifically for speech-to-speech translation between languages, as well as models that convert between speech and text in multiple language pairs. Hence, SeamlessM4T is an example not just of generality but of what is called multi-modality -- the ability for one program to operate on multiple data types, in this case, both speech and text data. Previously, Meta has focused on large language models that can translate text between 200 different languages. That focus on text is a problem, say lead author Loïc Barrault and colleagues at both Meta and the University of California, Berkeley.


Meta's new AI model is a real-time translation expert

Mashable

Meta's latest AI output is a major advancement for real-time text and speech translation. On Tuesday, the company released SeamlessM4T: a multimodal model that translates text to speech and vice versa. Meta claims SeamlessM4T is "the first all-in-one multilingual multimodal AI translation and transcription model," meaning it is uniquely able to translate and transcribe languages at the same time. SeamlessM4T can translate speech-to-text, speech-to-speech, text-to-speech, and text-to-text inputs for up to 100 languages. Speech output, for both speech-to-speech and text-to-speech translation, is supported in 35 languages.


Meta's new multimodal translator uses a single model to speak 100 languages

Engadget

Though it's not quite ready to usher in the Doolittle future we've all been waiting for, modern AI translation methods are proving more than sufficient in accurately transforming humanity's roughly 6,500 spoken and written communication systems between one another. The problem is that each of these models tends to only do one or two tasks really well -- translate and convert text to speech, speech to text or between either of the two sets -- so you end up having to smash a bunch of models on top of each other to create the generalized performance seen in the likes of Google Translate or Facebook's myriad language services. That's a computationally intensive process, so Meta developed a single model that can do it all. SeamlessM4T is "a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text," Meta's blog from Tuesday reads. It can translate between any of nearly 100 languages for speech-to-text and text-to-text functions. Speech-to-speech and text-to-speech support those same languages as inputs and output them in any of 36 other tongues, including English.


Say cheese! How to take hands-free photos on your phone

FOX News

Kurt 'The CyberGuy' Knutsson shows the best way to take voice-controlled selfies. How cool would it be if you could just say "Cheese!" and voilà, your phone would capture the perfect selfie without you ever lifting a finger? Oh yes, you heard that right. Aside from selfies, there are several other benefits of setting up a voice-controlled camera. It can capture speech much faster than you can type.


Google now alerts you if your contact info appears online

PCWorld

Back in September, Google launched a way to remove search results showing personal contact info--details like your home address, phone number, and email address. You could make the request from any method of accessing Google search, then monitor the status of your request(s) from its new Results About You tool. But you still had to stay on top of search results about you. Now the company has rolled out some updates to the tool to alleviate some of that work (and stress)--as well as introducing a further way to protect your privacy online. Promised to arrive in "the coming days," the new Results About You dashboard will both show you existing search results containing your contact info and automatically alert you if those personal details pop up again later on.


For captioning, humans are still the key to accessible, AI-driven tech

Mashable

The case for human oversight of artificial intelligence (AI) services continues, with the intertwined world of audio transcription, captioning, and automatic speech recognition (ASR) joining the call for applications that complement, not replace, human input. Captions and subtitles serve a vital role in providing media and information access to viewers who are deaf or hard of hearing, and they've risen in popular use over the past several years. Disability advocates have pushed for better captioning options for decades, highlighting a need that's increasingly relevant with the proliferation of on-demand streaming services. Video-based platforms have quickly latched onto AI, as well, with YouTube announcing early tests of a new AI feature that summarizes entire videos and TikTok exploring its own chat bot. So with the growing craze over AI as a buoy to tech's limitations, involving the latest AI tools and services in automatic captioning might seem like a logical next step.


Meta's newest dataset will train speech recognition engines on 'clusters' of speakers

Engadget

It is 2023 and, sorry, Siri somehow still didn't catch that. Despite the tsunami of advancements generative AI systems have enjoyed in recent months, the synthetic assistants on our mobile devices remain nearly as hard of hearing as they were in 2011. A newly developed dataset from Meta AI, however, promises to improve the performance of such automatic speech recognition (ASR) tools by clustering speech at the "utterance level." Meta has long sought to improve its ASRs' performance, teaching them to train without the aid of transcripts, recognize more than 4,000 spoken languages and even read lips at a higher proficiency than human experts. However, many of the datasets used to train ASR models are organized by demographic -- age group, gender, nationality, English accent -- which limits the variation of pronunciations that models are trained on, ultimately hindering their function in understanding a broad cross section of users.


Should you get an Echo or Echo Dot? We compare the two.

Mashable

The smart home concept is increasingly becoming the norm. Over the past few years, we've seen an increase in voice recognition assistants and connected devices to make our busy lives more seamless. With a few spoken commands, we don't have to lift a finger to play music, stream content, turn off the lights, or pick up the phone to call family members. With these advanced products, we can take the stress out of daily life tasks and have time for things we really enjoy. With this technology boom, though, it can be hard to transition your home from "basic" to "smart-home" status.