If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Nuance Communications has a new partner to help bring its voice recognition technology to more first responders. The conversational technology business is working with Nexgen, a Computer Aided Dispatch and Records Management Systems (CAD/RMS) provider, to bring AI-powered voice recognition to dispatch and public safety communications systems. Specifically, the companies will combine Nuance's Dragon Law Enforcement speech recognition technology with Nexgen's suite of software offerings. Nuance's Dragon Law Enforcement is already used by thousands of law enforcement officers across the US, Mark Geremia, Nuance's general manager of Dragon, told ZDNet. By partnering with Nexgen, Nuance can expand its customer base beyond law enforcement to include fire departments, 911 call centers and EMT agencies.
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditioning WaveNets with acoustic features allows sharing the waveform generator model across multiple speakers without additional speaker codes. However, multi-speaker WaveNet models require large amounts of training data and computation to cover the entire acoustic space. This paper proposes leveraging the source-filter model of speech production to more effectively train a speaker-independent waveform generator with limited resources. We present a multi-speaker 'GlotNet' vocoder, which uses a WaveNet to generate glottal excitation waveforms that then excite the corresponding vocal tract filter to produce speech. Listening tests show that the proposed model compares favourably with a direct WaveNet vocoder trained with the same model architecture and data.
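The source-filter idea the abstract builds on can be illustrated with a minimal sketch: an excitation signal is passed through an all-pole vocal tract filter to produce a speech-like waveform. Everything below is illustrative, not GlotNet itself — the filter coefficients are assumed placeholder values, and white noise stands in for the WaveNet-generated glottal excitation.

```python
import numpy as np

def allpole_filter(excitation, a):
    """Apply an all-pole (direct-form IIR) vocal tract filter:
    y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Hypothetical excitation: in GlotNet this would be the WaveNet-generated
# glottal waveform; white noise stands in here purely for illustration.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(16000)  # 1 s of samples at 16 kHz
a = np.array([1.0, -1.8, 0.9])           # assumed LPC-style filter coefficients
speech = allpole_filter(excitation, a)   # filtered, speech-like waveform
```

The appeal of this decomposition, as the abstract argues, is that the neural model only has to learn the relatively speaker-independent excitation, while the speaker- and phone-dependent spectral envelope is handled by a conventional filter.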
The ASVspoof challenge series was born to spearhead research in anti-spoofing for automatic speaker verification (ASV). The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric. While a strategic approach to assessment at the time, it has certain shortcomings. First, the CM EER is not necessarily a reliable predictor of performance when ASV and CMs are combined. Second, the EER operating point is ill-suited to user authentication applications, e.g. telephone banking, characterised by a high target user prior but a low spoofing attack prior. We aim to migrate from CM- to ASV-centric assessment with the aid of a new tandem detection cost function (t-DCF) metric. It extends the conventional DCF used in ASV research to scenarios involving spoofing attacks. The t-DCF metric has six parameters: (i) false alarm and miss costs for both systems, and (ii) prior probabilities of target and spoof trials (with an implied third, nontarget prior). The study is intended to serve as a self-contained, tutorial-like presentation. We analyse with the t-DCF a selection of top-performing CM submissions to the 2015 and 2017 editions of ASVspoof, with a focus on the spoofing attack prior. Whereas there is little to choose between countermeasure systems for lower priors, rankings derived with the EER and the t-DCF diverge for higher priors, with some systems changing rank. Findings support the adoption of the DCF-based metric into the roadmap for future ASVspoof challenges, and possibly for other biometric anti-spoofing evaluations.
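As background, the conventional ASV detection cost function that the t-DCF generalizes can be sketched in a few lines. This is a minimal illustration of the plain DCF only, not the t-DCF itself; the cost and prior values below are assumed for the example.

```python
def detection_cost(p_miss, p_fa, c_miss, c_fa, pi_tar):
    """Conventional ASV detection cost function (DCF):
    the prior-weighted expected cost of misses and false alarms
    at a given operating point."""
    return c_miss * pi_tar * p_miss + c_fa * (1.0 - pi_tar) * p_fa

# Illustrative operating point (assumed numbers): a high target prior,
# as in the telephone-banking scenario described above.
cost = detection_cost(p_miss=0.10, p_fa=0.05, c_miss=1.0, c_fa=1.0, pi_tar=0.9)
```

The t-DCF extends this weighted sum to the tandem system, folding in a spoofing-attack prior alongside the target and nontarget priors and accounting for the countermeasure's own miss and false-alarm behaviour, which is why its ranking of systems can differ from the single-operating-point EER.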
AdMobilize will introduce its MATRIX Voice dev board to the digital signage industry at DSE 2018 in booth 2369 at the Las Vegas Convention Center. "Put simply, the company that introduced AI-powered audience analytics to the digital signage industry is now bringing voice recognition functionality to manufacturers and systems integrators alike through its MATRIX product line," said AdMobilize co-founder and CEO Rodolfo Saccoman. "We believe that voice engagement technologies will make digital signage a more compelling and sticky communications solution for an even broader range of vertical markets. The combination of audience analytics and voice recognition functionality truly represents the next chapter in this constantly evolving industry, and AdMobilize is at the forefront of making this chapter a reality." Available for $55.00, MATRIX Voice will integrate with any voice recognition service (Amazon Alexa, Google Assistant, or any other third-party service) at any time.
Since Apple developed Siri, there have been great strides made in the science of voice recognition. Will we soon be throwing away our mice and keyboards and simply talking to our computers? Or will the problems I have with Alexa continue to haunt voice recognition? My wife and I are like all married couples at breakfast. We do not speak to each other.