Collaborating Authors

Google Brain Co-Founder Teams With Foxconn to Bring AI to Factories

#artificialintelligence

Consumers now experience AI mostly through image recognition to help categorize digital photographs and speech recognition that helps power digital voice assistants such as Apple Inc's Siri or Amazon.com ... But at a press briefing in San Francisco two days before Ng's Landing.ai ... In many factories, workers look over parts coming off an assembly line for defects. Ng showed a video in which a worker instead put a circuit board beneath a digital camera connected to a computer and the computer identified a defect in the part. Ng said that while typical computer vision systems might require thousands of sample images to become "trained," Landing.ai's ...


Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection

arXiv.org Machine Learning

We consider technology-assisted mimicry attacks in the context of automatic speaker verification (ASV). We use ASV itself to select targeted speakers to be attacked by human-based mimicry. We recorded 6 naive mimics for whom we select target celebrities from the VoxCeleb1 and VoxCeleb2 corpora (7,365 potential targets) using an i-vector system. The attacker attempts to mimic the selected target, with the utterances subjected to ASV tests using an independently developed x-vector system. Our main finding is negative: even if some of the attacker scores against the target speakers were slightly increased, our mimics did not succeed in spoofing the x-vector system. Interestingly, however, the relative ordering of the selected targets (closest, furthest, median) is consistent between the systems, which suggests some level of transferability between the systems.
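
The target selection step described above (ranking potential targets against an attacker and picking the closest, median, and furthest) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the i-vector system scores speaker pairs, so cosine similarity is assumed here, and the function names, speaker labels, and 2-D toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_targets(attacker, candidates):
    """Rank candidates by similarity to the attacker (most similar first)
    and return the closest, median, and furthest candidate names."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(attacker, candidates[name]),
        reverse=True,
    )
    return {
        "closest": ranked[0],
        "median": ranked[len(ranked) // 2],
        "furthest": ranked[-1],
    }


# Toy 2-D vectors standing in for speaker embeddings (hypothetical data).
attacker = [1.0, 0.0]
candidates = {
    "spk_a": [0.9, 0.1],
    "spk_b": [0.5, 0.5],
    "spk_c": [0.0, 1.0],
    "spk_d": [-1.0, 0.0],
    "spk_e": [0.7, 0.3],
}

picks = select_targets(attacker, candidates)
print(picks)
```

In a setting like the one the abstract describes, the candidate pool would hold one embedding per potential target (7,365 of them), and the selected names would then be tested against a second, independent verification system.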


Automatic Speech Transcription And Speaker Recognition Simultaneously Using Apple AI

#artificialintelligence

Last year, Apple faced several controversies regarding its speech recognition technology. To provide quality control for the company's voice assistant Siri, Apple had its contractors regularly listen to confidential voice recordings under the "Siri Grading Program". The company later apologised and published a statement announcing changes to the Siri grading program. This year, the tech giant has had a number of researchers working on speech recognition technology to upgrade its voice assistant. Recently, researchers at Apple developed an AI model that can perform automatic speech transcription and speaker recognition simultaneously.


Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

arXiv.org Machine Learning

Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions, according to the results obtained for early NIST SRE (Speaker Recognition Evaluation) datasets. From a practical point of view, given the increased interest in virtual assistants (such as Amazon Alexa, Google Home, Apple Siri, etc.), speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and in-demand tasks. This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances. For these purposes, we considered deep neural network architectures based on TDNN (Time Delay Neural Network) and ResNet (Residual Neural Network) blocks. We experimented with state-of-the-art embedding extractors and their training procedures. The obtained results confirm that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality for both long-duration and short-duration utterances. We also investigate the impact of the speech activity detector, different scoring models, adaptation, and score normalization techniques. The experimental results are presented for publicly available data and verification protocols for the VoxCeleb1, VoxCeleb2, and VOiCES datasets.
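
The abstract mentions score normalization without naming a specific technique. As a rough illustration only, the sketch below shows symmetric normalization (S-norm), one commonly used option; it is an assumption that this resembles what the paper evaluates. The function name, cohort scores, and raw score are hypothetical toy values.

```python
import statistics


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm), shown as an assumed example.

    The raw trial score is standardized twice: once with the statistics of
    the enrollment embedding scored against a cohort, and once with those of
    the test embedding scored against the same cohort. The two standardized
    scores are then averaged.
    """
    mu_e = statistics.mean(enroll_cohort_scores)
    sd_e = statistics.pstdev(enroll_cohort_scores)
    mu_t = statistics.mean(test_cohort_scores)
    sd_t = statistics.pstdev(test_cohort_scores)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


# Hypothetical cohort scores for one verification trial.
enroll_cohort = [0.1, 0.2, 0.3]
test_cohort = [0.0, 0.2, 0.4]

normalized = s_norm(0.6, enroll_cohort, test_cohort)
print(round(normalized, 3))
```

Normalizing this way puts scores from different trials on a comparable scale, which matters when a single decision threshold is applied across noisy and short utterances.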