voiceprint
'The disruption is already happening!' Is AI about to ruin your favourite TV show?
Justine Bateman won't name names, but a TV showrunner friend once came to her with a dilemma: their show's team was well into filming its second season when a network executive had an idea. A character in the pilot hadn't tested well with audiences, so they were just going to go in, use a little AI, and swap in someone else. The showrunner – and Bateman, an actor and director – were understandably incensed. "When you change the beginning of something, you change the creative trajectory," says Bateman. "There's going to be whiplash for the viewer when they get to episode three or four because what was set up in the pilot got messed with and now doesn't make sense." Using AI might have seemed like a simple solution to the executive, but to the showrunner, it was catastrophic.
- Media > Television (1.00)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
How hackers are now targeting your voice and how to protect yourself
Kurt "The CyberGuy" Knutsson describes a situation in which a viewer was hacked and explains the steps you can take to prevent it from happening to you. In today's digital chorus, your voice is the newest solo. It's not just for singing in the shower or whispering sweet nothings anymore. However, just as we're crooning over the idea of voice authentication, hackers are hitting a high note, mastering the art of mimicking it. When enrolling in voice authentication, you are asked to repeat a specific phrase in your own voice.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.31)
Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
Deng, Jiangyi, Chen, Yanjiao, Zhong, Yinan, Miao, Qianhao, Gong, Xueluan, Xu, Wenyuan
Voice conversion (VC) techniques can be abused by malicious parties to transform their audio to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint, with high credibility, from audio synthesized by voice conversion methods. However, unveiling the features of the source speaker from converted audio is challenging, since the voice conversion operation is intended to disentangle the original features and infuse the features of the target speaker. To fulfill our goal, we develop Revelio, a representation learning model that learns to effectively extract the voiceprint of the source speaker from converted audio samples. We equip Revelio with a carefully designed differential rectification algorithm that eliminates the influence of the target speaker by removing the representation component parallel to the target speaker's voiceprint. We have conducted extensive experiments to evaluate Revelio's ability to restore voiceprints from audio converted by VQVC, VQVC+, AGAIN, and BNE. The experiments verify that Revelio is able to rebuild voiceprints that can be traced to the source speaker by speaker verification and identification systems. Revelio also exhibits robust performance under inter-gender conversion, unseen languages, and telephony networks.
- South America > Paraguay > Asunción > Asunción (0.04)
- North America (0.04)
- Europe (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.71)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
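The differential rectification step described in the Revelio abstract, removing the representation component parallel to the target speaker's voiceprint, is essentially a vector projection. Below is a minimal NumPy sketch; the embedding dimension and the mixing weights of the toy example are invented for illustration and are not the paper's actual model:

```python
import numpy as np

def differential_rectification(conv_emb: np.ndarray, target_vp: np.ndarray) -> np.ndarray:
    """Remove the component of the converted-audio embedding that is parallel
    to the target speaker's voiceprint, leaving the residual that (per the
    paper's intuition) carries source-speaker information."""
    t = target_vp / np.linalg.norm(target_vp)   # unit vector along target voiceprint
    parallel = np.dot(conv_emb, t) * t          # projection onto the target direction
    return conv_emb - parallel                  # orthogonal residual

# Toy demonstration with random embeddings
rng = np.random.default_rng(0)
target = rng.normal(size=256)
source = rng.normal(size=256)
converted = 0.7 * target + 0.3 * source        # converted audio mixes both speakers

rectified = differential_rectification(converted, target)
print(np.dot(rectified, target))               # numerically ~ 0: target component removed
```

After rectification the embedding is orthogonal to the target voiceprint, so whatever similarity remains to enrolled speakers comes from the source.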
Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Chen, Meng, Lu, Li, Yu, Jiadi, Chen, Yingying, Ba, Zhongjie, Lin, Feng, Ren, Kui
Faced with the threat of identity leakage when publishing voice data, users confront a privacy-utility dilemma in enjoying convenient voice services. Existing studies employ direct modification or text-based re-synthesis to de-identify users' voices, but these result in inconsistent audibility in the presence of human participants. In this paper, we propose a voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of the typical additive examples that induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, our system preserves user identity from exposure to Automatic Speaker Identification (ASI) while retaining the perceptual quality of the voice for non-intrusive de-identification. Moreover, our system learns a compact speaker distribution through a conditional variational auto-encoder to sample diverse target embeddings on demand. Combining diverse target generation with input-specific perturbation construction, our system enables any-to-any identity transformation for adaptive de-identification. Experimental results show that our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively, with an objective Mel cepstral distortion of 4.31 dB and a subjective mean opinion score of 4.48.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- (20 more...)
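The convolutional adversarial example described in the abstract above, which modulates the perturbation into a room impulse response instead of adding it to the waveform, can be sketched as a convolution. The identity impulse response, perturbation scale, and function name here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def apply_convolutional_perturbation(audio, rir, perturbation):
    # The adversarial delta lives in impulse-response space, so the
    # modification sounds like room reverberation rather than additive noise.
    perturbed_rir = rir + perturbation
    return np.convolve(audio, perturbed_rir)[: len(audio)]

rng = np.random.default_rng(1)
audio = rng.normal(size=16000)           # 1 second of 16 kHz "speech"
rir = np.zeros(512)
rir[0] = 1.0                             # identity (anechoic) impulse response
delta = 1e-3 * rng.normal(size=512)      # small perturbation in RIR space

perturbed = apply_convolutional_perturbation(audio, rir, delta)
```

Against the identity impulse response the output stays close to the original waveform; an attack would optimize `delta` so that an ASI model misidentifies the speaker while listeners hear only mild reverberation.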
V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization
Deng, Jiangyi, Teng, Fei, Chen, Yanjiao, Chen, Xiaofu, Wang, Zhaohui, Xu, Wenyuan
Voice data generated on instant messaging or social media applications contains unique user voiceprints that may be abused by malicious adversaries for identity inference or identity theft. Existing voice anonymization techniques, e.g., signal processing and voice conversion/synthesis, suffer from degradation of perceptual quality. In this paper, we develop a voice anonymization system, named V-Cloak, which attains real-time voice anonymization while preserving the intelligibility, naturalness and timbre of the audio. Our anonymizer features a one-shot generative model that modulates the features of the original audio at different frequency levels. We train the anonymizer with a carefully designed loss function: apart from the anonymity loss, we further incorporate an intelligibility loss and a psychoacoustics-based naturalness loss. The anonymizer can perform untargeted and targeted anonymization to achieve the anonymity goals of unidentifiability and unlinkability. We have conducted extensive experiments on four datasets, i.e., LibriSpeech (English), AISHELL (Chinese), CommonVoice (French) and CommonVoice (Italian), five Automatic Speaker Verification (ASV) systems (including two DNN-based, two statistical and one commercial ASV), and eleven Automatic Speech Recognition (ASR) systems (for different languages). Experimental results confirm that V-Cloak outperforms five baselines in terms of anonymity performance. We also demonstrate that V-Cloak trained only on the VoxCeleb1 dataset against ECAPA-TDNN ASV and DeepSpeech2 ASR has transferable anonymity against other ASVs and cross-language intelligibility for other ASRs. Furthermore, we verify the robustness of V-Cloak against various de-noising techniques and adaptive attacks. Hopefully, V-Cloak may provide a cloak for us in a prism world.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > France (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
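The abstract's distinction between untargeted anonymization (unidentifiability) and targeted anonymization (unlinkability) can be illustrated with a toy anonymity loss over voiceprint embeddings. The cosine formulation below is an illustrative assumption, not V-Cloak's actual objective, which additionally combines intelligibility and psychoacoustic naturalness terms:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def anonymity_loss(anon_vp, orig_vp, target_vp=None):
    # Untargeted mode: minimizing this pushes the anonymized voiceprint
    # away from the original speaker (unidentifiability).
    if target_vp is None:
        return cosine(anon_vp, orig_vp)
    # Targeted mode: minimizing this pulls the anonymized voiceprint toward
    # a chosen target identity (unlinkability across utterances).
    return -cosine(anon_vp, target_vp)
```

A full training loss would add weighted intelligibility and naturalness terms so the anonymized audio still transcribes correctly and sounds natural.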
A Novel Approach to Integrate Speech Recognition into Authentication Systems
Voice recognition offers a number of important benefits over other kinds of identity authentication, such as iris scans, face recognition, and fingerprint scans. First, it is readily usable for authentication on mobile phones, since all phones come equipped with microphones. Second, it is cost-effective to integrate into other devices such as household appliances and cars [19]. Third, it is convenient and familiar to the majority of consumers thanks to the rapid expansion of IoT devices. Finally, it has been shown to be very accurate in some settings [20]. A voice authentication system commonly consists of a client-side application and a server.
- Research Report > Promising Solution (0.40)
- Overview > Innovation (0.40)
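The client-server split mentioned above can be sketched as a minimal enrollment/verification loop over voiceprint embeddings. The class name, threshold, and embedding dimension are hypothetical; a real system would extract embeddings from the enrollment phrase with a trained speaker model on the client side:

```python
import numpy as np

class VoiceAuthServer:
    """Minimal sketch of the server side of a voice authentication system:
    enrollment stores a normalized voiceprint embedding; verification compares
    a fresh embedding against it with a cosine-similarity threshold."""

    def __init__(self, threshold: float = 0.8):
        self.enrolled = {}
        self.threshold = threshold

    def enroll(self, user_id: str, voiceprint: np.ndarray) -> None:
        self.enrolled[user_id] = voiceprint / np.linalg.norm(voiceprint)

    def verify(self, user_id: str, voiceprint: np.ndarray) -> bool:
        ref = self.enrolled.get(user_id)
        if ref is None:
            return False
        probe = voiceprint / np.linalg.norm(voiceprint)
        return float(np.dot(ref, probe)) >= self.threshold

rng = np.random.default_rng(2)
alice = rng.normal(size=192)
server = VoiceAuthServer()
server.enroll("alice", alice)
print(server.verify("alice", alice + 0.1 * rng.normal(size=192)))  # noisy re-recording: True
print(server.verify("alice", rng.normal(size=192)))                # different speaker: False
```

The threshold trades off false accepts against false rejects; deployed systems also add liveness checks precisely because, as the articles above note, synthetic voices can now pass naive embedding comparisons.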
Your voiceprint could be your new password as companies look to increase security for remote workers
As working from home moves from a temporary solution to the new normal, companies need new ways to secure data and protect internal networks. Banks are the most likely to use voiceprints to authenticate users, but more companies are considering this approach. Nuance Communications uses a voiceprint algorithm powered by a deep neural network to analyze 1,000 parameters of an individual's voice, including tone, pitch, pacing and fluctuations in the sound. The engine determines which parameters are most relevant for each individual and weights the appropriate elements accordingly. Simon Marchand, chief fraud prevention officer at Nuance, worked in fraud prevention for 10 years in the financial and telecom industries.
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.40)
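The per-user weighting described above, where the engine decides which voice parameters matter most for each individual, can be sketched as a weighted match score. The four parameters, the weights, and the exponential scoring formula are illustrative assumptions, not Nuance's actual engine (which reportedly analyzes around 1,000 parameters with a deep neural network):

```python
import numpy as np

def weighted_voice_score(probe, enrolled, weights):
    # Normalize weights so scores are comparable across users whose
    # weight profiles differ.
    w = weights / weights.sum()
    diff = probe - enrolled
    return float(np.exp(-np.sum(w * diff ** 2)))  # 1.0 = perfect match, decays with distance

enrolled = np.array([0.62, 0.40, 0.75, 0.51])  # e.g. tone, pitch, pacing, fluctuation
weights = np.array([3.0, 1.0, 2.0, 0.5])       # pitch weighted low for this user
probe = np.array([0.60, 0.10, 0.74, 0.50])     # pitch is off, but it barely matters here

print(weighted_voice_score(probe, enrolled, weights))
```

Because the pitch dimension carries little weight for this hypothetical user, the large pitch deviation in the probe only slightly lowers the score.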
McDonald's being sued in Illinois for collecting customers' biometric data at AI-powered drive-thru
McDonald's is being sued for recording customers' biometric data at its new AI-powered drive-thru windows without getting their consent. In court filings, Shannon Carpenter, a customer at a McDonald's in Lombard, Illinois, claims the system violates Illinois' Biometric Information Privacy Act, or BIPA, by not getting his approval before using voice-recognition technology to take his order. BIPA requires companies to inform customers that their biometric information, including voiceprints, facial features, fingerprints and other unique physiological features, is being collected. Illinois is only one of a handful of states with biometric privacy laws, but its law is considered the most stringent. In 2020, the fast-food chain began testing voice-recognition software in lieu of human servers at 10 locations in and around Chicago.
- North America > United States > Illinois > Cook County > Chicago (0.48)
- North America > United States > Illinois > DuPage County > Lombard (0.25)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Consumer Products & Services (1.00)
TikTok Has Started Collecting Your 'Faceprints' and 'Voiceprints.' Here's What It Could Do With Them
Recently, TikTok made a change to its U.S. privacy policy, allowing the company to "automatically" collect new types of biometric data, including what it describes as "faceprints" and "voiceprints." TikTok's unclear intent, the permanence of the biometric data and potential future uses for it have caused concern among experts who say users' security and privacy could be at risk. On June 2, TikTok updated the "Information we collect automatically" portion of its privacy policy to include a new section called "Image and Audio Information," giving itself permission to gather certain physical and behavioral characteristics from its users' content. The increasingly popular video sharing app may now collect biometric information such as "faceprints and voiceprints," but the update doesn't define these terms or what the company plans to do with the data. "Generally speaking, these policy changes are very concerning," Douglas Cuthbertson, a partner in Lieff Cabraser's Privacy & Cybersecurity practice group, tells TIME.
- Asia > China (0.16)
- North America > United States > Illinois (0.06)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.05)
- (4 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.95)
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments
Hao, Yunzhe, Xu, Jiaming, Zhang, Peng, Xu, Bo
In the speaker extraction problem, additional information about the target speaker has been found to aid tracking and extraction of that speaker, including voiceprint, lip movement, facial expression, and spatial information. However, the cue of sound onset, long emphasized in auditory scene analysis and psychology, has received little attention. Inspired by this, we explicitly modeled the onset cue and verified its effectiveness in the speaker extraction task. We further extended the model to onset/offset cues and obtained a performance improvement. From the perspective of tasks, our onset/offset-based model completes a composite task: a complementary combination of speaker extraction and speaker-dependent voice activity detection. We also combined the voiceprint with onset/offset cues. The voiceprint models the voice characteristics of the target, while the onset/offset cues model the start/end information of the speech. From the perspective of auditory scene analysis, combining the two perception cues can promote the integrity of the auditory object. Our experimental results are close to state-of-the-art performance while using nearly half the parameters. We hope that this work will inspire the speech processing and psychology communities and contribute to communication between them. Our code will be available at https://github.com/aispeech-lab/wase/.
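The onset/offset cue that WASE learns with a network can be illustrated by a hand-rolled energy-threshold detector over frame energies. This toy detector is only an analogy for the learned cue; the threshold and frame energies below are invented:

```python
import numpy as np

def onset_offset(frame_energy, threshold):
    # Onset: first frame whose energy rises above the threshold.
    # Offset: last frame still above it.
    active = np.flatnonzero(frame_energy > threshold)
    if active.size == 0:
        return None, None
    return int(active[0]), int(active[-1])

energy = np.array([0.01, 0.02, 0.50, 0.80, 0.60, 0.03, 0.01])
print(onset_offset(energy, 0.1))  # (2, 4)
```

A speaker-dependent version, as in the paper's composite task, would additionally gate the active frames on whether they match the target speaker's voiceprint.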