AITopics

2509.22808

Country:

Asia > Middle East > UAE (0.28)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.26)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

arXiv.org Artificial Intelligence May-16-2025

SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

Li, Yuqi, Zheng, Yuanzhong, Guo, Zhongtian, Wang, Yaoxuan, Yin, Jianjun, Fei, Haojun

This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction [1] and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and emphasizing the need for stronger defenses, benchmarked against the ICASSP 2025 Attacker Challenge [2]. This paper introduces SpecWav-Attack, a tailored adversarial model for attacking anonymized speech with a focus on Effective Equal Error Rate (EER). Using the ECAPA-TDNN architecture [3], we integrate the Wav2Vec2 self-supervised model [1] to enrich speech representations, enhancing sensitivity to variations in anonymized data.

artificial intelligence, machine learning, specwav-attack, (14 more...)

2505.09616

Country: Asia > China (0.24)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Jan-21-2025

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication

Seo, Geonwoo

Wakeword detection plays a critical role in enabling AI assistants to listen to user voices and interact effectively. However, for languages other than English, there is a significant lack of pre-trained wakeword models. Additionally, systems that merely determine the presence of a wakeword can pose serious privacy concerns. In this paper, we propose an end-to-end approach that trains wakewords for non-English languages, particularly Korean, and uses this to develop a Voice Authentication model to protect user privacy. Our implementation employs the open-source platform OpenWakeWord, which performs wakeword detection using an FCN (Fully-Connected Network) architecture. Once a wakeword is detected, our custom-developed code calculates cosine similarity for robust user authentication. Experimental results demonstrate the effectiveness of our approach, achieving Equal Error Rates (EER) of 16.79% and 6.6% for Wakeword Detection and Voice Authentication, respectively. These findings highlight the model's potential in providing secure and accurate wakeword detection and authentication for Korean users.
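The two ideas named in the abstract above, cosine-similarity authentication and the Equal Error Rate, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.7 threshold, and the toy score lists are assumptions made for illustration, and the EER is approximated by sweeping the observed scores as thresholds.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def is_enrolled_speaker(enrolled, probe, threshold=0.7):
    """Accept the probe if it is close enough to the enrolled embedding.

    The threshold value is a hypothetical default, not one from the paper.
    """
    return cosine_similarity(enrolled, probe) >= threshold

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Toy similarity scores (made up): same-speaker trials vs. different-speaker trials.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

A lower EER means the genuine and impostor score distributions overlap less, so a single threshold can separate them with fewer errors of both kinds.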

artificial intelligence, machine learning, threshold, (13 more...)

2501.12194

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.37)

Solomon, Enoch, Woubie, Abraham, Emiru, Eyael Solomon

Unsupervised Deep Learning Image Verification Method

arXiv.org Artificial Intelligence Feb-6-2024

Although deep learning methods are commonly employed for image recognition, they usually require a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by leveraging an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This has successfully narrowed the performance gap between cosine and PLDA scoring systems.
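The neighbor-selection step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the parameter `k` are hypothetical, excluding each vector from its own neighbor list is an assumption, and the autoencoder training itself is not shown.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def neighbor_targets(vectors, k=1):
    """For each vector, return the indices of its k highest-scoring neighbors.

    No labels are used: neighbors are chosen purely by cosine score against
    the other training vectors. An autoencoder could then be trained to map
    vectors[i] to vectors[j] for each selected neighbor index j.
    """
    targets = []
    for i, v in enumerate(vectors):
        scored = [(cosine_score(v, w), j) for j, w in enumerate(vectors) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        targets.append([j for _, j in scored[:k]])
    return targets

# Toy 2-D "face image vectors" (made up): two loose clusters.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pairs = neighbor_targets(toy_vectors, k=1)
```

Each resulting (input, neighbor) pair would serve as one (input, reconstruction target) example for the autoencoder.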

face image vector, image vector, vector, (14 more...)

2312.14395

Country:

North America > United States > Virginia > Richmond (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Styles, Suzy J., Chua, Victoria Y. H., Woon, Fei Ting, Liu, Hexin, Perera, Leibny Paola Garcia, Khudanpur, Sanjeev, Khong, Andy W. H., Dauwels, Justin

Investigating model performance in language identification: beyond simple error statistics

arXiv.org Artificial Intelligence May-30-2023

Language development experts need tools that can automatically identify languages from fluent, conversational speech, and provide reliable estimates of usage rates at the level of an individual recording. However, language identification systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may therefore mask systematic errors in model performance for some subsets of the data; consequently, systems may perform worse on data derived from some subsets of human speakers, creating a kind of algorithmic bias. In the current paper, we investigate how well a number of language identification systems perform on individual recordings and speech units with different linguistic properties in the MERLIon CCS Challenge. The Challenge dataset features accented English-Mandarin code-switched child-directed speech.

artificial intelligence, machine learning, speech, (18 more...)

2305.18925

Country:

Asia > Singapore (0.07)
North America > United States > New York (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

#artificialintelligence Mar-5-2021, 04:35:06 GMT

Annual index finds AI is 'industrializing' but needs better metrics and testing

China has overtaken the United States in total number of AI research citations, fewer AI startups are receiving funding, and Congress is talking about AI more than ever. Those are three major trends highlighted in the 2021 AI Index, an annual report released today by Stanford University. Now in its fourth year, the AI Index attempts to document advances in artificial intelligence, as well as the technology's impact on education, startups, and government policy. The report details progress in the performance of major subdomains of AI, like deep learning, image recognition, and object detection, as well as in areas like protein folding. The AI Index is compiled by the Stanford Institute for Human-Centered Artificial Intelligence and an 11-member steering committee, with contributors from Harvard University, OECD, the Partnership on AI, and SRI International.

ai index, annual index find ai, language model, (12 more...)

#artificialintelligence

Country:

North America > United States (0.25)
Asia > China (0.25)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.51)

Coria, Juan M., Bredin, Hervé, Ghannay, Sahar, Rosset, Sophie

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

arXiv.org Machine Learning Mar-31-2020

Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross entropy loss (usually used for supervised classification) and includes the congenerous cosine loss, the additive angular margin loss, and the center loss. The second family of loss functions focuses on the similarity between training samples and includes the contrastive loss and the triplet loss. We show that the additive angular margin loss function outperforms all other loss functions in the study, while learning more robust representations. Based on a combination of SincNet trainable features and the x-vector architecture, the network used in this paper brings us a step closer to a really-end-to-end speaker verification system, when combined with the additive angular margin loss, while still being competitive with the x-vector baseline. In the spirit of reproducible research, we also release open source Python code for reproducing our results, and share pretrained PyTorch models on torch.hub that can be used either directly or after fine-tuning.
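As a rough illustration of the additive angular margin loss mentioned above, the sketch below follows the common formulation in which a margin is added to the angle between an embedding and its target class weight before a scaled softmax cross-entropy. It is not the authors' released code: the function name, the default `margin` and `scale`, and the input convention (precomputed cosines) are assumptions, and the handling of angles beyond pi that practical implementations add is omitted.

```python
import math

def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy over logits with an additive angular margin on the target.

    cosines[j] is cos(theta_j), the cosine between an L2-normalized embedding
    and the L2-normalized weight vector of class j. The target class logit
    becomes scale * cos(theta_target + margin); all others stay scale * cos(theta_j).
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Toy cosines (made up) for a three-class problem where class 0 is correct.
toy_cosines = [0.8, 0.1, -0.3]
loss_without_margin = aam_softmax_loss(toy_cosines, 0, margin=0.0, scale=10.0)
loss_with_margin = aam_softmax_loss(toy_cosines, 0, margin=0.2, scale=10.0)
```

For the same embedding, the margin makes the loss larger, so training has to pull embeddings closer in angle to their class weight than a plain softmax would require.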

loss function, recognition, representation, (9 more...)

2003.14021

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report > New Finding (0.89)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Lazar, Claire, Vijaykumar, Suhas

A Resolution in Algorithmic Fairness: Calibrated Scores for Fair Classifications

arXiv.org Machine Learning Feb-18-2020

Calibration and equal error rates are fundamental conditions for algorithmic fairness that have been shown to conflict with each other, suggesting that they cannot be satisfied simultaneously. This paper shows that the two are in fact compatible and presents a method for reconciling them. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied empirically to credit lending, our algorithm provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

calibrated score, equal error rate, error rate, (15 more...)

2002.07676

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Law (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Pevny, Tomas, Somol, Petr

Using Neural Network Formalism to Solve Multiple-Instance Problems

arXiv.org Machine Learning Mar-7-2017

Many objects in the real world are difficult to describe by a single numerical vector of a fixed length, whereas describing them by a set of vectors is more natural. Therefore, multiple-instance learning (MIL) techniques have been steadily gaining in importance in recent years. The MIL formalism represents each object (sample) by a set (bag) of feature vectors (instances) of fixed length, where knowledge about objects (e.g., class label) is available on the bag level but not necessarily on the instance level. Many standard tools, including supervised classifiers, have already been adapted to the MIL setting since the problem was formalized in the late nineties. In this work we propose a neural network (NN) based formalism that intuitively bridges the gap between the MIL problem definition and the vast existing knowledge base of standard models and classifiers. We show that the proposed NN formalism is effectively optimizable by a modified back-propagation algorithm and can reveal unknown patterns inside bags. Comparison to eight types of classifiers from the prior art on a set of 14 publicly available benchmark datasets confirms the advantages and accuracy of the proposed solution.
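The bag-of-instances representation described above can be made concrete with a small sketch. The abstract does not specify the architecture, so everything here is an assumption for illustration: a single ReLU layer applied to each instance, followed by mean or max pooling across the bag, which turns a bag of any size into one fixed-length vector that a standard classifier could consume. The names `embed_bag`, `weights`, and `pool` are hypothetical.

```python
def embed_bag(bag, weights, pool="mean"):
    """Map a bag (list of instance vectors) to one fixed-length vector.

    Each instance is passed through a linear layer with ReLU; the per-instance
    outputs are then pooled element-wise across the bag. The output length
    equals len(weights), regardless of how many instances the bag holds.
    """
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    hidden = [
        [max(0.0, sum(w * x for w, x in zip(row, instance))) for row in weights]
        for instance in bag
    ]
    columns = list(zip(*hidden))  # one tuple per hidden unit
    if pool == "mean":
        return [sum(col) / len(col) for col in columns]
    if pool == "max":
        return [max(col) for col in columns]
    raise ValueError("pool must be 'mean' or 'max'")

# Toy example: identity weights, one bag with two instances and one with three.
identity = [[1.0, 0.0], [0.0, 1.0]]
small_bag = [[1.0, 2.0], [3.0, -4.0]]
large_bag = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
mean_vec = embed_bag(small_bag, identity, pool="mean")
max_vec = embed_bag(small_bag, identity, pool="max")
```

Because the pooling is differentiable (or sub-differentiable for max), gradients from a bag-level label can flow back to the instance-level layer, which is what allows training from bag labels alone.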

artificial intelligence, formalism, machine learning, (14 more...)

1609.07257

Country:

Europe (0.47)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Hafemann, Luiz G., Sabourin, Robert, Oliveira, Luiz S.

Analyzing features learned for Offline Signature Verification using Deep CNNs

arXiv.org Machine Learning Aug-26-2016

Research on Offline Handwritten Signature Verification has explored a large variety of handcrafted feature extractors, ranging from graphology and texture descriptors to interest points. In spite of advancements in the last decades, performance of such systems is still far from optimal when we test the systems against skilled forgeries - signature forgeries that target a particular individual. In previous research, we proposed a formulation of the problem to learn features from data (signature images) in a Writer-Independent format, using Deep Convolutional Neural Networks (CNNs), seeking to improve performance on the task. In this research, we further improve the performance of this method, exploring a range of architectures, and obtaining a large improvement in state-of-the-art performance on the GPDS dataset, the largest publicly available dataset on the task. In the GPDS-160 dataset, we obtained an Equal Error Rate of 2.74%, compared to 6.97% in the best result published in the literature (that used a combination of multiple classifiers). We also present a visual analysis of the feature space learned by the model, and an analysis of the errors made by the classifier. Our analysis shows that the model is very effective in separating signatures that have a different global appearance, while being particularly vulnerable to forgeries that very closely resemble genuine signatures, even if their line quality is bad, which is the case of slowly-traced forgeries.

artificial intelligence, machine learning, signature, (17 more...)

doi: 10.1109/ICPR.2016.7900092

1607.04573

Country: North America > Canada (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)