AITopics

doi: 10.3233/FAIA251115

2510.1716

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.66)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.46)

arXiv.org Artificial Intelligence, Oct-26-2024

Angel or Devil: Discriminating Hard Samples and Anomaly Contaminations for Unsupervised Time Series Anomaly Detection

Zhang, Ruyi, Xu, Hongzuo, Jian, Songlei, Tan, Yusong, Zhou, Haifang, Xu, Rulin

Training in unsupervised time series anomaly detection is constantly plagued by the difficulty of discriminating between harmful 'anomaly contaminations' and beneficial 'hard normal samples'. These two types of samples exhibit analogous loss behavior that conventional loss-based methodologies struggle to differentiate. To tackle this problem, we propose a novel approach that supplements traditional loss behavior with 'parameter behavior', enabling a more granular characterization of anomalous patterns. Parameter behavior is formalized by measuring the parametric response to minute perturbations in input samples. Leveraging the complementary nature of parameter and loss behaviors, we further propose a dual Parameter-Loss Data Augmentation method (termed PLDA), implemented within the reinforcement learning paradigm. During the training phase of anomaly detection, PLDA dynamically augments the training data through an iterative process that simultaneously mitigates anomaly contaminations while amplifying informative hard normal samples. PLDA demonstrates remarkable versatility: it can serve as an additional component that integrates seamlessly with existing anomaly detectors to enhance their detection performance. Extensive experiments on ten datasets show that PLDA significantly improves the performance of four distinct detectors by up to 8%, outperforming three state-of-the-art data augmentation methods.

artificial intelligence, data mining, machine learning, (17 more...)

2410.21322

Country: Asia > China > Beijing > Beijing (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry: Water & Waste Management > Water Management (0.67)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Li, Jiajin, Zhu, Linglingzhi, So, Anthony Man-Cho

Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis

arXiv.org Artificial Intelligence, Jul-26-2023

Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that a certain max-structured problem possesses the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
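The two complexity statements in this abstract are the same bound read at different values of the KL exponent. As a short worked case split (using only the symbols $\theta$ and $\epsilon$ from the abstract; the `cases` environment assumes the amsmath package):

```latex
\[
\mathcal{O}\!\left(\epsilon^{-2\max\{2\theta,1\}}\right) =
\begin{cases}
\mathcal{O}\!\left(\epsilon^{-2}\right), & \theta \in [0,\tfrac{1}{2}]
  \quad (\max\{2\theta,1\} = 1),\\[4pt]
\mathcal{O}\!\left(\epsilon^{-4\theta}\right), & \theta \in (\tfrac{1}{2},1)
  \quad (\max\{2\theta,1\} = 2\theta).
\end{cases}
\]
```

For instance, $\theta = \tfrac{3}{4}$ gives $\mathcal{O}(\epsilon^{-3})$, and the bound approaches $\mathcal{O}(\epsilon^{-4})$ as $\theta \to 1$.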

artificial intelligence, machine learning, optimization problem, (16 more...)

2209.10825

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry: Materials (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Sholokhov, Alexey, Kuzmin, Nikita, Lee, Kong Aik, Chng, Eng Siong

Probabilistic Back-ends for Online Speaker Recognition and Clustering

arXiv.org Artificial Intelligence, Feb-19-2023

This paper focuses on multi-enrollment speaker recognition, which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that popular cosine scoring suffers from poor score calibration with a varying number of enrollment utterances. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves over cosine scoring for multi-enrollment recognition while keeping the same performance in the case of one-to-one comparisons. Finally, we consider an online speaker clustering task where each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm that allows us to benefit from properties of the PLDA model such as its ability to handle uncertainty and its better score calibration. Our experiments demonstrate the effectiveness of the proposed algorithm.
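For orientation, the cosine-scoring baseline with several enrollment utterances might look like the minimal sketch below. Averaging the length-normalized enrollment embeddings is an assumed, commonly used heuristic, not a detail taken from the abstract, and the function names are hypothetical.

```python
import numpy as np


def length_normalize(x: np.ndarray) -> np.ndarray:
    """Scale embeddings to unit Euclidean norm along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_score_multi_enroll(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between a test embedding and an enrollment centroid.

    `enroll` has shape (n_utterances, dim); `test` has shape (dim,).
    The centroid of the normalized enrollment embeddings is an assumption
    made for illustration only.
    """
    centroid = length_normalize(enroll).mean(axis=0)
    return float(length_normalize(centroid) @ length_normalize(test))


# Toy example: two enrollment embeddings and one test embedding.
enroll = np.array([[1.0, 0.0], [0.0, 1.0]])
test = np.array([1.0, 1.0])
score = cosine_score_multi_enroll(enroll, test)
```

The score is bounded in [-1, 1] regardless of how many enrollment utterances are averaged, so the number of enrollments does not enter the score explicitly; this is one way to read the calibration issue the abstract raises.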

artificial intelligence, machine learning, pattern recognition, (18 more...)

2302.09523

Country:

Asia > Singapore (0.04)
North America > United States > New York (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.63)

arXiv.org Artificial Intelligence, Nov-2-2022

I4U System Description for NIST SRE'20 CTS Challenge

Lee, Kong Aik, Kinnunen, Tomi, Colibro, Daniele, Vair, Claudio, Nautsch, Andreas, Sun, Hanwu, He, Liang, Liang, Tianyu, Wang, Qiongqiong, Rouvier, Mickael, Bousquet, Pierre-Michel, Das, Rohan Kumar, Bailo, Ignacio Viñals, Liu, Meng, Deldago, Héctor, Liu, Xuechen, Sahidullah, Md, Cumani, Sandro, Zhang, Boning, Okabe, Koji, Yamamoto, Hitoshi, Tao, Ruijie, Li, Haizhou, Giménez, Alfonso Ortega, Wang, Longbiao, Buera, Luis

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eight research teams - I$^2$R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (China). The submission was based on the fusion of top-performing sub-systems and sub-fusion systems contributed by individual teams. Effort was spent on using common development and validation sets, on the submission schedule and milestones, and on minimizing inconsistency in trial list and score file formats across sites.

artificial intelligence, deep learning, machine learning, (20 more...)

2211.01091

Country:

Europe > France (0.45)
Asia > China (0.45)
Europe > Spain (0.25)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Silnova, Anna, Brümmer, Niko, Swart, Albert, Burget, Lukáš

Toroidal Probabilistic Spherical Discriminant Analysis

arXiv.org Machine Learning, Oct-27-2022

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.

artificial intelligence, cosine, machine learning, (13 more...)

2210.15441

Country:

Europe > Czechia > South Moravian Region > Brno (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

arXiv.org Machine Learning, Mar-28-2022

Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings

Brümmer, Niko, Swart, Albert, Mošner, Ladislav, Silnova, Anna, Plchot, Oldřich, Stafylakis, Themos, Burget, Lukáš

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring and PLDA. Both have advantages and disadvantages, depending on the context. Cosine scoring follows naturally from the spherical geometry, but for PLDA the blessing is mixed: length normalization Gaussianizes the between-speaker distribution, but violates the assumption of a speaker-independent within-speaker distribution. We propose PSDA, an analogue to PLDA that uses von Mises-Fisher distributions on the hypersphere for both within- and between-class distributions. We show how the self-conjugacy of this distribution gives closed-form likelihood-ratio scores, making it a drop-in replacement for PLDA at scoring time. All kinds of trials can be scored, including single-enroll and multi-enroll verification, as well as more complex likelihood ratios that could be used in clustering and diarization. Learning is done via an EM algorithm with closed-form updates. We explain the model and present some first experiments.
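The self-conjugacy mentioned in this abstract can be illustrated with a short derivation. The notation is an assumption made for illustration: $\mathcal{V}(x \mid \mu, \kappa) = C_d(\kappa)\exp(\kappa\,\mu^\top x)$ denotes the von Mises-Fisher density on the unit hypersphere, $z$ a hidden speaker variable with prior $\mathcal{V}(z \mid \mu, b)$, and $x_1,\dots,x_n$ embeddings of that speaker drawn from $\mathcal{V}(x_i \mid z, w)$.

```latex
\begin{align*}
p(z \mid x_1,\dots,x_n)
  &\propto \mathcal{V}(z \mid \mu, b)\,\prod_{i=1}^{n} \mathcal{V}(x_i \mid z, w) \\
  &\propto \exp\!\big(b\,\mu^\top z\big)\,\prod_{i=1}^{n} \exp\!\big(w\,z^\top x_i\big)
   = \exp\!\Big(\big(b\,\mu + w \textstyle\sum_{i=1}^{n} x_i\big)^{\!\top} z\Big), \\
\text{so}\quad
p(z \mid x_1,\dots,x_n) &= \mathcal{V}(z \mid \tilde{\mu}, \tilde{\kappa}),
\qquad
\tilde{\kappa} = \Big\| b\,\mu + w \textstyle\sum_{i=1}^{n} x_i \Big\|,
\qquad
\tilde{\mu} = \frac{b\,\mu + w \sum_{i=1}^{n} x_i}{\tilde{\kappa}}.
\end{align*}
```

The normalizer $C_d(w)$ of each likelihood term depends only on the concentration $w$, not on $z$, so it drops out of the proportionality. The posterior is again von Mises-Fisher, which is what allows marginal likelihoods, and hence likelihood ratios, to be written in closed form.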

artificial intelligence, machine learning, plda, (14 more...)

2203.14893

Country:

Europe > Czechia > South Moravian Region > Brno (0.05)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Greece > Attica > Athens (0.04)
(3 more...)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Sholokhov, Alexey, Kinnunen, Tomi, Vestman, Ville, Lee, Kong Aik

Extrapolating false alarm rates in automatic speaker verification

arXiv.org Machine Learning, Aug-8-2020

Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers. We address false alarm rate extrapolation under a worst-case model whereby an adversary identifies the closest impostor for a given target speaker from a large population.

In this study we improve upon the generative model presented in [3]. Despite demonstrating expected overall trends, the predicted false alarm rates were substantially overestimated, particularly at high ASV thresholds (proxies of high-security applications). To tackle this shortcoming, we propose a discriminative ...

artificial intelligence, impostor, machine learning, (20 more...)

2008.0359

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States (0.04)
Europe > Finland > North Karelia > Joensuu (0.04)
(4 more...)

Genre: Research Report > New Finding (0.49)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.86)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)

Silnova, Anna, Brümmer, Niko, Rohdin, Johan, Stafylakis, Themos, Burget, Lukáš

Probabilistic embeddings for speaker diarization

arXiv.org Machine Learning, Apr-9-2020

Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization. We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix, thus providing a path for the propagation of information about the quality of the speech segment into a PLDA scoring backend. These precisions quantify the uncertainty about what the values of the embeddings might have been if they had been extracted from high-quality speech segments. The proposed probabilistic embeddings (x-vectors with precisions) are interfaced with the PLDA model by treating the x-vectors as hidden variables and marginalizing them out. We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization on the DIHARD'19 evaluation set. We compute the full PLDA likelihood 'by the book' for each clustering hypothesis that is considered by AHC. We do joint discriminative training of the PLDA parameters and of the probabilistic x-vector extractor. We demonstrate accuracy gains relative to a baseline AHC algorithm that is applied to traditional x-vectors (without uncertainty) and that uses averaging of binary log-likelihood-ratios rather than by-the-book scoring.
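The baseline described at the end of this abstract, AHC driven by averaged pairwise log-likelihood ratios, might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the stopping threshold, and the toy score matrix are all hypothetical, and the by-the-book PLDA scoring proposed in the paper is not shown.

```python
import numpy as np


def ahc_average_linkage(scores: np.ndarray, threshold: float) -> list:
    """Greedy agglomerative clustering on a symmetric pairwise score matrix.

    The similarity of two clusters is the mean of the pairwise scores
    between their members (average linkage). Merging stops once the best
    available similarity falls below `threshold` (a hypothetical parameter).
    """
    clusters = [[i] for i in range(scores.shape[0])]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        # Find the most similar pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = scores[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy pairwise log-likelihood-ratio matrix for four segments (made-up values).
llr = np.array([
    [0.0, 5.0, -4.0, -6.0],
    [5.0, 0.0, -3.0, -5.0],
    [-4.0, -3.0, 0.0, 4.0],
    [-6.0, -5.0, 4.0, 0.0],
])
result = ahc_average_linkage(llr, threshold=0.0)
```

With these toy scores, segments 0 and 1 merge first, then 2 and 3, and the remaining cross-cluster average is negative, so clustering stops with two speakers.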

diarization, extractor, speech segment, (16 more...)

2004.04096

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
Europe > Greece > Attica > Athens (0.04)
Europe > Austria > Styria > Graz (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Ferrer, Luciana, McLaren, Mitchell

A Speaker Verification Backend for Improved Calibration Performance across Varying Conditions

arXiv.org Machine Learning, Feb-5-2020

In a recent work, we presented a discriminative backend for speaker verification that achieved good out-of-the-box calibration performance on most tested conditions containing varying levels of mismatch to the training conditions. This backend mimics the standard PLDA-based backend process used in most current speaker verification systems, including the calibration stage. All parameters of the backend are jointly trained to optimize the binary cross-entropy for the speaker verification task. Calibration robustness is achieved by making the parameters of the calibration stage a function of vectors representing the conditions of the signal, which are extracted using a model trained to predict condition labels. In this work, we propose a simplified version of this backend where the vectors used to compute the calibration parameters are estimated within the backend, without the need for a condition prediction model. We show that this simplified method provides similar performance to the previously proposed method while being simpler to implement and having fewer requirements on the training data. Further, we provide an analysis of different aspects of the method, including the effect of initialization, the nature of the vectors used to compute the calibration parameters, and the effect that the random seed and the number of training epochs have on performance. We also compare the proposed method with the trial-based calibration (TBC) method that, to our knowledge, was the state of the art for achieving good calibration across varying conditions. We show that the proposed method outperforms TBC while also being several orders of magnitude faster to run, comparable to the standard PLDA baseline.

backend, calibration, vector, (16 more...)

2002.03802

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
South America > Argentina (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)