 van Leeuwen, David A.


The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning

arXiv.org Artificial Intelligence

Foundation models in speech are often trained using many GPUs, which implicitly leads to large effective batch sizes. In this paper we study the effect of batch size on pre-training, both in terms of statistics that can be monitored during training and in terms of its effect on the performance of a downstream fine-tuning task. Using batch sizes varying from 87.5 seconds to 80 minutes of speech, we show that, for a fixed number of iterations, larger batch sizes result in better pre-trained models. However, there is a lower limit for stability and an upper limit for effectiveness. We then show that the quality of the pre-trained model depends mainly on the amount of speech data seen during training, i.e., on the product of batch size and number of iterations. All results are produced with an independent implementation of the wav2vec 2.0 architecture, which to a large extent reproduces the results of the original work (arXiv:2006.11477). Our extensions can help researchers choose effective operating conditions when studying self-supervised learning in speech, and hint towards benchmarking self-supervision with a fixed amount of seen data. Code and model checkpoints are available at https://github.com/nikvaessen/w2v2-batch-size.
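
A convenient sanity check on the headline quantity, the amount of speech seen during pre-training, is that it is simply the product of batch size (in seconds of audio) and number of iterations. The short Python sketch below makes that bookkeeping concrete; the batch sizes follow the range quoted above, while the iteration count is an illustrative assumption, not a value taken from the paper.

SECONDS_PER_HOUR = 3600

def speech_seen_hours(batch_size_seconds: float, iterations: int) -> float:
    """Total speech seen during pre-training, in hours."""
    return batch_size_seconds * iterations / SECONDS_PER_HOUR

# With a fixed iteration budget, larger batches see proportionally more data.
for batch_size_seconds in (87.5, 300.0, 1200.0, 4800.0):  # 87.5 s .. 80 min
    hours = speech_seen_hours(batch_size_seconds, iterations=100_000)
    print(f"{batch_size_seconds:7.1f} s/batch -> {hours:9.1f} h seen")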


Training speaker recognition systems with limited data

arXiv.org Artificial Intelligence

This work considers training neural networks for speaker recognition with a much smaller dataset than is used in contemporary work. We artificially restrict the amount of data by proposing three subsets of the popular VoxCeleb2 dataset. These subsets are restricted to 50k audio files (versus over 1M files available), and vary along the axes of number of speakers and session variability. We train three speaker recognition systems on these subsets: the X-vector, ECAPA-TDNN, and wav2vec2 network architectures. We show that the self-supervised, pre-trained weights of wav2vec2 substantially improve performance when training data is limited. Code and data subsets are available at https://github.com/nikvaessen/w2v2-speaker-few-samples.
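
As a rough illustration of what such a restricted subset could look like, the sketch below draws a fixed budget of files from a VoxCeleb2-style manifest while controlling the number of speakers. The manifest layout and the helper itself are assumptions made for illustration; the actual subset definitions live in the linked repository.

import random
from collections import defaultdict

def sample_subset(manifest, num_speakers, files_total=50_000, seed=0):
    """manifest: iterable of (file_id, speaker_id) pairs."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for file_id, speaker_id in manifest:
        by_speaker[speaker_id].append(file_id)
    chosen = rng.sample(sorted(by_speaker), num_speakers)
    per_speaker = files_total // num_speakers  # equal file budget per speaker
    subset = []
    for speaker_id in chosen:
        files = by_speaker[speaker_id]
        subset.extend(rng.sample(files, min(per_speaker, len(files))))
    return subset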


Speaker and Language Change Detection using Wav2vec2 and Whisper

arXiv.org Artificial Intelligence

We investigate recent transformer networks pre-trained for automatic speech recognition for their ability to detect speaker and language changes in speech. We do this by simply adding speaker (change) or language targets to the labels. For Wav2vec2 pre-trained networks, we also investigate if the representation for the speaker change symbol can be conditioned to …
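
The labelling idea is simple enough to sketch: insert a dedicated symbol into the target sequence at every speaker turn. The "<sc>" token and the segment format below are illustrative assumptions, not the paper's exact label scheme.

SPEAKER_CHANGE = "<sc>"

def add_change_targets(segments):
    """segments: list of (speaker_id, transcript) pairs in temporal order."""
    tokens, previous_speaker = [], None
    for speaker_id, transcript in segments:
        if previous_speaker is not None and speaker_id != previous_speaker:
            tokens.append(SPEAKER_CHANGE)  # mark the speaker turn
        tokens.extend(transcript.split())
        previous_speaker = speaker_id
    return tokens

print(add_change_targets([("A", "hello there"), ("B", "hi")]))
# -> ['hello', 'there', '<sc>', 'hi']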


Calibration of Phone Likelihoods in Automatic Speech Recognition

arXiv.org Machine Learning

In this paper we study the probabilistic properties of the posteriors in a speech recognition system that uses a deep neural network (DNN) for acoustic modeling. We do this by reducing Kaldi's DNN shared pdf-id posteriors to phone likelihoods, and using test-set forced alignments to evaluate these with a calibration-sensitive metric. Individual frame posteriors are in principle well-calibrated, because the DNN is trained using cross entropy as the objective function, which is a proper scoring rule. When entire phones are assessed, we observe that it is best to average the log likelihoods over the duration of the phone. Further scaling of the average log likelihoods by the logarithm of the duration slightly improves the calibration, and this improvement is retained when tested on independent test data.
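
The phone-level scoring described above reduces to a short computation: average the per-frame log likelihoods over the phone's duration and, optionally, scale by the logarithm of that duration. The sketch below shows only the shape of that computation; frame rates, alignments, and any calibration offsets are outside its scope.

import math

def phone_score(frame_log_likelihoods, scale_by_log_duration=True):
    """Score one phone from its per-frame log likelihoods."""
    duration = len(frame_log_likelihoods)            # duration in frames
    mean_ll = sum(frame_log_likelihoods) / duration  # average over the phone
    if scale_by_log_duration and duration > 1:
        return mean_ll * math.log(duration)          # log-duration scaling
    return mean_ll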


The "Sprekend Nederland" project and its application to accent location

arXiv.org Machine Learning

This paper describes the data collection effort that is part of the project Sprekend Nederland (The Netherlands Talking), and discusses its potential use in Automatic Accent Location. We define Automatic Accent Location as the task of describing the accent of a speaker in terms of the speaker's location and history. We discuss possible ways of describing accent location, the consequences these have for the task of automatic accent location, and potential evaluation metrics.


Constrained speaker linking

arXiv.org Machine Learning

In this paper we study speaker linking (a.k.a. partitioning) given constraints on the distribution of speaker identities over speech recordings. Specifically, we show that the intractable partitioning problem becomes tractable when the constraints pre-partition the data into smaller cliques with non-overlapping speakers. The surprisingly common case where the speakers in telephone conversations are known, but the assignment of channels to identities is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN database, where this channel assignment task is at hand, a lightweight speaker recognition system can quite effectively solve the channel assignment problem, with 93% of the cliques solved. We further show that the posterior distribution over channel assignment configurations is well calibrated.
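
Within a clique the identities are known and only the channel-to-identity mapping is free, so the posterior over assignment configurations can be computed by enumeration. The sketch below assumes per-channel, per-identity log-likelihood scores from a speaker recognition system and a uniform prior over configurations; names and data layout are illustrative, not the paper's implementation.

import itertools
import math

def assignment_posterior(log_lik):
    """log_lik: dict channel -> dict identity -> log likelihood.
    Assumes as many identities as channels and a uniform prior."""
    channels = sorted(log_lik)
    identities = sorted(next(iter(log_lik.values())))
    log_post = {}
    for perm in itertools.permutations(identities):
        # joint log likelihood of this channel-to-identity assignment
        log_post[perm] = sum(log_lik[c][i] for c, i in zip(channels, perm))
    # normalise with log-sum-exp over all assignment configurations
    m = max(log_post.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {perm: math.exp(v - log_z) for perm, v in log_post.items()}

# Example with two telephone channels and two known speakers:
print(assignment_posterior({"ch1": {"alice": 2.0, "bob": -1.0},
                            "ch2": {"alice": -0.5, "bob": 1.5}}))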