Alumäe, Tanel
Optimizing Estonian TV Subtitles with Semi-supervised Learning and LLMs
Fedorchenko, Artem, Alumäe, Tanel
For instance, recent studies (Mykhalevych and Preply, 2024; Kim et al., 2023) have revealed that 50% of Americans and 85% of Netflix users overall frequently watch TV and streaming video content with subtitles. Studies show that subtitles can enhance understanding and memory retention, and many viewers choose to enjoy their content quietly.

Both iterative pseudo-labeling and LLM-based post-editing have been active areas of research in the context of verbatim automatic speech recognition (ASR). Pseudo-labeling based semi-supervised learning in ASR has been studied since at least Zavaliagkos et al. (1998) and has later been investigated in several works, e.g. by Veselý et al. (2013) and Xu et al. (2020).
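One round of the kind of pseudo-labeling cited above can be sketched briefly: transcribe unlabeled audio with a seed ASR model, filter the hypotheses, and add the survivors to the training pool. The Whisper checkpoint, the length-based filter, and the fine-tuning step mentioned in the comments are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of one pseudo-labeling round for ASR (illustrative only).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def pseudo_label(unlabeled_wavs, min_chars=5):
    """Transcribe unlabeled audio and keep only plausible hypotheses."""
    pseudo = []
    for wav in unlabeled_wavs:
        hyp = asr(wav)["text"].strip()
        # Simple length heuristic; real systems filter on confidence or LM scores.
        if len(hyp) >= min_chars:
            pseudo.append({"audio": wav, "text": hyp})
    return pseudo

# Iterative variant: fine-tune the ASR model on manual + pseudo-labeled data,
# reload it into the pipeline, re-label the unlabeled pool, and repeat.
```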
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation
Sildam, Tiia, Velve, Andra, Alumäe, Tanel
This paper investigates the fine-tuning of end-to-end models for bidirectional Estonian-English and Estonian-Russian conversational speech-to-text translation. Due to the limited availability of speech translation data for Estonian, we created additional training data by web scraping and by synthesizing data from speech recognition datasets using machine translation. We evaluated three publicly available end-to-end models: Whisper, OWSM 3.1, and SeamlessM4T. Our results indicate that fine-tuning with synthetic data improves translation accuracy by a large margin, with SeamlessM4T matching or surpassing cascaded speech translation systems that use state-of-the-art speech recognition and machine translation models.
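The synthetic-data step described in the abstract can be illustrated with a minimal sketch: machine-translate the transcripts of an ASR corpus and pair each translation with the original audio to obtain speech-translation training examples. The MT checkpoint (Helsinki-NLP/opus-mt-et-en) and the dataset field names below are assumptions for illustration, not necessarily what the paper used.

```python
# Hedged sketch: turning an ASR corpus (audio + Estonian transcript) into
# synthetic Estonian-English speech-translation data via machine translation.
from transformers import pipeline

mt_et_en = pipeline("translation", model="Helsinki-NLP/opus-mt-et-en")

def synthesize_st_examples(asr_examples):
    """asr_examples: iterable of dicts with 'audio' and Estonian 'text' fields."""
    st_examples = []
    for ex in asr_examples:
        translation = mt_et_en(ex["text"])[0]["translation_text"]
        st_examples.append({
            "audio": ex["audio"],          # speech is reused unchanged
            "source_text": ex["text"],     # Estonian transcript
            "target_text": translation,    # synthetic English reference
        })
    return st_examples
```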
Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge
Alumäe, Tanel, Kong, Jiaming, Robnikov, Daniil
This paper describes the Tallinn University of Technology (TalTech) systems developed for the ASRU MADASR 2023 Challenge. The challenge focuses on automatic speech recognition of dialect-rich Indian languages with limited training audio and text data. TalTech participated in two tracks of the challenge: Track 1, which allowed using only the provided training data, and Track 3, which allowed using additional audio data. In both tracks, we relied on wav2vec2.0 models. Our methodology diverges from the traditional procedure of fine-tuning pretrained wav2vec2.0 models in two key respects: first, we apply aligned data augmentation to enhance the linguistic diversity of the training data, and second, we use deep prefix tuning for dialect adaptation of wav2vec2.0 models. In both tracks, our approach yielded significant improvements over the provided baselines, achieving the lowest word error rates among all participating teams.
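A heavily hedged sketch of one possible form of aligned data augmentation follows: given word-level alignments (word mapped to an audio snippet), snippets are spliced together to cover new sentences drawn from a text corpus, increasing linguistic diversity. The helper functions and data layout are hypothetical illustrations of the general idea; the paper's actual procedure may differ.

```python
# Illustrative aligned data augmentation: splice word-aligned audio snippets
# to synthesize audio for new sentences from a text corpus.
import random
import numpy as np

def build_word_bank(aligned_utterances):
    """aligned_utterances: list of (waveform, [(word, start_sample, end_sample), ...])."""
    bank = {}
    for wav, alignment in aligned_utterances:
        for word, start, end in alignment:
            bank.setdefault(word, []).append(wav[start:end])
    return bank

def synthesize_utterance(sentence, bank):
    """Concatenate randomly chosen aligned snippets for each word, if all are available."""
    pieces = []
    for word in sentence.split():
        if word not in bank:
            return None  # skip sentences the word bank cannot cover
        pieces.append(random.choice(bank[word]))
    return np.concatenate(pieces), sentence
```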
Robust Training of Vector Quantized Bottleneck Models
Łańcucki, Adrian, Chorowski, Jan, Sanchez, Guillaume, Marxer, Ricard, Chen, Nanxin, Dolfing, Hans J. G. A., Khurana, Sameer, Alumäe, Tanel, Laurent, Antoine
In this paper we demonstrate methods for reliable and efficient training of discrete representations using Vector-Quantized Variational Auto-Encoder (VQ-VAE) models. Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they have become viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line $k$-means clustering. We show that codebook learning can suffer from poor initialization and non-stationarity of the clustered encoder outputs. We demonstrate that these issues can be successfully overcome by increasing the learning rate for the codebook and by periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks and significantly increase the usage of latent codewords even for large codebooks. This has practical benefits, for instance in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
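The two remedies named in the abstract lend themselves to a short sketch: give the codebook its own, larger learning rate and periodically re-initialize rarely used codewords from recent encoder outputs. The PyTorch module below is an illustrative sketch under those assumptions, not the authors' implementation.

```python
# Illustrative VQ bottleneck with usage tracking and data-dependent
# re-initialization of dead codewords (not the paper's exact code).
import torch
import torch.nn as nn

class VQBottleneck(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))
        self.register_buffer("usage", torch.zeros(num_codes))

    def forward(self, z_e):                      # z_e: (batch, dim)
        dists = torch.cdist(z_e, self.codebook)  # distance to every codeword
        idx = dists.argmin(dim=1)
        self.usage += torch.bincount(idx, minlength=self.codebook.size(0))
        z_q = self.codebook[idx]
        # Straight-through estimator: gradients bypass the discrete choice.
        return z_e + (z_q - z_e).detach(), idx

    @torch.no_grad()
    def reinit_dead_codes(self, recent_encoder_outputs, min_usage=1.0):
        """Data-dependent reset: move unused codewords onto recent encoder outputs."""
        dead = (self.usage < min_usage).nonzero(as_tuple=True)[0]
        if len(dead) > 0:
            picks = torch.randint(0, recent_encoder_outputs.size(0), (len(dead),))
            self.codebook.data[dead] = recent_encoder_outputs[picks]
        self.usage.zero_()

# A separate, larger learning rate for the codebook could be set up as, e.g.:
# optim = torch.optim.Adam([
#     {"params": encoder.parameters(), "lr": 1e-4},
#     {"params": vq.codebook, "lr": 1e-3},   # higher rate for the codebook
# ])
```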