AITopics | Shchekotov, Ivan

Collaborating Authors

Shchekotov, Ivan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FINALLY: fast and universal speech enhancement with studio-like quality

Babaev, Nicholas, Tamogashev, Kirill, Saginbaev, Azat, Shchekotov, Ivan, Bae, Hanbin, Sung, Hosang, Lee, WonJun, Cho, Hoon-Young, Andreev, Pavel

arXiv.org Artificial IntelligenceOct-31-2024

In this paper, we address the challenge of speech enhancement in real-world recordings, which often contain various forms of distortion, such as background noise, reverberation, and microphone artefacts. We revisit the use of Generative Adversarial Networks (GANs) for speech enhancement and theoretically show that GANs are naturally inclined to seek the point of maximum density within the conditional clean speech distribution, which, as we argue, is essential for the speech enhancement task. We study various feature extractors for perceptual loss to facilitate the stability of adversarial training, developing a methodology for probing the structure of the feature space. This leads us to integrate WavLM-based perceptual loss into MS-STFT adversarial training pipeline, creating an effective and stable training procedure for the speech enhancement model. The resulting speech enhancement model, which we refer to as FINALLY, builds upon the HiFi++ architecture, augmented with a WavLM encoder and a novel training pipeline. Empirical results on various datasets confirm our model's ability to produce clear, high-quality speech at 48 kHz, achieving state-of-the-art performance in the field of speech enhancement.

artificial intelligence, machine learning, speech enhancement, (16 more...)

arXiv.org Artificial Intelligence

2410.0592

Country:

Asia (0.14)
North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation

Rakitin, Denis, Shchekotov, Ivan, Vetrov, Dmitry

arXiv.org Artificial IntelligenceJun-20-2024

Diffusion distillation methods aim to compress the diffusion models into efficient one-step generators while trying to preserve quality. Among them, Distribution Matching Distillation (DMD) offers a suitable framework for training general-form one-step generators, applicable beyond unconditional generation. In this work, we introduce its modification, called Regularized Distribution Matching Distillation, applicable to unpaired image-to-image (I2I) problems. We demonstrate its empirical performance in application to several translation tasks, including 2D examples and I2I between different image datasets, where it performs on par or better than multi-step diffusion baselines.

artificial intelligence, machine learning, regularized distribution matching distillation, (11 more...)

arXiv.org Artificial Intelligence

2406.14762

Country:

Europe > Russia (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

Andreev, Pavel, Babaev, Nicholas, Saginbaev, Azat, Shchekotov, Ivan, Alanov, Aibek

arXiv.org Artificial IntelligenceDec-5-2023

Streaming models are an essential component of real-time speech enhancement tools. The streaming regime constrains speech enhancement models to use only a tiny context of future information. As a result, the low-latency streaming setup is generally considered a challenging task and has a significant negative impact on the model's quality. However, the sequential nature of streaming generation offers a natural possibility for autoregression, that is, utilizing previous predictions while making current ones. The conventional method for training autoregressive models is teacher forcing, but its primary drawback lies in the training-inference mismatch that can lead to a substantial degradation in quality. In this study, we propose a straightforward yet effective alternative technique for training autoregressive low-latency speech enhancement models. We demonstrate that the proposed approach leads to stable improvement across diverse architectures and training scenarios.

artificial intelligence, autoregression, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.01751

Country: Asia (0.14)

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

Iashchenko, Anastasiia, Andreev, Pavel, Shchekotov, Ivan, Babaev, Nicholas, Vetrov, Dmitry

arXiv.org Artificial IntelligenceOct-12-2023

This paper introduces UnDiff, a diffusion probabilistic model capable of solving various speech inverse tasks. Being once trained for speech waveform generation in an unconditional manner, it can be adapted to different tasks including degradation inversion, neural vocoding, and source separation. In this paper, we, first, tackle the challenging problem of unconditional waveform generation by comparing different neural architectures and preconditioning domains. After that, we demonstrate how the trained unconditional diffusion could be adapted to different tasks of speech processing by the means of recent developments in post-training conditioning of diffusion models. Finally, we demonstrate the performance of the proposed technique on the tasks of bandwidth extension, declipping, vocoding, and speech source separation and compare it to the baselines. The codes are publicly available.

artificial intelligence, diffusion model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2306.00721

Country: Europe > Russia (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback