AITopics

2403.0553

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > New York (0.27)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.92)
Personal (0.92)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-2-2023

Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Dohi, Kota, Imoto, Keisuke, Harada, Noboru, Niizumi, Daisuke, Koizumi, Yuma, Nishida, Tomoya, Purohit, Harsh, Tanabe, Ryo, Endo, Takashi, Kawaguchi, Yohei

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines without the need for hyperparameter tuning. In past ASD tasks, developed methods tuned hyperparameters for each machine type, because the development and evaluation datasets contained the same machine types. However, collecting normal and anomalous data for a development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving the first-shot problem, which is the challenge of training a model on a completely novel machine type. Specifically, (i) each machine type has only one section (a subset of a machine type) and (ii) the machine types in the development and evaluation datasets are completely different. Analysis of 86 submissions from 23 teams revealed that the keys to outperforming the baselines were: 1) sampling techniques for dealing with class imbalances across different domains and attributes, 2) generation of synthetic samples for robust detection, and 3) use of multiple large pre-trained models to extract meaningful embeddings for the anomaly detector.
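The first key finding, sampling to counter class imbalance across domains and attributes, can be illustrated with inverse-frequency sampling. The sketch below is a generic illustration and not taken from any submission. The group labels, their counts and the helper `balanced_sample_weights` are hypothetical. Every (domain, attribute) group receives the same total sampling probability, so rare target-domain clips are drawn as often as abundant source-domain ones.

```python
import random
from collections import Counter

def balanced_sample_weights(groups):
    """Return one weight per clip so every (domain, attribute) group is drawn equally often."""
    counts = Counter(groups)
    n_groups = len(counts)
    # Each group gets total probability 1/n_groups, split evenly across its clips.
    return [1.0 / (n_groups * counts[g]) for g in groups]

# Hypothetical training clips: many source-domain clips, very few target-domain clips.
groups = (
    [("source", "speed=1")] * 90
    + [("source", "speed=2")] * 8
    + [("target", "speed=3")] * 2
)
weights = balanced_sample_weights(groups)

rng = random.Random(0)
batch = rng.choices(range(len(groups)), weights=weights, k=3000)
drawn = Counter(groups[i] for i in batch)
print(drawn)  # each of the three groups is drawn roughly 1000 times
```

With uniform sampling, the two target-domain clips would appear in only about 2% of draws. With these weights they appear in about a third of them.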

artificial intelligence, machine learning, machine type

2305.07828

Country: Europe > Finland (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial Intelligence, Aug-14-2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Koizumi, Yuma, Zen, Heiga, Karita, Shigeki, Ding, Yifan, Yatabe, Kohei, Morioka, Nobuyuki, Zhang, Yu, Han, Wei, Bapna, Ankur, Bacchiani, Michiel

Speech restoration (SR) is the task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher and apply it to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio quality. To make our SR model robust against various degradations, we use (i) a speech representation extracted from w2v-BERT as the input feature, and (ii) a text representation extracted from transcripts via PnG-BERT as a linguistic conditioning feature. Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web. Audio samples are available at our demo page: google.github.io/df-conformer/miipher/

artificial intelligence, dataset, machine learning

2303.01664

Country: Asia > Japan > Honshū (0.14)

Genre: Research Report > New Finding (0.35)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

arXiv.org Artificial Intelligence, Oct-3-2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Koizumi, Yuma, Yatabe, Kohei, Zen, Heiga, Bacchiani, Michiel

Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. DDPMs and GANs are characterized by an iterative denoising framework and adversarial training, respectively. This study proposes a fast, high-quality neural vocoder called WaveFit, which integrates the essence of GANs into a DDPM-like iterative framework based on fixed-point iteration. WaveFit iteratively denoises an input signal and trains a deep neural network (DNN) to minimize an adversarial loss calculated from the intermediate outputs of all iterations. Subjective (side-by-side) listening tests showed no statistically significant difference in naturalness between natural human speech and speech synthesized by WaveFit with five iterations. Furthermore, WaveFit's inference was more than 240 times faster than WaveRNN's. Audio demos are available at google.github.io/df-conformer/wavefit/.
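The fixed-point idea can be shown with a toy example. Repeatedly applying the same contractive denoising map drives any starting signal toward that map's fixed point. This is an illustration under strong assumptions and not WaveFit itself. `toy_denoiser` is a hypothetical oracle contraction that stands in for the trained DNN, and a plain MSE stands in for the adversarial loss. The one part that mirrors the abstract is that the loss is evaluated on the intermediate output of every iteration.

```python
import numpy as np

def toy_denoiser(y, clean, alpha=0.5):
    """Hypothetical stand-in for the trained DNN: moves y a fraction alpha toward `clean`.
    Since |1 - alpha| < 1, this map is a contraction whose unique fixed point is `clean`."""
    return y - alpha * (y - clean)

def iterative_refinement(y0, clean, n_iters=5):
    """Apply the same denoiser repeatedly and keep every intermediate output."""
    outputs = []
    y = y0
    for _ in range(n_iters):
        y = toy_denoiser(y, clean)
        outputs.append(y)
    return outputs

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000)
clean = np.sin(2 * np.pi * 220 * t)
y0 = rng.standard_normal(clean.shape)  # start from random noise

outputs = iterative_refinement(y0, clean)  # five iterations, as in the listening test
# The loss is computed on *all* intermediate outputs (MSE here, adversarial in WaveFit).
per_iter = [float(np.mean((o - clean) ** 2)) for o in outputs]
total_loss = sum(per_iter)
print(per_iter)  # each value is (1 - alpha)**2 = 0.25 times the previous one
```

Summing over all iterations means that early outputs are also pushed toward the target. A model trained this way can therefore stop after a few iterations and still produce a usable signal.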

artificial intelligence, iteration, machine learning

2210.01029

Country: Asia > Japan (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine Learning, Jun-8-2021

Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

Kawaguchi, Yohei, Imoto, Keisuke, Koizumi, Yuma, Harada, Noboru, Niizumi, Daisuke, Dohi, Kota, Tanabe, Ryo, Purohit, Harsh, Endo, Takashi

We present the task description and a discussion of the results of the DCASE 2021 Challenge Task 2. Last year, we organized an unsupervised anomalous sound detection (ASD) task: identifying whether a given sound is normal or anomalous without anomalous training data. This year, we organize an advanced unsupervised ASD task under domain-shift conditions, which focuses on a problem that is inevitable in the practical use of ASD systems. The main challenge of this task is to detect unknown anomalous sounds when the acoustic characteristics of the training and testing samples differ, i.e., are domain-shifted. This problem frequently occurs due to changes in seasons, manufactured products, and/or environmental noise. After the challenge submission deadline, we will add the challenge results and an analysis of the submissions.
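A minimal sketch of unsupervised ASD follows, assuming fixed-length feature vectors per clip. The detector is fit on normal clips only and scores a test clip by its Mahalanobis distance from them. This is a standard technique chosen for illustration, not the challenge baseline. All data below is synthetic and hypothetical. The last lines illustrate the domain-shift problem: normal clips recorded under changed conditions also drift away from the training distribution and trigger false alarms.

```python
import numpy as np

def fit_normal_model(train_feats):
    """Fit a Gaussian to features of normal sounds only (no anomalous data is used)."""
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge keeps the covariance invertible
    return mean, np.linalg.inv(cov)

def anomaly_score(feats, mean, cov_inv):
    """Mahalanobis distance to the normal model: larger means more anomalous."""
    diff = feats - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 8))    # hypothetical normal clips
test_normal = rng.normal(0.0, 1.0, size=(50, 8))
test_anomalous = rng.normal(3.0, 1.0, size=(50, 8))   # anomalies never seen in training
test_shifted = rng.normal(1.5, 1.0, size=(50, 8))     # normal, but domain-shifted

mean, cov_inv = fit_normal_model(normal_train)
# Threshold chosen so that about 5% of the training clips would be flagged.
threshold = np.percentile(anomaly_score(normal_train, mean, cov_inv), 95)

flags_normal = anomaly_score(test_normal, mean, cov_inv) > threshold
flags_anom = anomaly_score(test_anomalous, mean, cov_inv) > threshold
flags_shifted = anomaly_score(test_shifted, mean, cov_inv) > threshold
print(flags_normal.mean(), flags_anom.mean(), flags_shifted.mean())
```

In this toy setting most of the shifted normal clips are flagged. A detector that only models the training domain cannot tell a change in conditions from a real fault.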

artificial intelligence, detection and classification, neural network

2106.04492

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine Learning, Aug-8-2020

A Transformer-based Audio Captioning Model with Keyword Estimation

Koizumi, Yuma, Masumura, Ryo, Nishida, Kyosuke, Yasuda, Masahiro, Saito, Shoichiro

One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Because one acoustic event/scene can be described with several words, this results in a combinatorial explosion of possible captions and makes training difficult. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation called TRACKE. It addresses the word-selection indeterminacy problem by solving the main AAC task while simultaneously executing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). TRACKE estimates keywords, which comprise a word set corresponding to the audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.
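Keyword estimation as described here is a multi-label problem, because one clip can contain several events. The sketch below assumes the audio encoder outputs one logit per keyword. It keeps every keyword whose sigmoid probability exceeds a threshold and trains the estimate with binary cross-entropy. The keyword vocabulary, the logits and the 0.5 threshold are all hypothetical, and the decoder's use of the keywords is not shown.

```python
import numpy as np

KEYWORDS = ["dog", "bark", "car", "engine", "rain", "bird"]  # hypothetical keyword vocabulary

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def estimate_keywords(logits, threshold=0.5):
    """Multi-label estimation: keep every keyword whose probability exceeds the threshold."""
    probs = sigmoid(logits)
    return [kw for kw, p in zip(KEYWORDS, probs) if p > threshold]

def keyword_bce(logits, targets):
    """Binary cross-entropy used to train the keyword-estimation sub-task."""
    probs = np.clip(sigmoid(logits), 1e-7, 1 - 1e-7)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

logits = np.array([2.3, 1.1, -3.0, -2.2, -0.4, -1.5])  # hypothetical encoder output for one clip
targets = np.array([1, 1, 0, 0, 0, 0], dtype=float)     # ground-truth keywords: dog, bark
print(estimate_keywords(logits))  # ['dog', 'bark']
print(keyword_bce(logits, targets))
```

Independent sigmoids are used instead of a softmax because keywords are not mutually exclusive. A dog barking next to a car can activate all three.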

deep learning, keyword, neural network

2007.00222

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Aug-8-2020

Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Koizumi, Yuma, Kawaguchi, Yohei, Imoto, Keisuke, Nakamura, Toshiki, Nikaido, Yuki, Tanabe, Ryo, Purohit, Harsh, Suefusa, Kaori, Endo, Takashi, Yasuda, Masahiro, Harada, Noboru

In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. We have designed this challenge as the first benchmark of ASD research, which includes a large-scale dataset, evaluation metrics, and a simple baseline system. We received 117 submissions from 40 teams, and several novel approaches have been developed as a result of this challenge. On the basis of the analysis of the evaluation results, we discuss two new approaches and their problems.
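The evaluation metrics of an ASD benchmark can be illustrated with a short sketch. We assume the metrics are the area under the ROC curve (AUC) and a partial AUC (pAUC) restricted to a low false-positive-rate range. The value p = 0.1 is an assumed default. AUC is computed as the fraction of (anomalous, normal) pairs in which the anomalous clip scores higher. Ties count as one half, which is a convention chosen here. pAUC repeats the computation using only the top-p fraction of normal clips ranked by anomaly score.

```python
import numpy as np

def auc(normal_scores, anomalous_scores):
    """Fraction of (anomalous, normal) pairs where the anomalous clip scores higher (ties = 1/2)."""
    n = np.asarray(normal_scores, dtype=float)[:, None]
    a = np.asarray(anomalous_scores, dtype=float)[None, :]
    return float(np.mean((a > n) + 0.5 * (a == n)))

def partial_auc(normal_scores, anomalous_scores, p=0.1):
    """AUC over false-positive rates in [0, p]: compare anomalies only against
    the top floor(p * N) highest-scoring normal clips (at least one)."""
    normal_sorted = np.sort(np.asarray(normal_scores, dtype=float))[::-1]
    k = max(1, int(np.floor(p * len(normal_sorted))))
    return auc(normal_sorted[:k], anomalous_scores)

normal = [0.1, 0.2, 0.3, 0.4, 0.9]   # hypothetical anomaly scores of normal clips
anomalous = [0.5, 0.8, 0.95]         # hypothetical anomaly scores of anomalous clips
print(auc(normal, anomalous))               # 13/15
print(partial_auc(normal, anomalous, p=0.2))  # 1/3
```

pAUC is strict about the hardest normal clips. One normal clip with a high score (0.9 here) barely affects AUC but drags pAUC down sharply. This reflects the practical cost of false alarms.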

artificial intelligence, detection, neural network

2006.05822

Country:

Asia > Japan (0.16)
North America (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.74)

arXiv.org Machine Learning, Jul-1-2020

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

Koizumi, Yuma, Takeuchi, Daiki, Ohishi, Yasunori, Harada, Noboru, Kashino, Kunio

This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning. Our submission focuses on solving two indeterminacy problems in automated audio captioning: word-selection indeterminacy and sentence-length indeterminacy. We simultaneously solve the main caption-generation task and the indeterminacy sub-problems by estimating keywords and sentence length through multi-task learning. We tested a simplified model of our submission on the development-testing dataset. Our model achieved a SPIDEr score of 20.7, whereas the baseline system scored 5.4.
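Multi-task learning of this kind is often implemented as a weighted sum of per-task losses. The sketch below assumes that form. The main caption loss is token-level cross-entropy, keyword estimation uses binary cross-entropy, and sentence length is treated as a classification over length bins. The length-bin formulation, the loss weights `w_kw` and `w_len`, and all shapes are hypothetical, and the report's actual formulation may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of integer targets under softmax(logits)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(target_ids)), target_ids]
    return float(-np.mean(np.log(picked)))

def bce(logits, targets):
    probs = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-7, 1 - 1e-7)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

def multitask_loss(caption_logits, caption_ids, kw_logits, kw_targets,
                   len_logits, true_len, w_kw=1.0, w_len=1.0):
    """Main captioning loss plus weighted keyword and sentence-length sub-task losses.
    w_kw and w_len are hypothetical hyperparameters."""
    l_cap = cross_entropy(caption_logits, caption_ids)
    l_kw = bce(kw_logits, kw_targets)
    l_len = cross_entropy(len_logits[None, :], np.array([true_len]))
    return l_cap + w_kw * l_kw + w_len * l_len

rng = np.random.default_rng(0)
caption_logits = rng.normal(size=(7, 50))       # 7 caption tokens, vocabulary of 50 (hypothetical)
caption_ids = rng.integers(0, 50, size=7)
kw_logits = rng.normal(size=10)                 # 10 candidate keywords (hypothetical)
kw_targets = (rng.random(10) < 0.3).astype(float)
len_logits = rng.normal(size=20)                # length bins 0..19 (hypothetical)

loss = multitask_loss(caption_logits, caption_ids, kw_logits, kw_targets, len_logits, true_len=7)
print(loss)
```

Setting both weights to zero recovers plain caption training. Increasing them makes the shared encoder learn features that also predict which words and how many words the caption should contain.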

deep learning, keyword, neural network

2007.00225

Genre: Research Report (0.40)

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

arXiv.org Machine Learning, Oct-10-2019

First Order Ambisonics Domain Spatial Augmentation for DNN-based Direction of Arrival Estimation

Mazzon, Luca, Koizumi, Yuma, Yasuda, Masahiro, Harada, Noboru

In this paper, we propose a novel data augmentation method for training neural networks for Direction of Arrival (DOA) estimation. This method focuses on expanding the representation of the DOA subspace of a dataset. Given some input data, it applies a transformation that changes the data's DOA information to simulate new, potentially unseen DOAs. In general, such a transformation is a combination of a rotation and a reflection. It can be applied because of a well-known property of First Order Ambisonics (FOA). The same transformation is also applied to the labels to maintain consistency between the input data and the target labels. Three methods with different levels of generality are proposed for applying this augmentation principle. Experiments are conducted on two different DOA networks. The results of both experiments demonstrate the effectiveness of the novel augmentation strategy, which reduces the DOA error by around 40%.
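The FOA property behind this augmentation is that the three first-order channels transform like the Cartesian components of a direction vector. A rotation or reflection of the sound field is therefore a small linear map on those channels, and the DOA label changes in the same way. The sketch below shows only a subset of such transformations: rotation about the vertical axis, and reflections that negate Y or Z. It assumes channel order W, X, Y, Z and azimuth measured counter-clockwise from the x-axis, and it omits encoding normalisation. Those conventions, and the helper `encode_plane_wave`, are assumptions for illustration.

```python
import numpy as np

def encode_plane_wave(s, az_deg, el_deg):
    """Simplified FOA encoding of a plane wave (assumed order W, X, Y, Z; no normalisation)."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    return np.stack([s, s * np.cos(el) * np.cos(az), s * np.cos(el) * np.sin(az), s * np.sin(el)])

def rotate_foa(foa, azimuth_deg, phi_deg):
    """Rotate the sound field about the vertical axis by phi and update the azimuth label."""
    phi = np.deg2rad(phi_deg)
    w, x, y, z = foa
    x_rot = x * np.cos(phi) - y * np.sin(phi)
    y_rot = x * np.sin(phi) + y * np.cos(phi)
    new_az = (azimuth_deg + phi_deg + 180.0) % 360.0 - 180.0  # wrap label to [-180, 180)
    return np.stack([w, x_rot, y_rot, z]), new_az

def reflect_foa(foa, azimuth_deg, elevation_deg, flip_y=True, flip_z=False):
    """Reflect the sound field: negating Y mirrors the azimuth, negating Z mirrors the elevation."""
    w, x, y, z = foa
    y = -y if flip_y else y
    z = -z if flip_z else z
    new_az = -azimuth_deg if flip_y else azimuth_deg
    new_el = -elevation_deg if flip_z else elevation_deg
    return np.stack([w, x, y, z]), new_az, new_el

s = np.random.default_rng(1).standard_normal(100)       # hypothetical source signal
foa = encode_plane_wave(s, 30.0, 10.0)                   # source at azimuth 30, elevation 10
rotated, new_az = rotate_foa(foa, 30.0, 45.0)            # same clip, now labelled azimuth 75
reflected, az2, el2 = reflect_foa(foa, 30.0, 10.0, flip_y=True, flip_z=True)
```

Because the signal and its label are transformed together, each original clip yields many consistent training examples from directions that are not in the dataset.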

augmentation, deep learning, neural network

1910.04388

Country: Asia > Japan (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

arXiv.org Machine Learning, Oct-10-2019

DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector

Yasuda, Masahiro, Koizumi, Yuma, Mazzon, Luca, Saito, Shoichiro, Uematsu, Hisashi

Affiliations: NTT Media Intelligence Laboratories, Tokyo, Japan (Yasuda, Koizumi, Saito, Uematsu); University of Padova, Padua, Italy (Mazzon).

We propose a direction of arrival (DOA) estimation method that combines sound-intensity-vector (IV)-based DOA estimation with DNN-based denoising and dereverberation. Because the accuracy of IV-based DOA estimation degrades under environmental noise and reverberation, two DNNs are used to remove these effects from the observed IVs. The DOA is then estimated from the refined IVs based on the physics of wave propagation. Experiments on an open dataset showed that the average DOA error of the proposed method was 0.528 degrees, and that it outperformed conventional IV-based and DNN-based DOA estimation methods.

Index Terms: direction of arrival, deep neural network, sound intensity vector, sound activity detection

1. Introduction: Time-series direction-of-arrival (DOA) estimation, the task of identifying the position of sound sources relative to the microphone at every time frame, is an important technology for understanding the surrounding environment from sound recordings.
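The physics-based final step, reading a DOA off an intensity vector, can be sketched as follows. For a plane wave in FOA, the products of the omnidirectional channel W with the dipole channels X, Y and Z point toward the source. The azimuth and elevation follow from the vector's angles. This is a simplified illustration. It averages over time instead of working per time-frequency bin, assumes channel order W, X, Y, Z, and uses a clean synthetic plane wave. The DNN denoising and dereverberation that the method applies before this step is not modelled.

```python
import numpy as np

def intensity_vector(foa):
    """Time-averaged intensity direction from FOA channels (assumed order W, X, Y, Z)."""
    w, x, y, z = foa
    return np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])

def doa_from_iv(iv):
    """Convert an intensity vector to azimuth and elevation in degrees."""
    ix, iy, iz = iv
    azimuth = np.degrees(np.arctan2(iy, ix))
    elevation = np.degrees(np.arctan2(iz, np.hypot(ix, iy)))
    return azimuth, elevation

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)                # hypothetical source signal
az, el = np.deg2rad(40.0), np.deg2rad(15.0)   # true source direction
foa = np.stack([s,
                s * np.cos(el) * np.cos(az),
                s * np.cos(el) * np.sin(az),
                s * np.sin(el)])              # simplified plane-wave FOA encoding

est_az, est_el = doa_from_iv(intensity_vector(foa))
print(est_az, est_el)  # recovers azimuth 40 and elevation 15 (up to rounding)
```

With noise and reverberation, the averaged products also pick up energy arriving from other directions, and the estimated angles drift. Removing those components from the observed IVs is the reason for the two DNNs.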

deep learning, estimation, neural network

1910.04415

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.24)

Genre: Research Report (0.40)

Industry: Information Technology (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)