AITopics | hop size

Automatic Music Sample Identification with Multi-Track Contrastive Learning

Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, is robust across genres, and scales well as the number of noise songs in the reference database increases. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.
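The abstract does not spell out its novel contrastive objective, so the following is only a sketch of the standard InfoNCE loss that contrastive methods commonly build on, not the paper's own loss. The function names, the toy 2-D embeddings, and the temperature of 0.1 are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings (hypothetical): two mixes and their matching views.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When each anchor sits close to its own positive the loss is near zero; when the pairs are swapped the loss is large, which is the signal that pulls matching mixes together during training.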

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.11507

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings

Sebastian Stober, Daniel J. Cameron, Jessica A. Grahn

Neural Information Processing Systems | Oct-2-2025, 21:42:24 GMT

We investigate the impact of the data representation and the pre-processing steps on this classification task and compare different network structures.

accuracy, convolutional layer, stimuli, (14 more...)

Neural Information Processing Systems

Country:

Africa > Rwanda > Kigali > Kigali (0.04)
Africa > East Africa (0.04)
North America > Canada > Ontario > Middlesex County > London (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings

Sebastian Stober, Daniel J. Cameron, Jessica A. Grahn

Neural Information Processing Systems | Feb-9-2025, 14:14:45 GMT

Electroencephalography (EEG) recordings of rhythm perception might contain enough information to distinguish different rhythm types/genres or even identify the rhythms themselves. We apply convolutional neural networks (CNNs) to analyze and classify EEG data recorded within a rhythm perception study in Kigali, Rwanda, which comprises 12 East African and 12 Western rhythmic stimuli, each presented in a loop for 32 seconds to 13 participants. We investigate the impact of the data representation and the pre-processing steps on this classification task and compare different network structures. Using CNNs, we are able to recognize individual rhythms from the EEG with a mean classification accuracy of 24.4% (chance level 4.17%) over all subjects by looking at less than three seconds from a single channel. Aggregating predictions for multiple channels, a mean accuracy of up to 50% can be achieved for individual subjects.
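The 4.17% chance level follows directly from the 24 stimuli (12 East African plus 12 Western). The sketch below reproduces that figure and illustrates one possible way to aggregate per-channel predictions. The abstract does not say how the aggregation is done, so averaging class probabilities is an assumption here, as are the function names and the toy three-class probabilities.

```python
def chance_level(num_classes):
    """Accuracy expected from uniform random guessing."""
    return 1.0 / num_classes

def aggregate_channels(channel_probs):
    """Average per-channel class probabilities and return the winning class index.

    Averaging is an assumed aggregation rule, used here only for illustration.
    """
    num_classes = len(channel_probs[0])
    mean = [
        sum(channel[c] for channel in channel_probs) / len(channel_probs)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=mean.__getitem__)

# 12 East African + 12 Western rhythms give 24 classes.
chance_percent = round(100 * chance_level(12 + 12), 2)
print(f"{chance_percent}%")  # prints 4.17%

# Hypothetical probabilities from three channels over three classes.
predicted = aggregate_channels([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
])
```

In the toy example the first channel alone would pick class 0, but the averaged vote picks class 1, which shows how combining channels can overrule a single noisy channel.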

accuracy, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Africa > Rwanda > Kigali > Kigali (0.24)
Africa > East Africa (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

Wang, Xintong, Shi, Mingqian, Wang, Ye

arXiv.org Artificial Intelligence | Jun-6-2024

Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. Subsequently, Zhang et al. [1] adopted an autoregressive model, the Recurrent Neural Network Transducer (RNN-T) [9], for MDD. This approach aims to capture the temporal dependence of mispronunciation patterns, showing better performance than Connectionist Temporal Classification.

diagnosis, pitch encoder, pitch fusion block, (11 more...)

arXiv.org Artificial Intelligence

2406.04595

Country: Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings

Neural Information Processing Systems | Mar-13-2024, 11:01:58 GMT

Electroencephalography (EEG) recordings of rhythm perception might contain enough information to distinguish different rhythm types/genres or even identify the rhythms themselves. We apply convolutional neural networks (CNNs) to analyze and classify EEG data recorded within a rhythm perception study in Kigali, Rwanda, which comprises 12 East African and 12 Western rhythmic stimuli, each presented in a loop for 32 seconds to 13 participants. We investigate the impact of the data representation and the pre-processing steps on this classification task and compare different network structures. Using CNNs, we are able to recognize individual rhythms from the EEG with a mean classification accuracy of 24.4% (chance level 4.17%) over all subjects by looking at less than three seconds from a single channel. Aggregating predictions for multiple channels, a mean accuracy of up to 50% can be achieved for individual subjects.

accuracy, convolutional layer, stimuli, (15 more...)

Neural Information Processing Systems

Country:

Africa > Rwanda > Kigali > Kigali (0.24)
Africa > East Africa (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning the Spectrogram Temporal Resolution for Audio Classification

#artificialintelligence | Oct-6-2022, 14:43:35 GMT

The audio spectrogram is a time-frequency representation that has been widely used for audio classification. The temporal resolution of a spectrogram depends on hop size. Previous works generally assume the hop size should be a constant value such as ten milliseconds. However, a fixed hop size or resolution is not always optimal for different types of sound. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution learning to improve the performance of audio classification models.

audio classification, hop size, spectrogram temporal resolution, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

Learning the Spectrogram Temporal Resolution for Audio Classification

Liu, Haohe, Liu, Xubo, Kong, Qiuqiang, Wang, Wenwu, Plumbley, Mark D.

arXiv.org Artificial Intelligence | Oct-5-2022

The audio spectrogram is a time-frequency representation that has been widely used for audio classification. The temporal resolution of a spectrogram depends on hop size. Previous works generally assume the hop size should be a constant value such as ten milliseconds. However, a fixed hop size or resolution is not always optimal for different types of sound. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution learning to improve the performance of audio classification models. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier, and can be end-to-end optimized. We evaluate DiffRes on the mel-spectrogram, followed by state-of-the-art classifier backbones, and apply it to five different subtasks. Compared with using the fixed-resolution mel-spectrogram, the DiffRes-based method can achieve the same or better classification accuracy with at least 25% fewer temporal dimensions on the feature level, which alleviates the computational cost at the same time. Starting from a high-temporal-resolution spectrogram such as one-millisecond hop size, we show that DiffRes can improve classification accuracy with the same computational complexity.
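To make the link between hop size and temporal resolution concrete, the sketch below counts spectrogram frames for the two hop sizes the abstract mentions (ten milliseconds and one millisecond) and shows what a 25% reduction in temporal dimensions amounts to. The 10-second clip length and the function name are assumptions, and the count ignores window length and padding, so it is an approximation rather than what any particular STFT implementation returns.

```python
def num_frames(duration_s, hop_ms):
    """Approximate frame count: one frame per hop, ignoring window length and padding."""
    return int(duration_s * 1000 / hop_ms)

CLIP_SECONDS = 10  # hypothetical clip length, not taken from the paper

frames_10ms = num_frames(CLIP_SECONDS, 10)  # conventional fixed hop size
frames_1ms = num_frames(CLIP_SECONDS, 1)    # high-temporal-resolution starting point

# "At least 25% fewer temporal dimensions" relative to the 10 ms baseline.
frames_after_reduction = int(frames_10ms * (1 - 0.25))

print(frames_10ms, frames_1ms, frames_after_reduction)  # prints 1000 10000 750
```

Shrinking the hop from ten milliseconds to one multiplies the number of frames by ten, which is why merging non-essential frames matters for keeping the classifier's cost fixed.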

artificial intelligence, diffres, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.01719

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry: Law > Environmental Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.87)

Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks

Xu, Kun, Wu, Lingfei, Wang, Zhiguo, Sheinin, Vadim

arXiv.org Machine Learning | Apr-3-2018

Celebrated Sequence to Sequence learning (Seq2Seq) and its fruitful variants are powerful models that achieve excellent performance on tasks that map sequences to sequences. However, there are many machine learning tasks whose inputs are naturally represented as graphs, which poses significant challenges for existing Seq2Seq models because a graph cannot be converted to a sequence without loss. In this work, we present a general end-to-end approach that maps the input graph to a sequence of vectors and then uses an attention-based LSTM to decode the target sequence from these vectors. Specifically, to address the information loss that is inevitable in such data conversion, we introduce a novel graph-to-sequence neural network model that follows the encoder-decoder architecture. Our method first uses an improved graph-based neural network to generate the node and graph embeddings, with a novel aggregation strategy that incorporates edge direction information into the node embeddings. We also propose an attention-based mechanism that aligns node embeddings with the decoding sequence to better cope with large graphs. Experimental results on the bAbI task, the Shortest Path task, and the Natural Language Generation task demonstrate that our model achieves state-of-the-art performance and significantly outperforms other baselines. We also show that, with the proposed aggregation strategy, our model converges quickly to good performance.
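The idea of folding edge direction into node embeddings can be illustrated with a single aggregation hop that treats successors and predecessors separately. This is a deliberately simplified sketch: the abstract does not describe the actual aggregators, so the mean aggregator, the concatenation order, the function names, and the toy graph are all assumptions.

```python
def mean_vec(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def aggregate_once(features, edges):
    """One hop of direction-aware aggregation on a directed graph.

    features: dict mapping node -> list of floats
    edges:    list of (source, destination) pairs
    Returns, per node, the concatenation [own | mean of successors | mean of predecessors].
    """
    dim = len(next(iter(features.values())))
    embeddings = {}
    for node, own in features.items():
        forward = [features[dst] for src, dst in edges if src == node]
        backward = [features[src] for src, dst in edges if dst == node]
        embeddings[node] = own + mean_vec(forward, dim) + mean_vec(backward, dim)
    return embeddings

# Toy directed chain a -> b -> c with 1-D features (hypothetical values).
features = {"a": [1.0], "b": [2.0], "c": [4.0]}
edges = [("a", "b"), ("b", "c")]
emb = aggregate_once(features, edges)
```

Because the forward and backward summaries occupy separate slots, reversing an edge changes the embedding, which an undirected neighbor average would not capture.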

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Machine Learning

1804.00823

Country:

North America > United States (0.46)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings

Stober, Sebastian, Cameron, Daniel J., Grahn, Jessica A.

Neural Information Processing Systems | Dec-31-2014

Electroencephalography (EEG) recordings of rhythm perception might contain enough information to distinguish different rhythm types/genres or even identify the rhythms themselves. We apply convolutional neural networks (CNNs) to analyze and classify EEG data recorded within a rhythm perception study in Kigali, Rwanda, which comprises 12 East African and 12 Western rhythmic stimuli, each presented in a loop for 32 seconds to 13 participants. We investigate the impact of the data representation and the pre-processing steps on this classification task and compare different network structures. Using CNNs, we are able to recognize individual rhythms from the EEG with a mean classification accuracy of 24.4% (chance level 4.17%) over all subjects by looking at less than three seconds from a single channel. Aggregating predictions for multiple channels, a mean accuracy of up to 50% can be achieved for individual subjects.

accuracy, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America (0.68)
Africa > Rwanda > Kigali > Kigali (0.24)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Automatic Music Sample Identification with Multi-Track Contrastive Learning