AITopics | Dixon, Simon

Collaborating Authors

Dixon, Simon

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Stoller, Daniel, Ewert, Sebastian, Dixon, Simon

arXiv.org Machine LearningJun-8-2018

Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependant on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time-domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but required for high quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling technique and a context-aware prediction framework to reduce output artifacts. Experiments for singing voice separation indicate that our architecture yields a performance comparable to a state-of-the-art spectrogram-based U-Net architecture, given the same data. Finally, we reveal a problem with outliers in the currently used SDR evaluation metrics and suggest reporting rank-based statistics to alleviate this problem.

deep learning, neural network, separation, (19 more...)

arXiv.org Machine Learning

1806.03185

Country:

North America > Canada (0.14)
Europe > France (0.14)

Genre: Research Report (0.64)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Note Value Recognition for Piano Transcription Using Markov Random Fields

Nakamura, Eita, Yoshii, Kazuyoshi, Dixon, Simon

arXiv.org Artificial IntelligenceJul-7-2017

This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete musical scores. Based on observations that the pitch context and onset score times are influential on the configuration of note values, we construct a context-tree model that provides prior distributions of note values using these features and combine it with a performance model in the framework of Markov random fields. Evaluation results show that our method reduces the average error rate by around 40 percent compared to existing/simple methods. We also confirmed that, in our model, the score model plays a more important role than the performance model, and it automatically captures the voice structure by unsupervised learning.

artificial intelligence, machine learning, note value, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TASLP.2017.2722103

1703.08144

Country: Asia > Japan > Honshū (0.14)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Add feedback

An End-to-End Neural Network for Polyphonic Piano Music Transcription

Sigtia, Siddharth, Benetos, Emmanouil, Dixon, Simon

arXiv.org Machine LearningFeb-11-2016

We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments. We investigate various neural network architectures for the acoustic models and also investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yields the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.

acoustic model, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

1508.01774

Country: Oceania > Australia (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Identifying Cover Songs Using Information-Theoretic Measures of Similarity

Foster, Peter, Dixon, Simon, Klapuri, Anssi

arXiv.org Machine LearningMay-17-2015

This paper investigates methods for quantifying similarity between audio signals, specifically for the task of of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantised audio features, to continuous-valued approaches. In the discrete case, we propose a method for computing the normalised compression distance, where we account for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks using a data set comprised of 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalised compression distance (NCD) based on string compression and prediction, where we observe that our proposed normalised compression distance with alignment (NCDA) improves average performance over NCD, for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TASLP.2015.2416655

1407.2433

Country: Europe > Finland (0.14)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

In Search of the Horowitz Factor

Widmer, Gerhard, Dixon, Simon, Goebl, Werner, Pampalk, Elias, Tobudic, Asmir

AI MagazineSep-15-2003

The article introduces the reader to a large interdisciplinary research project whose goal is to use AI to gain new insight into a complex artistic phenomenon. We study fundamental principles of expressive music performance by measuring performance aspects in large numbers of recordings by highly skilled musicians (concert pianists) and analyzing the data with state-of-the-art methods from areas such as machine learning, data mining, and data visualization. The article first introduces the general research questions that guide the project and then summarizes some of the most important results achieved to date, with an emphasis on the most recent and still rather speculative work. Our current results show that it is possible for machines to make novel and interesting discoveries even in a domain such as music and that even if we might never find the "Horowitz Factor," AI can give us completely new insights into complex artistic behavior.

artificial intelligence, Horowitz Factor, machine learning, (3 more...)

AI Magazine

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

Add feedback

In Search of the Horowitz Factor

Widmer, Gerhard, Dixon, Simon, Goebl, Werner, Pampalk, Elias, Tobudic, Asmir

AI MagazineSep-15-2003

The article introduces the reader to a large interdisciplinary research project whose goal is to use AI to gain new insight into a complex artistic phenomenon. We study fundamental principles of expressive music performance by measuring performance aspects in large numbers of recordings by highly skilled musicians (concert pianists) and analyzing the data with state-of-the-art methods from areas such as machine learning, data mining, and data visualization. The article first introduces the general research questions that guide the project and then summarizes some of the most important results achieved to date, with an emphasis on the most recent and still rather speculative work. A broad view of the discovery process is given, from data acquisition through data visualization to inductive model building and pattern discovery, and it turns out that AI plays an important role in all stages of such an ambitious enterprise. Our current results show that it is possible for machines to make novel and interesting discoveries even in a domain such as music and that even if we might never find the "Horowitz Factor," AI can give us completely new insights into complex artistic behavior.

artificial intelligence, inductive learning, pianist, (19 more...)

AI Magazine

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)

Add feedback