AITopics | Slaney, Malcolm

Collaborating Authors

Slaney, Malcolm


Disentangling speech from surroundings with neural embeddings

Omran, Ahmed, Zeghidour, Neil, Borsos, Zalán, Quitry, Félix de Chaumont, Slaney, Malcolm, Tagliasacchi, Marco

arXiv.org Artificial Intelligence, Jun-4-2023

We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms as embedding vectors, where one part of each embedding vector represents the speech signal and the rest represents the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring that each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
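The partition-mixing idea in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the embedding is assumed to be a `(frames, dim)` array, and `speech_dims` (the split point between the speech and environment channels) and the helper names are hypothetical. The abstract says only that the model is trained to reconstruct audio from mixed partitions; the comment about the reconstruction target is our reading of that.

```python
import numpy as np


def split_embedding(emb, speech_dims):
    """Split a (frames, dim) embedding into (speech, environment) parts.

    `speech_dims` is a hypothetical split point: the first `speech_dims`
    channels are assumed to carry speech, the remaining ones the environment.
    """
    return emb[..., :speech_dims], emb[..., speech_dims:]


def mix_partitions(emb_a, emb_b, speech_dims):
    """Join the speech partition of emb_a with the environment partition of emb_b.

    Presumably the decoder is then trained to reconstruct speech A as heard
    in environment B from this mixed embedding.
    """
    speech_a, _ = split_embedding(emb_a, speech_dims)
    _, env_b = split_embedding(emb_b, speech_dims)
    return np.concatenate([speech_a, env_b], axis=-1)


# Toy example: two clips, 10 frames each, 16-dimensional embeddings.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((10, 16))
emb_b = rng.standard_normal((10, 16))
mixed = mix_partitions(emb_a, emb_b, speech_dims=6)
print(mixed.shape)  # (10, 16)
```

Because the mixed vector has the same shape as an ordinary embedding, the codec's decoder can consume it unchanged. The reconstruction loss is therefore what pushes each partition to specialize.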

artificial intelligence, machine learning, speech

arXiv.org Artificial Intelligence

2203.15578

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.88)


Neural Architecture Search for Energy Efficient Always-on Audio Models

Speckhard, Daniel T., Misiunas, Karolis, Perel, Sagi, Zhu, Tenghui, Carlile, Simon, Slaney, Malcolm

arXiv.org Artificial Intelligence, Jun-1-2023

Mobile and edge computing devices for always-on classification tasks require energy-efficient neural network architectures. In this paper, we present several changes to neural architecture search (NAS) that improve the chance of success in practical situations. Our search simultaneously optimizes for network accuracy, energy efficiency, and memory usage. We benchmark the performance of our search on real hardware, but since running thousands of tests on real hardware is difficult, we use a random forest model to roughly predict the energy usage of a candidate network. We present a search strategy that uses both Bayesian and regularized evolutionary search with particle swarms and employs early stopping to reduce the computational burden. Our search, evaluated on a sound-event classification dataset based on AudioSet, yields an order of magnitude less energy per inference and a much smaller memory footprint than our baseline MobileNetV1/V2 implementations while slightly improving task accuracy. We also demonstrate that combining a 2D spectrogram with a convolution with many filters causes a computational bottleneck for audio classification, and that alternative approaches reduce the computational burden but sacrifice task accuracy.
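As a rough illustration of one ingredient named in this abstract, here is a minimal sketch of regularized (aging) evolution over a toy architecture space. Many details are assumptions made for the sketch. The search space, the three proxies and their weights in `score`, and all helper names are hypothetical. The sketch also scalarizes accuracy, energy, and memory into one number. It omits the Bayesian search, particle swarms, early stopping, and the random forest energy predictor that the abstract describes.

```python
import random
from collections import deque

# Hypothetical search space: each candidate is (layers, filters, kernel_size).
SPACE = {"layers": [2, 3, 4, 5, 6], "filters": [8, 16, 32, 64], "kernel": [3, 5, 7]}
KEYS = ("layers", "filters", "kernel")


def score(arch):
    """Toy scalarized objective: accuracy proxy minus energy and memory penalties.

    All proxies and weights are made up; the abstract describes predicting
    energy with a random forest model trained on real-hardware measurements.
    """
    layers, filters, kernel = arch
    accuracy = 1.0 - 1.0 / (layers * filters)  # bigger nets score higher
    energy = layers * filters * kernel ** 2    # rough multiply-accumulate count
    memory = layers * filters * kernel         # rough parameter count
    return accuracy - 1e-5 * energy - 1e-4 * memory


def random_arch(rng):
    return tuple(rng.choice(SPACE[k]) for k in KEYS)


def mutate(arch, rng):
    """Resample one randomly chosen dimension of the architecture."""
    i = rng.randrange(len(KEYS))
    child = list(arch)
    child[i] = rng.choice(SPACE[KEYS[i]])
    return tuple(child)


def regularized_evolution(cycles=200, population_size=20, sample_size=5, seed=0):
    rng = random.Random(seed)
    population = deque()  # oldest member on the left
    history = []          # every (arch, score) ever evaluated
    for _ in range(population_size):
        arch = random_arch(rng)
        population.append((arch, score(arch)))
        history.append(population[-1])
    for _ in range(cycles):
        # Tournament selection: the best of a random sample becomes the parent.
        sample = rng.sample(list(population), sample_size)
        parent_arch, _ = max(sample, key=lambda item: item[1])
        child = mutate(parent_arch, rng)
        population.append((child, score(child)))
        history.append(population[-1])
        # Aging: remove the oldest member, not the worst one.
        population.popleft()
    best = max(history, key=lambda item: item[1])
    return best, history


(best_arch, best_score), history = regularized_evolution()
print(best_arch, round(best_score, 4))
```

The "regularized" part is the age-based removal. Every architecture eventually leaves the population, so a single lucky evaluation cannot dominate the search indefinitely.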

artificial intelligence, evolutionary algorithm, machine learning

arXiv.org Artificial Intelligence

2202.05397

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


CNN Architectures for Large-Scale Audio Classification

Hershey, Shawn, Chaudhuri, Sourish, Ellis, Daniel P. W., Gemmeke, Jort F., Jansen, Aren, Moore, R. Channing, Plakal, Manoj, Platt, Devin, Saurous, Rif A., Seybold, Bryan, Slaney, Malcolm, Weiss, Ron J., Wilson, Kevin

arXiv.org Machine Learning, Jan-10-2017

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and that larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

deep learning, neural network, video

arXiv.org Machine Learning

1609.0943

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Collaborative Filtering and the Missing at Random Assumption

Marlin, Benjamin, Zemel, Richard S., Roweis, Sam, Slaney, Malcolm

arXiv.org Machine Learning, Jun-20-2012

Rating prediction is an important application and a popular research topic in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
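A small worked example helps show why a violation of the MAR condition matters. Suppose each rating value r has a true probability p(r) and a probability mu(r) of being observed. Then the observed ratings follow p(r)·mu(r), renormalized. The sketch below uses a made-up uniform p and made-up mu values; none of these numbers come from the study. It shows that the observed mean is unbiased under MAR but inflated when liked songs are rated more often. The correction step uses inverse-probability weighting with a known mu. That is a simpler, swapped-in technique, not the paper's explicit model of the missing data mechanism.

```python
import numpy as np

ratings = np.arange(1, 6)                      # rating scale 1..5
p_true = np.full(5, 0.2)                       # hypothetical true rating distribution
mu_mar = np.full(5, 0.3)                       # MAR: every rating equally likely observed
mu_mnar = np.array([0.1, 0.1, 0.2, 0.4, 0.8])  # MNAR (hypothetical): liked songs rated more


def observed_distribution(p, mu):
    """Distribution of ratings that actually get reported: p(r) * mu(r), renormalized."""
    q = p * mu
    return q / q.sum()


def mean(dist):
    return float(ratings @ dist)


true_mean = mean(p_true)                                # about 3.0
mar_mean = mean(observed_distribution(p_true, mu_mar))  # about 3.0: unbiased
q_mnar = observed_distribution(p_true, mu_mnar)
mnar_mean = mean(q_mnar)                                # about 4.0625: biased upward

# If the missing-data mechanism mu is known (or modeled), reweighting each
# observed rating by 1 / mu(r) undoes the selection bias.
q_corrected = q_mnar / mu_mnar
q_corrected /= q_corrected.sum()
corrected_mean = mean(q_corrected)                      # about 3.0 again
```

The upward shift from 3.0 to about 4.06 arises entirely from which ratings get reported. That is why a test set drawn from user-selected ratings can misjudge a recommender's accuracy on songs chosen at random.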

artificial intelligence, machine learning, missing data

arXiv.org Machine Learning

1206.5267

Country: North America > Canada > Ontario > Toronto (0.29)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Radio (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks

Slaney, Malcolm, Covell, Michele

Neural Information Processing Systems, Dec-31-2001

FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices.
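The pipeline in this abstract consists of canonical correlation to find projection directions, then Pearson correlation of the projected streams as the synchronization score. It can be sketched as follows. This is a hedged illustration, not FaceSync's implementation. It uses a standard QR-based formulation of CCA, which avoids explicitly forming the correlation matrices. The abstract does not say which technique FaceSync itself uses for that. The feature matrices, the synthetic data, and the helper names `cca_first_pair` and `sync_score` are assumptions.

```python
import numpy as np


def cca_first_pair(X, Y):
    """First canonical pair via thin QR factorizations of the centered data.

    X: (frames, p) audio features; Y: (frames, q) image features.
    Never forms X^T X or Y^T Y, which would square the condition number.
    Assumes frames > p, q and full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)            # Xc = Qx @ Rx, Qx has orthonormal columns
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)  # singular values = canonical correlations
    wx = np.linalg.solve(Rx, U[:, 0])    # so that Xc @ wx == Qx @ U[:, 0]
    wy = np.linalg.solve(Ry, Vt[0])
    return wx, wy, s[0]


def sync_score(X, Y, wx, wy):
    """Pearson correlation between the projected audio and image streams."""
    a = (X - X.mean(axis=0)) @ wx
    v = (Y - Y.mean(axis=0)) @ wy
    return float(np.corrcoef(a, v)[0, 1])


# Synthetic data: a shared latent signal drives one audio feature and one
# image feature; the other columns are pure noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal(n)
X = np.column_stack([latent + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n), rng.standard_normal(n)])
Y = np.column_stack([rng.standard_normal(n),
                     -2.0 * latent + 0.1 * rng.standard_normal(n)])

wx, wy, rho = cca_first_pair(X, Y)
aligned = sync_score(X, Y, wx, wy)                       # equals rho
shifted = sync_score(X, np.roll(Y, 50, axis=0), wx, wy)  # close to zero
```

The projections are learned once on aligned data and then held fixed. The score therefore drops when the image stream is shifted relative to the audio, which is the behavior a synchronization measure needs.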

artificial intelligence, correlation, machine learning

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States > California (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

