 Undirected Networks



Single-Channel Multi-Speaker Separation using Deep Clustering

arXiv.org Machine Learning

Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, yielding impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximize signal fidelity. We evaluate the results using automatic speech recognition. The new signal approximation objective, combined with end-to-end training, produces unprecedented performance, reducing the word error rate (WER) from 89.1% down to 30.8%. This represents a major advancement towards solving the cocktail party problem.
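To make the SDR figures above easier to read, here is a minimal sketch of a simplified signal-to-distortion ratio in decibels. It assumes the plain energy-ratio definition, 10 * log10(||s||^2 / ||s - s_hat||^2), which is not necessarily the exact evaluation procedure used in the paper; the function name `sdr_db` and the toy signals are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: a plain energy ratio, not the full evaluation toolkit
    that published separation results are usually computed with.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: the estimate is the reference scaled by 0.9, so the
# error is 10% of the signal amplitude (1% of its energy).
reference = [math.sin(0.1 * n) for n in range(1000)]
estimate = [0.9 * s for s in reference]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

Under this definition, each additional 10 dB of SDR corresponds to a tenfold reduction in error energy relative to the signal energy.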


A Beginner's Tutorial for Restricted Boltzmann Machines - Deeplearning4j: Open-source, distributed deep learning for the JVM

#artificialintelligence

Invented by Geoff Hinton, a restricted Boltzmann machine (RBM) is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modeling. Given their relative simplicity and historical importance, restricted Boltzmann machines are the first neural network we'll tackle. In the paragraphs below, we describe in diagrams and plain language how they work. RBMs are shallow, two-layer neural nets that constitute the building blocks of deep-belief networks. The first layer of the RBM is called the visible, or input, layer, and the second is the hidden layer. Each circle in the graph above represents a neuron-like unit called a node, and nodes are simply where calculations take place.
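The two-layer structure described above can be sketched in a few lines. This is a minimal illustration, assuming binary units and the standard sigmoid conditional for RBMs; it shows only one pass from the visible layer to the hidden layer and back, not training. All names (`hidden_probabilities`, `visible_probabilities`, `sample_binary`) and the toy sizes are hypothetical and not taken from Deeplearning4j.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [
        sigmoid(hidden_bias[j]
                + sum(visible[i] * weights[i][j] for i in range(len(visible))))
        for j in range(len(hidden_bias))
    ]


def visible_probabilities(hidden, weights, visible_bias):
    """P(v_i = 1 | h) = sigmoid(a_i + sum_j h_j * W[i][j])."""
    return [
        sigmoid(visible_bias[i]
                + sum(hidden[j] * weights[i][j] for j in range(len(hidden))))
        for i in range(len(visible_bias))
    ]


def sample_binary(probabilities, rng):
    """Draw a 0/1 state for each node from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]


rng = random.Random(0)
n_visible, n_hidden = 4, 2  # toy sizes chosen for illustration
weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
           for _ in range(n_visible)]
visible_bias = [0.0] * n_visible
hidden_bias = [0.0] * n_hidden

visible = [1, 0, 1, 0]  # input presented to the visible layer
h_prob = hidden_probabilities(visible, weights, hidden_bias)
hidden = sample_binary(h_prob, rng)
reconstruction = visible_probabilities(hidden, weights, visible_bias)
print("hidden probabilities:", h_prob)
print("reconstruction probabilities:", reconstruction)
```

Note that the same weight matrix is used in both directions, and there are no connections between nodes within a layer, which is the "restricted" part of the name.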


How to Build a Neuron: Exploring AI in JavaScript Pt 2 -- JavaScript Scene

#artificialintelligence

In this series, we're discussing a topic that will transform the world we live in over the course of the next 25 years. We're going to see lots of drones, self-driving cars, VR, and AR devices changing how we get around, how we transport things, and how we see and interact with the world, and it will all be powered by AI and neural nets. In part 1, we talked a little bit about what neurons are and how they work, and wrapped up with a trivial example of how to sum synapse inputs and determine whether or not the neuron should fire. We finished the article with a question: What about time? From here on out I'll be recording these adventures in a library called neurolib. If you're at all familiar with traditional neural nets, you're probably wondering when I'm going to start talking about gradient descent or Hidden Markov Models (HMM).


Missing Data Estimation in High-Dimensional Datasets: A Swarm Intelligence-Deep Neural Network Approach

arXiv.org Machine Learning

In this paper, we examine the problem of missing data in high-dimensional datasets by taking into consideration the Missing Completely at Random and Missing at Random mechanisms, as well as the Arbitrary missing pattern. Additionally, this paper employs a methodology based on Deep Learning and Swarm Intelligence algorithms in order to provide reliable estimates for missing data. The deep learning technique is used to extract features from the input data via an unsupervised learning approach by modeling the data distribution based on the input. This deep learning technique is then used as part of the objective function for the swarm intelligence technique in order to estimate the missing data after a supervised fine-tuning phase by minimizing an error function based on the interrelationship and correlation between features in the dataset. The methodology investigated in this paper therefore has longer running times; however, the promising potential outcomes justify the tradeoff. Basic knowledge of statistics is presumed.


Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

arXiv.org Machine Learning

A key factor that determines the usability of applications based on speech recognition is the latency or lag of the system. In dialogue systems, for example, long latencies may disrupt the natural turn-taking in the human-machine conversation. In other specific applications the lag may be even more critical. A typical example involves systems that use ASR to drive the lip movements of an avatar in real time to support telepresence [3, 4, 5]. The latency in a typical speech recogniser based on a hybrid between Neural Networks (NNs) and Hidden Markov Models (HMMs) is determined by a number of factors:
- the hardware (sound card) introduces some lag in digitising the speech samples and making them available to the drivers; typical values are on the order of milliseconds;
- the speech samples are returned by the driver in buffers of a certain size (this could be as long as half a second, but can be reduced to a few ms);
- in spectral-based feature extraction, speech samples are grouped into windows (frames), often around 25-40 ms in length;
- many methods for feature extraction also compute time derivatives of the features, which require a number of frames in the past and the future.
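The factors listed above can be combined into a rough latency budget. The sketch below assumes a simple additive model and is not the paper's own analysis. The function name `estimate_latency_ms`, the frame shift parameter, and every numeric value except the 25 ms window are hypothetical choices for illustration.

```python
def estimate_latency_ms(hardware_ms, buffer_ms, window_ms,
                        frame_shift_ms, future_frames):
    """Rough additive latency estimate in milliseconds.

    Assumption: the contributions simply add up. The look-ahead term
    models time derivatives that need `future_frames` frames after
    the current one, each arriving `frame_shift_ms` later.
    """
    lookahead_ms = future_frames * frame_shift_ms
    return hardware_ms + buffer_ms + window_ms + lookahead_ms


# Hypothetical values: 2 ms sound card lag, 10 ms driver buffer,
# 25 ms analysis window, 10 ms frame shift, 2 future frames.
total = estimate_latency_ms(
    hardware_ms=2.0,
    buffer_ms=10.0,
    window_ms=25.0,
    frame_shift_ms=10.0,
    future_frames=2,
)
print(f"estimated latency: {total} ms")
```

In this toy budget the analysis window and the look-ahead for time derivatives dominate, which suggests why the amount of future context is a natural target when optimising for low latency.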


Imitation neurones, genuine potential

#artificialintelligence

This structural design can support calculations across thousands of layers, and it was this aspect of the architecture that gave rise to the name 'deep learning'. Marchand-Maillet explains: "Each artificial neurone is assigned an input value, which it computes using a mathematical function, only firing if the output exceeds a pre-defined threshold." In this way, it reproduces the behaviour of real neurones, which only fire and transmit information when the input signal (the potential difference across the entire neural circuit) reaches a certain level. In the artificial model, the results of a single layer are weighted, added up and then sent as the input signal to the following layer, which processes that input using different functions, and so on. For example, if a system is trained with great quantities of photos of apples and watermelons, it will progressively learn to distinguish them on the basis of diameter, says Marchand-Maillet. If it cannot decide (e.g., when processing a picture of a tiny watermelon), the subsequent layers take over by analysing the colours or textures of the fruit in the photo, and so on.
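The thresholded neurone and the layer-to-layer flow described above can be sketched as follows. This is a toy illustration with hand-picked weights, assuming a hard step threshold; real deep networks learn their weights and typically use smoother functions. The names `neuron_fires`, `layer_output`, and `forward`, and the small two-layer network, are hypothetical.

```python
def neuron_fires(inputs, weights, threshold):
    """Weight and add up the inputs; fire (1) only above the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0


def layer_output(inputs, weight_rows, thresholds):
    """Each neurone in the layer sees the same inputs with its own weights."""
    return [neuron_fires(inputs, w, t)
            for w, t in zip(weight_rows, thresholds)]


def forward(inputs, layers):
    """Send the results of each layer as the input signal to the next."""
    signal = inputs
    for weight_rows, thresholds in layers:
        signal = layer_output(signal, weight_rows, thresholds)
    return signal


# Hand-picked toy network: the first layer has an "either input" neurone
# and a "both inputs" neurone; the second layer fires when the first is
# on and the second is off (exclusive or).
layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.5, 1.5]),
    ([[1.0, -1.0]], [0.5]),
]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, "->", forward(pair, layers))
```

A single thresholded neurone cannot compute this function on its own, which gives a small example of later layers resolving what an earlier layer cannot decide.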


Deep Learning Libraries by Language

#artificialintelligence

Gensim is a deep learning toolkit implemented in the Python programming language, intended for handling large text collections using efficient algorithms.


Modeling Group Dynamics Using Probabilistic Tensor Decompositions

arXiv.org Machine Learning

In this paper, we consider the problem of modeling discrete social network data and learning the underlying group dynamics. The goal is to develop probabilistic profiles of large collections of data while preserving the essential temporal relationships that provide insights for various applications of interest. For example, in social network analysis, we want to analyze relationships between social agents and their behaviors over time and on various social media sites (e.g., Facebook, Twitter, Instagram, Google). In web advertising analysis, we want to analyze the relationships between customers and the types of products they buy from different shopping sites to capture customers' buying behaviors and learn the intrinsic factors that affect their buying decision process. In the study of scientific collaboration, using co-authorship networks from multiple journals on related subjects, one can analyze relationships between subjects and authors.


Explainable Restricted Boltzmann Machines for Collaborative Filtering

arXiv.org Machine Learning

Most accurate recommender systems are black-box models, hiding the reasoning behind their recommendations. Yet explanations have been shown to increase the user's trust in the system in addition to providing other benefits such as scrutability, meaning the ability to verify the validity of recommendations. This gap between accuracy and transparency or explainability has generated an interest in automated explanation generation methods. Restricted Boltzmann Machines (RBMs) are accurate models for collaborative filtering (CF) that also lack interpretability. In this paper, we focus on RBM-based collaborative filtering recommendations, and further assume the absence of any additional data source, such as item content or user attributes. We thus propose a new Explainable RBM technique that computes the top-n recommendation list from items that are explainable. Experimental results show that our method is effective in generating accurate and explainable recommendations.