Learning Graphical Models
Sarcasm Detection with Machine Learning in Spark
This post is inspired by a site I found whilst searching for a way to detect sarcasm within sentences. As humans we sometimes struggle to detect sarcasm, even when we have a lot more contextual information available to us. People are emotive when they speak; they use certain tones, and these traits can help us understand when someone is being sarcastic. However, we don't always catch it! So how the hell could a computer detect this when all it has is text?
10 Machine Learning Experts You Need to Know - Dataconomy
Machine learning, to put it mildly, is an incredibly broad and varied field, with multitudes of applications. Thus, writing a list entitled "10 Machine Learning Experts You Need to Know" proves challenging for a number of reasons. Firstly, I've restricted my ten picks to those currently working in the field; if I extended it to those living and passed, I never would have been able to identify only ten worthy of mention. Secondly, this list is in no way ranked; how would I decide which is more remarkable? Thirdly, this is by no means an exhaustive list of people currently making significant contributions to the field of machine learning, or the wider world.
Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data
Løkse, Sigurd, Bianchi, Filippo Maria, Salberg, Arnt-Børre, Jenssen, Robert
In this paper, we propose PCKID, a novel, robust kernel function for spectral clustering, specifically designed to handle incomplete data. By combining posterior distributions of Gaussian Mixture Models for incomplete data on different scales, we are able to learn a kernel for incomplete data that does not depend on any critical hyperparameters, unlike the commonly used RBF kernel. To evaluate our method, we perform experiments on two real datasets. PCKID outperforms the baseline methods for all fractions of missing values, and in some cases outperforms them by up to 25 percentage points.
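To illustrate the hyperparameter sensitivity the abstract attributes to the RBF kernel, here is a minimal sketch on complete toy data. It is not PCKID. The function name `rbf_kernel`, the bandwidth symbol `sigma`, and the toy points are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Plain RBF kernel K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Assumes complete data: a single NaN in X would propagate into K,
    which is the situation a kernel for incomplete data has to avoid.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Hypothetical toy data: two nearby points and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])

K_narrow = rbf_kernel(X, sigma=0.1)    # almost every pair looks unrelated
K_wide = rbf_kernel(X, sigma=100.0)    # almost every pair looks identical
```

With a very small bandwidth the affinity matrix is close to the identity, and with a very large one it is close to all ones. In both cases the cluster structure that spectral clustering relies on is lost, which is why the bandwidth counts as a critical hyperparameter.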
Scalable Inference for Nested Chinese Restaurant Process Topic Models
Chen, Jianfei, Zhu, Jun, Lu, Jie, Liu, Shixia
Nested Chinese Restaurant Process (nCRP) topic models are powerful nonparametric Bayesian methods to extract a topic hierarchy from a given text corpus, where the hierarchical structure is automatically determined by the data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of nCRP topic models. However, hLDA has only been evaluated at small scale, because the existing collapsed Gibbs sampling and instantiated weight variational inference algorithms either are not scalable or sacrifice inference quality with mean-field assumptions. Moreover, an efficient distributed implementation of the data structures, such as dynamically growing count matrices and trees, is challenging. In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS) algorithm, which combines the advantages of collapsed and instantiated weight algorithms to achieve good scalability as well as high model quality. An initialization strategy is presented to further improve the model quality. Finally, we propose an efficient distributed implementation of PCGS through vectorization, pre-processing, and a careful design of the concurrent data structures and communication strategy. Empirical studies show that our algorithm is 111 times more efficient than the previous open-source implementation for hLDA, with comparable or even better model quality. Our distributed implementation can extract 1,722 topics from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders of magnitude larger than the previous largest corpus, with 50 machines in 7 hours.
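As background for the abstract above, the following is a minimal sketch of how the nCRP prior assigns each document a path through the topic tree. It shows only the prior draw, not the partially collapsed Gibbs sampler proposed in the paper. The names `crp_choice` and `sample_path`, the dictionary-based tree, and the concentration parameter `gamma` are assumptions made for illustration.

```python
import random

def crp_choice(counts, gamma, rng):
    """Pick a child by the Chinese Restaurant Process rule.

    An existing child i is chosen with probability proportional to
    counts[i]; a new child with probability proportional to gamma.
    Returns len(counts) to signal "new child".
    """
    r = rng.random() * (sum(counts) + gamma)
    acc = 0.0
    for i, c in enumerate(counts):
        acc += c
        if r < acc:
            return i
    return len(counts)

def sample_path(tree, depth, gamma, rng):
    """Draw one root-to-leaf path of the given depth and update counts.

    `tree` maps a path prefix (a tuple) to the visit counts of its children,
    so the tree grows only where documents actually go.
    """
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        k = crp_choice(counts, gamma, rng)
        if k == len(counts):
            counts.append(0)  # open a new branch
        counts[k] += 1
        path = path + (k,)
    return path

rng = random.Random(0)
tree = {}
paths = [sample_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(100)]
```

Because popular branches attract more documents, the number of branches grows slowly with the corpus and the shape of the hierarchy is determined by the data rather than fixed in advance.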
Thinking Deeply to Make Better Speech
A humanoid robot, named Aiko Chihira by its creators at Toshiba and Osaka University, at a 2015 trial in Tokyo's Mitsukoshi department store. Toshiba says it will incorporate speech recognition and synthesis into the robot by 2020. Machines that speak are nothing new. Siri has been answering questions from iPhone users since 2011, and text-to-voice programs have been around even longer. People with speaking disabilities--most famously, Stephen Hawking--have used computers to generate speech for decades.
The Mathematics of Machine Learning
In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results. There are many reasons why the mathematics of Machine Learning is important, and I'll highlight some of them below: The main question when trying to understand an interdisciplinary field such as Machine Learning is the amount of maths necessary and the level of maths needed to understand these techniques.
Social Learning and Diffusion of Pervasive Goods: An Empirical Study of an African App Store
Nia, Meisam Hejazi, Ratchford, Brian T., Bruce, Norris
In this study, the authors develop a structural model that combines a macro diffusion model with a micro choice model to control for the effect of social influence on the mobile app choices of customers over app stores. Social influence refers to the density of adopters within the proximity of other customers. Using a large data set from an African app store and Bayesian estimation methods, the authors quantify the effect of social influence and investigate the impact of ignoring this process in estimating customer choices. The findings show that customer choices in the app store are explained better by offline than online density of adopters and that ignoring social influence in estimations results in biased estimates. Furthermore, the findings show that the mobile app adoption process is similar to adoption of music CDs, among all other classic economy goods. A counterfactual analysis shows that the app store can increase its revenue by 13.6% through a viral marketing policy (e.g., a sharing with friends and family button).
Generative Temporal Models with Memory
Gemici, Mevlana, Hung, Chia-Chun, Santoro, Adam, Wayne, Greg, Mohamed, Shakir, Rezende, Danilo J., Amos, David, Lillicrap, Timothy
We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and methods to gain insight into the models' operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence, and reuse this stored information efficiently. This allows them to perform substantially better than existing models based on well-known recurrent neural networks, like LSTMs.
Many of the data sets we use in machine learning applications are sequential, whether these be natural language and speech processing data, streams of high-definition video, longitudinal time-series from medical diagnostics, or spatiotemporal data in climate forecasting. Generative Temporal Models (GTMs) are a core requirement for these applications. Generative Temporal Models are also important components of intelligent agents, as they permit counterfactual reasoning, physical predictions, robot localisation, and simulation-based planning among other capacities (Sutton, 1991; Deisenroth and Rasmussen, 2011; Watter et al., 2015; Levine and Abbeel, 2014; Assael et al., 2015). These tasks require models of high-dimensional observation sequences and contain complex, long temporal dependencies--requirements that most available GTMs are unable to fulfil. Developing such GTMs is the aim of this paper.
Many GTMs--whether they are linear or nonlinear, deterministic or stochastic--assume that the underlying temporal dynamics is governed by low-order Markov transitions and use fixed-dimensional sufficient statistics. Examples of such models include Hidden Markov Models (Rabiner, 1989), and linear dynamical systems such as Kalman filters and their nonlinear extensions (Kalman, 1960; Ghahramani and Hinton, 1996; Krishnan et al., 2015). The fixed-order Markov assumption used in these models is insufficient for characterising many systems of practical relevance.
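For concreteness, the first-order Markov assumption shared by Hidden Markov Models and linear dynamical systems can be written as the factorisation below. The symbols are chosen here for illustration and are not taken from the paper: z_t is the latent state and x_t the observation at time t, with p(z_1 | z_0) read as the initial-state distribution p(z_1).

```latex
\[
  p(x_{1:T}, z_{1:T}) \;=\; \prod_{t=1}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)
\]
```

Under this factorisation everything the model knows about the past must pass through the fixed-dimensional state z_{t-1}. This makes it hard to carry information across long gaps in the sequence.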
Amortised MAP Inference for Image Super-resolution
Sønderby, Casper Kaae, Caballero, Jose, Theis, Lucas, Shi, Wenzhe, Huszár, Ferenc
Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions, ensuring that the high-resolution output of the network is always consistent with the low-resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN), (2) denoiser-guided SR, which backpropagates gradient estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN-based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.
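The projection onto the affine subspace of valid SR solutions can be sketched as follows. It assumes the downsampling is a known linear operator `A`, so that a high-resolution candidate `x` is valid exactly when `A @ x` equals the low-resolution input `y`. The function name, the explicit pseudo-inverse, and the toy 1-D "image" are assumptions made for illustration, not the paper's network layer.

```python
import numpy as np

def project_to_consistent(x, y, A):
    """Orthogonally project x onto the affine subspace {z : A z = y}.

    Uses the Moore-Penrose pseudo-inverse: x - A^+ (A x - y).
    """
    A_pinv = np.linalg.pinv(A)
    return x - A_pinv @ (A @ x - y)

# Hypothetical toy downsampler: average adjacent pairs of a length-4 signal.
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
y = np.array([1.0, 3.0])               # low-resolution observation
x = np.array([0.0, 0.0, 10.0, 0.0])    # arbitrary candidate, e.g. a raw network output

x_proj = project_to_consistent(x, y, A)
```

Whatever `x` is, downsampling `x_proj` reproduces `y` (up to floating-point error) when `A` has full row rank, as it does here. A model built this way only has to learn the detail that downsampling discards.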