Learning Graphical Models
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of handcrafted features and data pre-processing. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF. Our system is truly end-to-end, requiring no feature engineering or data pre-processing, thus making it applicable to a wide range of sequence labeling tasks. We evaluate our system on two data sets for two sequence labeling tasks: the Penn Treebank WSJ corpus for part-of-speech (POS) tagging and the CoNLL 2003 corpus for named entity recognition (NER). We obtain state-of-the-art performance on both datasets: 97.55% accuracy for POS tagging and 91.21% F1 for NER.
1 Introduction
Linguistic sequence labeling, such as part-of-speech (POS) tagging and named entity recognition (NER), is one of the first stages in deep language understanding, and its importance has been well recognized in the natural language processing community. Most traditional high-performance sequence labeling models are linear statistical models, including Hidden Markov Models (HMM) and Conditional Random Fields (CRF) (Ratinov and Roth, 2009; Passos et al., 2014; Luo et al., 2015), which rely heavily on handcrafted features and task-specific resources. For example, English POS taggers benefit from carefully designed word spelling features; orthographic features and external resources such as gazetteers are widely used in NER. However, such task-specific knowledge is costly to develop (Ma and Xia, 2014), making sequence labeling models difficult to adapt to new tasks or new domains. In the past few years, nonlinear neural networks that take distributed word representations, also known as word embeddings, as input have been broadly applied to NLP problems with great success.
Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial Footprint
Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS), which was developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS alternates between two optimization steps: using semantic scan (Liu and Neill, 2011) to estimate contrastive foreground topics in documents, and discovering spatial neighborhoods (Shao et al., 2011) with a high occurrence of documents containing the foreground topics. We evaluate our method on an Emergency Department chief complaints dataset (ED dataset) to verify its effectiveness in detecting real-world disease outbreaks from free-text ED chief complaint data.
Variational Tempering
Mandt, Stephan, McInerney, James, Abrol, Farhan, Ranganath, Rajesh, Blei, David
Variational inference (VI) combined with data subsampling enables approximate posterior inference over large data sets, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This approach uses a decreasing temperature parameter which deterministically deforms the objective during the course of the optimization. A well-known drawback of this annealing approach is the need to choose a cooling schedule. We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model. In contrast to related work in the Markov chain Monte Carlo literature, this algorithm results in adaptive annealing schedules. Lastly, we develop local variational tempering, which assigns a latent temperature to each data point; this allows for dynamic annealing that varies across data. Compared to traditional VI, all proposed approaches find improved predictive likelihoods on held-out data.
Naïve-Bayes Technique for Machine Learning
"We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances." "When you have two competing theories that make exactly the same predictions, the simpler one is the better." One famous example of Occam's Razor in action is found in conspiracy theories surrounding the NASA moon landings. Many conspiracy theorists believe that the first Moon Landing was staged and filmed in a studio, part of an elaborate hoax. Their justification relies upon many twisted and convoluted theories, whereas the NASA argument is fairly straightforward.
Let Me Hear Your Voice and I'll Tell You How You Feel
Creating mood sensing technology has become very popular in recent years. There is a wide range of companies trying to detect your emotions from what you write, the tone of your voice, or from the expressions on your face. All of these companies offer their technology online through cloud-based programming interfaces (APIs). As part of my offline emotion sensing hardware (Project Jammin), I have already built early prototypes of facial expression and speech content recognition for emotion detection. In this short article I describe the missing part, a voice tone analyzer.
Mastering Machine Learning With scikit-learn
If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential. This book examines machine learning models including logistic regression, decision trees, and support vector machines, and applies them to common problems such as categorizing documents and classifying images. It begins with the fundamentals of machine learning, introducing you to the supervised-unsupervised spectrum, the uses of training and test data, and evaluating models. You will learn how to use generalized linear models in regression problems, as well as solve problems with text and categorical features. You will be acquainted with the use of logistic regression, regularization, and the various loss functions that are used by generalized linear models.
Variational Bayesian Inference for Hidden Markov Models With Multivariate Gaussian Output Distributions
Gruhl, Christian, Sick, Bernhard
Hidden Markov Models (HMM) are a standard technique in time series analysis and data mining. Given one or more sample time series, they are typically trained by means of a special variant of the expectation maximization (EM) algorithm, the Baum-Welch algorithm. HMM are used for gesture recognition, machine tool monitoring, or speech recognition, for instance. Second-order techniques are used to find values for parameters of probabilistic models from sample data. The parameters are regarded as random variables, and distributions are defined over these variables. The type of these second-order distributions depends on the type of the underlying probabilistic model. Typically, so-called conjugate distributions are chosen, e.g., a Gaussian-Wishart distribution for an underlying Gaussian for which mean and covariance matrix have to be determined. Second-order techniques have some advantages over conventional approaches, e.g.,
Particle Metropolis-adjusted Langevin algorithms
Nemeth, Christopher, Sherlock, Chris, Fearnhead, Paul
Markov chain Monte Carlo algorithms are a popular and well-studied methodology that can be used to draw samples from posterior distributions. Over the past few years these algorithms have been extended to tackle problems where the model likelihood is intractable (Beaumont, 2003). Andrieu and Roberts (2009) showed that within the Metropolis-Hastings algorithm, if the likelihood is replaced with an unbiased estimate, then the sampler still targets the correct stationary distribution. Andrieu et al. (2010) extended this work further to create a class of Markov chain algorithms that use sequential Monte Carlo methods, also known as particle filters. Current implementations of pseudo-marginal and particle Markov chain Monte Carlo use random-walk proposals to update the parameters (e.g., Golightly and Wilkinson, 2011; Knape and de Valpine, 2012) and shall be referred to herein as particle random-walk Metropolis algorithms. Random walk-based algorithms propose a new value from some symmetric density centred on the current value.
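To make the random-walk proposal concrete, here is a minimal sketch of a plain random-walk Metropolis sampler in Python. It is an illustration only, not the authors' particle algorithm: the target is an exactly computable standard normal, and the names `random_walk_metropolis`, `log_target` and `step_size` are hypothetical. In a pseudo-marginal or particle variant, the exact `log_target` would be replaced by the logarithm of an unbiased likelihood estimate (for example from a particle filter), and the estimate for the current value would be stored and reused rather than recomputed, as `log_p` is here.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step_size, seed=0):
    """Draw samples from a 1-D density known up to a constant."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)  # value at the current state, reused until a move is accepted
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current value.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # Because the proposal is symmetric, the acceptance ratio is just
        # the ratio of target densities (compared on the log scale).
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so log is defined
        if log_u < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def log_standard_normal(x):
    """Unnormalised log density of N(0, 1); an assumed toy target."""
    return -0.5 * x * x


samples, acceptance_rate = random_walk_metropolis(
    log_standard_normal, x0=0.0, n_steps=20000, step_size=1.0
)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

The step size controls the usual trade-off: small steps are accepted often but explore slowly, while large steps are frequently rejected.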
EEF: Exponentially Embedded Families with Class-Specific Features for Classification
Tang, Bo, Kay, Steven, He, Haibo, Baggenstoss, Paul M.
Classification is one of the fundamental problems in the fields of machine learning and signal processing. The commonly used classifier assigns a sample or a signal to the class with maximum posterior probability, which usually requires probability density function (PDF) estimation in either a model-driven or a data-driven manner [1] [2] [3]. For high-dimensional data sets, it is necessary to perform feature reduction to estimate the PDFs robustly in a low-dimensional feature subspace. However, feature reduction may lose information pertinent to discrimination. For example, data samples from different classes that are well separated in the raw data space may overlap in the feature subspace, causing classification errors. The PDF reconstruction approach addresses this information loss by reconstructing the PDF on the raw data and classifying in the raw data space, which could improve classification performance. Several approaches have been developed along this track.
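The maximum-posterior rule mentioned above can be sketched in a few lines of Python. This is a generic illustration under simplifying assumptions (one-dimensional data, a Gaussian PDF per class, toy training values), not the EEF method itself; the function names `fit_class_models` and `map_classify` are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_class_models(samples, labels):
    """Model-driven PDF estimation: one (prior, mean, variance) triple per class."""
    models = {}
    for c in sorted(set(labels)):
        xs = [x for x, y in zip(samples, labels) if y == c]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # Floor the variance so a degenerate class cannot cause division by zero.
        models[c] = (len(xs) / len(samples), mean, max(var, 1e-9))
    return models


def map_classify(x, models):
    """Return the class maximising log prior + log class-conditional PDF."""
    def log_posterior(c):
        prior, mean, var = models[c]
        return math.log(prior) + gaussian_log_pdf(x, mean, var)
    return max(models, key=log_posterior)


# Toy training data (assumed for illustration only).
train_x = [-2.1, -1.9, -2.4, -1.6, 1.8, 2.2, 2.5, 1.5]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
models = fit_class_models(train_x, train_y)
```

The shared evidence term p(x) is omitted because it does not change which class attains the maximum.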
Job opportunities (The University of Manchester)
This is an exciting opportunity for a researcher at post-doctoral level with experience of machine learning and data mining. You will work with senior data scientists based within the local NHS trusts, the University of Manchester Health eResearch Centre, and Health Innovation Manchester to automate data extraction of predetermined features for all patients diagnosed with ovarian and colorectal cancer in the conurbation. Machine learning tools including neural networks, support vector machines and naïve Bayes algorithms will be refined and tested using the datasets accrued and optimised for clinical practice. Accuracy of prediction will be assessed using predefined criteria. Knowledge of cancer treatment would be useful but is not essential, as the team has extensive expertise in this area.