 Text Classification



Classification with scikit-learn

@machinelearnbot

For Python programmers, scikit-learn is one of the best libraries for building machine learning applications. It is well suited to beginners because it has a simple interface and is well documented, with many examples and tutorials. Besides supervised machine learning (classification and regression), it can also be used for clustering, dimensionality reduction, feature extraction and engineering, and data pre-processing. The interface is consistent across all of these methods, so it is not only easy to use but also easy to construct a large ensemble of classifiers or regression models and train them with the same commands. In this blog post, let's look at how to build, train, evaluate and validate a classifier with scikit-learn, and in this way get familiar with the library.
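The consistent interface mentioned above can be illustrated with a minimal sketch. The synthetic dataset, the two model choices and the variable names are assumptions made for illustration and do not come from the original post; the point is only that different estimators are trained and scored with the same `fit` and `score` calls.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two unrelated model families that share the same estimator interface.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same training call for every model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Swapping in another classifier should only require adding one more entry to the `models` dictionary; the loop itself does not change.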


Email Spam Filtering : A python implementation with scikit-learn

#artificialintelligence

This article was written by ML bot2 on Machine Learning in Action. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization and machine translation. Spam filtering is a beginner's example of a document classification task: classifying an email as spam or non-spam. I have extracted an equal number of spam and non-spam emails from the Ling-spam corpus.


Email Spam Filtering: An Implementation with Python and Scikit-learn

@machinelearnbot

Text mining (deriving information from text) is a wide field that has gained popularity with the huge volume of text data being generated. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization and machine translation. Spam filtering is a beginner's example of a document classification task: classifying an email as spam or non-spam. The Spam box in your Gmail account is the best example of this. So let's get started building a spam filter on a publicly available mail corpus.


High Recall Text Classification for Public Health Systematic Review

AAAI Conferences

Some information retrieval applications demand manageable levels of precision at high levels of recall. Examples include e-discovery, patent search, and systematic review. In this paper we present a real-world case study supporting a broad topic systematic review in the public health domain. We provide experimental results that demonstrate how retrieval performance on bibliographic citations can be materially improved. We attained an average precision of 0.57 and recall approaching 80% at a very reasonable screening depth. These results represent 18% and 23% relative gains over a baseline classifier. We also address pragmatic issues that arise when working on “noisy” real-world data, such as coping with citation records that often have empty fields.
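To make the precision and recall trade-off at a given screening depth concrete, here is a small sketch in plain Python. The function name, the toy ranking and the depth are hypothetical and are not taken from the paper; the sketch computes precision and recall at a fixed cut-off, which is related to but not the same as the average precision figure quoted in the abstract.

```python
def precision_recall_at_depth(ranked_labels, depth):
    """Return (precision, recall) after screening the top `depth` items.

    `ranked_labels` is a list of 0/1 relevance labels, sorted so that the
    item the classifier scored highest comes first.
    """
    screened = ranked_labels[:depth]
    true_positives = sum(screened)
    total_relevant = sum(ranked_labels)
    precision = true_positives / len(screened) if screened else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Toy ranking of ten citations, four of which are relevant (made-up data).
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall_at_depth(ranked, depth=5)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Screening deeper into the ranking can only keep recall the same or raise it, while precision typically falls, which is why high-recall applications care about how much precision survives at the required depth.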


A Longitudinal Study of Topic Classification on Twitter

AAAI Conferences

Twitter represents a massively distributed information source over a kaleidoscope of topics ranging from social and political events to entertainment and sports news. While recent work has suggested that variations on standard classifiers can be effectively trained as topical filters (Lin, Snow, and Morgan 2011; Yang et al. 2014; Magdy and Elsayed 2014), there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? Furthermore, what features and feature classes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that (1) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training and (2) simple terms and locations are the most informative feature classes (despite training on classes labeled via hashtags).


Document Classification with scikit-learn

@machinelearnbot

Document classification is a fundamental machine learning task. It is used for all kinds of applications, like filtering spam, routing support requests to the right support rep, language detection, genre classification, sentiment analysis, and many more. To demonstrate text classification with scikit-learn, we're going to build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this tutorial is effective and surprisingly accurate. Spam filtering is kind of like the "Hello world" of document classification. However, be aware that you aren't limited to two classes.
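A simple spam filter of the kind described above might be sketched as follows. This is not the tutorial's own code: the bag-of-words features, the multinomial naive Bayes classifier, the tiny hand-written training set and the label names are all assumptions chosen to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; a real filter would be trained on a mail corpus.
train_texts = [
    "win a free prize now",
    "free money click here",
    "claim your free prize",
    "meeting moved to monday",
    "please review the attached report",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "spam", "not_spam", "not_spam", "not_spam"]

# CountVectorizer turns each text into word counts; MultinomialNB classifies them.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(train_texts, train_labels)

predictions = spam_filter.predict(["free prize money", "monday team meeting"])
print(list(predictions))
```

Because the labels are ordinary strings, the same pipeline accepts three or more distinct labels without any change to the code, which is what makes it usable beyond two-class problems.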


Text Analysis 101: Document Classification

@machinelearnbot

Document classification is an example of Machine Learning (ML) in the form of Natural Language Processing (NLP). By classifying text, we are aiming to assign one or more classes or categories to a document, making it easier to manage and sort.


Email Spam Filtering : A python implementation with scikit-learn

@machinelearnbot

This article was written by ML bot2 on Machine Learning in Action. Text mining (deriving information from text) is a wide field that has gained popularity with the huge volume of text data being generated. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization and machine translation. Spam filtering is a beginner's example of a document classification task: classifying an email as spam or non-spam. The Spam box in your Gmail account is the best example of this.


Learn how to create Text Analytics solutions with Azure ML Templates

#artificialintelligence

The Microsoft Azure ML team recently announced the availability of three ML templates in Azure ML Studio: for online fraud detection, retail forecasting and text classification. These templates demonstrate industry best practices and common building blocks used in an ML solution for a specific domain, from data preparation, data processing, feature engineering and model training to model deployment (as a web service). The goal of Azure ML templates is to make data scientists more productive and faster at building and deploying their custom ML solutions in the cloud. Templates include a collection of pre-configured Azure ML modules as well as custom R scripts in the Execute R Script modules to enable an end-to-end solution. We'll walk through these templates in detail in this and future webinars.