AITopics | Text Classification


Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.


Walmart Competition: Trip Type Classification

@machinelearnbot | Jun-6-2017, 18:40:15 GMT

The authors of this post attended the NYC Data Science Academy 12-week full-time data science bootcamp from Sep. 23 to Dec. 18, 2015. The post is based on their fourth in-class project (due after the 8th week of the program). Walmart uses trip type classification to segment its shoppers and their store visits in order to improve the shopping experience. Walmart's trip types are created from a combination of existing customer insights and purchase history data. The purpose of the Kaggle competition is to use only the purchase data provided to derive Walmart's classification labels.

machine learning, natural language, text classification, (15 more...)

@machinelearnbot

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)


Text Classification & Sentiment Analysis tutorial / blog

@machinelearnbot | Jun-2-2017, 17:10:07 GMT

For a more technical explanation, see the two articles linked from the original post. The post also links to a good explanation and a list of the most commonly used kernel functions.

machine learning, natural language, text classification, (14 more...)

@machinelearnbot

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > The Hague (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.64)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
(3 more...)


Classification with scikit-learn

@machinelearnbot | Jun-1-2017, 18:46:17 GMT

For Python programmers, scikit-learn is one of the best libraries for building machine learning applications. It is ideal for beginners because it has a really simple interface and is well documented, with many examples and tutorials. Besides supervised machine learning (classification and regression), it can also be used for clustering, dimensionality reduction, feature extraction and engineering, and pre-processing the data. The interface is consistent across all of these methods, so it is not only easy to use, but it is also easy to construct a large ensemble of classifiers or regression models and train them with the same commands. In this blog, let's have a look at how to build, train, evaluate and validate a classifier with scikit-learn, and in this way get familiar with the scikit-learn library.

machine learning, natural language, text classification, (5 more...)

@machinelearnbot

Technology:

Information Technology > Communications > Social Media (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.62)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
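
The build, train, evaluate, and validate workflow that the post above describes might look roughly like the following minimal sketch. The iris dataset, the logistic regression model, and the 70/30 split are assumptions chosen for illustration; the original post may use different data and classifiers.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: iris and logistic regression are illustrative stand-ins,
# not necessarily what the original post uses.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for evaluation, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Build and train: scikit-learn estimators share the fit()/predict() interface.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate on the held-out split.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))

# Validate with 5-fold cross-validation on the full dataset.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"held-out accuracy: {test_accuracy:.3f}")
print(f"mean cross-validation accuracy: {cv_scores.mean():.3f}")
```

Because the estimator interface is uniform, swapping `LogisticRegression` for another scikit-learn classifier should leave the rest of the sketch unchanged.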


Email Spam Filtering : A python implementation with scikit-learn

#artificialintelligence | May-21-2017, 20:32:07 GMT

This article was written by ML bot2 on Machine Learning in Action. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization, and machine translation. Spam filtering is a beginner's example of a document classification task, which involves classifying an email as spam or non-spam. I have extracted an equal number of spam and non-spam emails from the Ling-spam corpus.

natural language, spam filtering, text classification, (5 more...)

#artificialintelligence

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.91)


Email Spam Filtering: An Implementation with Python and Scikit-learn

@machinelearnbot | May-17-2017, 21:40:09 GMT

Text mining (deriving information from text) is a wide field which has gained popularity with the huge volume of text data being generated. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization, and machine translation. Spam filtering is a beginner's example of a document classification task, which involves classifying an email as spam or non-spam. The spam box in your Gmail account is the best example of this. So let's get started building a spam filter on a publicly available mail corpus.

machine learning, natural language, spam filtering, (16 more...)

@machinelearnbot

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)
(2 more...)
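
To make the spam versus non-spam task in the entry above concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the article's implementation: the tokenizer, the function names, and the four toy messages are all hypothetical, and "ham" is used only as the common nickname for non-spam mail.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy messages, not drawn from any real corpus.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(training_data)
print(predict("claim your free prize", model))      # spam
print(predict("project meeting on monday", model))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters once messages are longer than these toy examples.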


High Recall Text Classification for Public Health Systematic Review

McNamee, Paul (Johns Hopkins University) | Mayfield, James (Johns Hopkins University) | Rowe, Samantha Y. (U.S. Centers for Disease Control and Prevention) | Rowe, Alexander K. (U.S. Centers for Disease Control and Prevention) | Jackson, Hannah L. (U.S. Centers for Disease Control and Prevention) | Baker, Megan (Johns Hopkins University)

AAAI Conferences | May-16-2017

Some information retrieval applications demand manageable levels of precision at high levels of recall. Examples include e-discovery, patent search, and systematic review. In this paper we present a real-world case study supporting a broad topic systematic review in the public health domain. We provide experimental results that demonstrate how retrieval performance on bibliographic citations can be materially improved. We attained an average precision of 0.57 and recall approaching 80% at a very reasonable screening depth. These results represent 18% and 23% relative gains over a baseline classifier. We also address pragmatic issues that arise when working on “noisy” real-world data, such as coping with citation records that often have empty fields.

high recall text classification, public health systematic review

AAAI Conferences

The Thirtieth International Flairs Conference

Industry:

Health & Medicine > Public Health (0.60)
Health & Medicine > Health Care Providers & Services (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
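
The abstract above reports average precision and recall at a given screening depth. As a rough illustration of how such figures are commonly computed from a ranked list, here is a small sketch; the function names and the ranking are hypothetical and are not taken from the paper.

```python
def precision_recall_at_depth(ranked_relevance, depth, total_relevant):
    """Precision and recall after screening the top `depth` ranked items."""
    hits = sum(ranked_relevance[:depth])
    return hits / depth, hits / total_relevant


def average_precision(ranked_relevance, total_relevant):
    """Sum of precision at each rank holding a relevant item, over all relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Hypothetical ranking of ten citations: 1 = relevant, 0 = irrelevant.
ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
total_relevant = 5  # assumption: one relevant citation falls outside the top ten

precision, recall = precision_recall_at_depth(ranking, depth=5, total_relevant=total_relevant)
ap = average_precision(ranking, total_relevant)

print(f"precision@5 = {precision:.2f}, recall@5 = {recall:.2f}")
print(f"average precision = {ap:.3f}")
```

In a screening setting, increasing the depth raises recall at the cost of more citations to review by hand, which is why the abstract pairs its recall figure with a screening depth.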


A Longitudinal Study of Topic Classification on Twitter

Iman, Zahra (Oregon State University) | Sanner, Scott (University of Toronto) | Bouadjenek, Mohamed Reda (University of Melbourne) | Xie, Lexing (Australian National University and Data61)

AAAI Conferences | May-11-2017

Twitter represents a massively distributed information source over a kaleidoscope of topics ranging from social and political events to entertainment and sports news. While recent work has suggested that variations on standard classifiers can be effectively trained as topical filters (Lin, Snow, and Morgan 2011; Yang et al. 2014; Magdy and Elsayed 2014), there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? Furthermore, what features and feature classes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that (1) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training and (2) simple terms and locations are the most informative feature classes (despite training on classes labeled via hashtags).

social media, text classification, topic classification, (4 more...)

AAAI Conferences

Eleventh International AAAI Conference on Web and Social Media

Country: Asia > Middle East > Iran (0.53)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
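
The study above trains on hashtag-labeled tweets and asks how well classifiers generalize to later content. One way to set up that kind of experiment is to label by hashtag and split by time, as in the sketch below. The tweets, the hashtag set, the cutoff date, and the helper names are all hypothetical and are not the paper's data or code.

```python
from datetime import date

# Hypothetical tweets: (posting date, text). Not taken from the paper's corpus.
tweets = [
    (date(2013, 3, 1), "Talks resume in Geneva #irandeal"),
    (date(2013, 6, 9), "Great match last night #football"),
    (date(2013, 11, 24), "Interim agreement announced #irandeal #nuclear"),
    (date(2014, 2, 2), "What a final quarter #superbowl"),
    (date(2014, 7, 19), "Negotiators extend the deadline #irantalks"),
]

# Assumption: a topic is defined by a hand-picked set of hashtags.
topic_hashtags = {"#irandeal", "#irantalks"}


def label_by_hashtag(text, hashtags):
    """Return 1 if the tweet contains any topic hashtag, else 0."""
    tokens = set(text.lower().split())
    return int(bool(tokens & hashtags))


def temporal_split(items, cutoff):
    """Train on items posted before `cutoff`; test on items posted on or after it."""
    train = [item for item in items if item[0] < cutoff]
    test = [item for item in items if item[0] >= cutoff]
    return train, test


labeled = [(day, text, label_by_hashtag(text, topic_hashtags)) for day, text in tweets]
train, test = temporal_split(labeled, cutoff=date(2014, 1, 1))

print(len(train), "training tweets,", len(test), "test tweets")
```

Splitting by date rather than at random keeps all test tweets strictly later than the training tweets, so the evaluation measures generalization to future content instead of to held-out content from the same period.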


Document Classification with scikit-learn

@machinelearnbot | May-9-2017, 18:05:06 GMT

Document classification is a fundamental machine learning task. It is used for all kinds of applications, like filtering spam, routing support requests to the right support rep, language detection, genre classification, sentiment analysis, and many more. To demonstrate text classification with scikit-learn, we're going to build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this tutorial is effective and surprisingly accurate. Spam filtering is kind of like the "Hello world" of document classification. However, be aware that you aren't limited to two classes.

classifier, machine learning, natural language, (18 more...)

@machinelearnbot

Genre: Instructional Material > Course Syllabus & Notes (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
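
The closing point of the entry above, that document classification is not limited to two classes, can be illustrated with a short scikit-learn sketch. The choice of `CountVectorizer` with `MultinomialNB`, the three class names, and the toy messages are assumptions made for illustration; the tutorial's own model and data may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy messages; a real model would be trained on a labeled corpus.
texts = [
    "win a free prize now",
    "free money claim your prize",
    "refund for my last invoice",
    "invoice shows the wrong amount",
    "app crashes on startup",
    "cannot login after the update",
]
# Three classes instead of the usual spam / not-spam pair.
labels = ["spam", "spam", "billing", "billing", "technical", "technical"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
router = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
router.fit(texts, labels)

predictions = router.predict([
    "claim your free prize",
    "wrong invoice amount",
    "app crashes after update",
])
print(list(predictions))
```

The same pipeline handles two classes or many without code changes, because the number of classes is inferred from the distinct values in `labels`.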


Text Analysis 101: Document Classification

@machinelearnbot | May-7-2017, 17:50:10 GMT

Document classification is an example of Machine Learning (ML) in the form of Natural Language Processing (NLP). By classifying text, we aim to assign one or more classes or categories to a document, making it easier to manage and sort.

document classification, natural language, text classification, (2 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.84)


Email Spam Filtering : A python implementation with scikit-learn

@machinelearnbot | May-6-2017, 17:05:13 GMT

This article was written by ML bot2 on Machine Learning in Action. Text mining (deriving information from text) is a wide field which has gained popularity with the huge volume of text data being generated. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization, and machine translation. Spam filtering is a beginner's example of a document classification task, which involves classifying an email as spam or non-spam. The spam box in your Gmail account is the best example of this.

natural language, python implementation, text classification, (5 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.92)
