AITopics | Text Classification

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

Automatic Classification of Poetry by Meter and Rhyme

Tanasescu, Chris (University of Ottawa) | Paget, Bryan (University of Ottawa) | Inkpen, Diana (University of Ottawa)

AAAI Conferences | May-8-2016

In this paper, we focus on large-scale poetry classification by meter. We repurposed an open source poetry scanning program (the Scandroid by Charles O. Hartman) as a feature extractor. Our machine learning experiments show a useful ability to classify poems by poetic meter. We also made our own rhyme detector using the Carnegie Mellon University Pronouncing Dictionary as our primary source of pronunciation information. Future work will involve classifying rhyme and assembling a graph (or graphs) as part of the Graph Poem Project depicting the interconnected nature of poetry across history, geography, genre, etc.

automatic classification, meter and rhyme, poetry

AAAI Conferences

The Twenty-Ninth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
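The abstract above mentions a rhyme detector built on a pronouncing dictionary but gives no details. The following is only a minimal sketch of one common approach (comparing everything from the last primary-stressed vowel onward), not the authors' implementation. The `PRONUNCIATIONS` table is a small hypothetical stand-in for the real dictionary, and the function names are assumptions made for illustration.

```python
# Hypothetical mini pronunciation table in the ARPAbet style used by the
# CMU Pronouncing Dictionary; a digit on a vowel marks stress (1 = primary).
PRONUNCIATIONS = {
    "light": ["L", "AY1", "T"],
    "night": ["N", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
    "table": ["T", "EY1", "B", "AH0", "L"],
    "cable": ["K", "EY1", "B", "AH0", "L"],
    "love": ["L", "AH1", "V"],
}


def rhyme_part(phones):
    """Return the phones from the last primary-stressed vowel to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    return phones  # no primary stress marked: fall back to the whole word


def is_perfect_rhyme(a, b, table=PRONUNCIATIONS):
    """True if two different known words share the same rhyme part."""
    if a == b or a not in table or b not in table:
        return False
    return rhyme_part(table[a]) == rhyme_part(table[b])
```

With this table, `is_perfect_rhyme("delight", "night")` holds because both words end in `AY1 T`, while a word missing from the table is simply treated as non-rhyming. A real detector would also need to handle multiple pronunciations per word and near rhymes.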

Semisupervised Text Classification Using Unsupervised Topic Information

Dorado, Rubén (École de Technologie Supérieure, Université du Québec) | Ratté, Sylvie

AAAI Conferences | May-8-2016

Labeling corpora is a time-consuming and recurring problem while developing practical NLP applications. In this paper, we present a semi-supervised method to build a text classifier using unsupervised topic information. The objective is to use the least amount of labeled data to accelerate the creation of a corpus for classification in specific domains. We show that it is possible to obtain a performance similar to state-of-the-art methods, despite the limited quantity of data.

semisupervised text classification, unsupervised topic information

AAAI Conferences

The Twenty-Ninth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.73)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Text Analysis 101; A Basic Understanding for Business Users: Clustering and Unsupervised Methods

@machinelearnbot | May-6-2016, 01:26:19 GMT

This blog was originally posted as part of our Text Analysis 101 blog series. It aims to explain how the classification of text works as part of Natural Language Processing. It was the second blog on harnessing Machine Learning (ML) in the form of Natural Language Processing (NLP) for the automatic classification of documents. By classifying text, we aim to assign a document or piece of text to one or more classes or categories, making it easier to manage or sort. A document classifier often returns or assigns a category "label" or "code" to a document or piece of text.

machine learning, natural language, text classification, (15 more...)

@machinelearnbot

Country: North America > United States > New York (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

A judge has partially dismissed Twitter's surveillance case against the government

PCWorld | May-2-2016, 23:10:21 GMT

A California court has dismissed part of a lawsuit brought by Twitter that challenges U.S. government restrictions on what it can say about surveillance requests on its users. Twitter sued the government in 2014, alleging that the restrictions, which are common to all Internet service providers, infringe its First Amendment right to free speech. Earlier this year, the Department of Justice asked the federal district court in Oakland, California, to toss out the lawsuit. It argued that the Foreign Intelligence Surveillance Court (FISC) is a more suitable venue to hear the dispute, and that part of Twitter's argument didn't stand because the company isn't disputing document classification decisions made by the government. On Monday, a judge agreed with the government's latter argument but denied its request to shift the case to FISC.

artificial intelligence, natural language, text classification, (12 more...)

PCWorld

Country: North America > United States > California > Alameda County > Oakland (0.27)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.60)

Automatic Webpage Classification • /r/MachineLearning

@machinelearnbot | Apr-21-2016, 23:48:10 GMT

I'm trying to create a document classifier but I'm not able to think of features to use. Does anybody have experience with this? I used Beautiful Soup to remove the tags. I know tf-idf can be used, but I'm not exactly sure how. Suggestions on how to 'clean' the data better (e.g. removing stop words, stemming, etc.) are also welcome.

machine learning, machinelearning, natural language, (2 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.37)
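The post above asks how tf-idf could be turned into features once the tags are stripped. The following is a minimal sketch using only the standard library, assuming whitespace tokenization, a tiny illustrative stop-word list, and the plain `tf * log(N / df)` weighting. The names `STOP_WORDS`, `tokenize`, and `tfidf` are hypothetical, and stemming is left out.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic non-stop words."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

Terms that occur in every document get weight zero, which is why tf-idf tends to push common words down and leave page-specific vocabulary as the strongest features. The resulting dictionaries can be fed to any classifier that accepts sparse feature vectors.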

Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint

Zhao, Li (Tsinghua University) | Huang, Minlie (Tsinghua University) | Yao, Ziyu (Beijing University of Posts and Telecommunications) | Su, Rongwei (Samsung Research and Development Institute China - Beijing) | Jiang, Yingying (Samsung Research and Development Institute China - Beijing) | Zhu, Xiaoyan (Tsinghua University)

AAAI Conferences | Apr-19-2016

Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method to augment Multinomial Naive Bayes (MNB) for text classification. Despite its success, MNB-EM is not stable, and may succeed or fail to improve MNB. We believe that this is because MNB-EM lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method to augment MNB-EM by leveraging the word-level statistical constraint to preserve the class distribution on words. The word-level statistical constraints are further converted to constraints on document posteriors generated by MNB-EM. Experiments demonstrate that our method can consistently improve MNB-EM, and outperforms state-of-the-art baselines remarkably.

constraint, machine learning, natural language, (16 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States (1.00)
Asia (0.69)

Genre:

Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
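For readers unfamiliar with the supervised baseline the abstract above builds on, the following is a minimal sketch of plain Multinomial Naive Bayes with Laplace smoothing. It covers neither the EM step nor the word-level constraints proposed in the paper. The function names and the `alpha` parameter are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit log priors and smoothed log likelihoods from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_lik": {
                w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
            },
        }
    return model


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(c):
        params = model[c]
        return params["log_prior"] + sum(
            params["log_lik"][w] for w in tokens if w in params["log_lik"]
        )
    return max(model, key=score)
```

A semi-supervised variant such as MNB-EM would alternate between labeling unlabeled documents with `predict_mnb`-style posteriors and refitting the counts, which is the step the paper argues can drift away from the class distribution on words.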

Text Classification with Heterogeneous Information Network Kernels

Wang, Chenguang (Peking University) | Song, Yangqiu (West Virginia University) | Li, Haoran (Peking University) | Zhang, Ming (Peking University) | Han, Jiawei (University of Illinois at Urbana-Champaign)

AAAI Conferences | Apr-19-2016

Text classification is an important problem with many applications. Traditional approaches represent text as a bag-of-words and build classifiers based on this representation. Compared with words alone, entity phrases, the relations between the entities, and the types of the entities and relations carry much more information to represent the texts. This paper presents a novel text-as-network classification framework, which introduces 1) a structured and typed heterogeneous information network (HIN) representation of texts, and 2) a meta-path based approach to link texts. We show that with the new representation and links of texts, the structured and typed information of entities and relations can be incorporated into kernels. In particular, we develop both a simple linear kernel and an indefinite kernel based on meta-paths in the HIN representation of texts, which we call HIN-kernels. Using Freebase, a well-known world knowledge base, to construct HINs for texts, our experiments on two benchmark datasets show that the indefinite HIN-kernel based on weighted meta-paths outperforms the state-of-the-art methods and other HIN-kernels.

machine learning, natural language, text classification, (19 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.46)

Genre: Research Report (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Robust Text Classification in the Presence of Confounding Bias

Landeiro, Virgile (Illinois Institute of Technology) | Culotta, Aron (Illinois Institute of Technology)

AAAI Conferences | Apr-19-2016

As text classifiers become increasingly used in real-time applications, it is critical to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is a confounding variable Z that influences both the text features X and the class variable Y. For example, a classifier trained to predict the health status of a user based on their online communications may be confounded by socioeconomic variables. When the influence of Z changes from training to testing data, we find that classifier accuracy can degrade rapidly. Our approach, based on Pearl's back-door adjustment, estimates the underlying effect of a text variable on the class variable while controlling for the confounding variable. Although our goal is prediction, not causal inference, we find that such adjustments are essential to building text classifiers that are robust to confounding variables. On three diverse text classification tasks, we find that covariate adjustment results in higher accuracy than competing baselines over a range of confounding relationships (e.g., in one setting, accuracy improves from 60% to 81%).

accuracy, machine learning, natural language, (20 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.69)

Industry:

Health & Medicine (1.00)
Media > Film (0.70)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
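The back-door adjustment named in the abstract above replaces the confounded estimate p(y | x) with p(y | do(x)) = sum over z of p(y | x, z) p(z). The following is a minimal numeric sketch of that formula only, not the paper's classifier. The function names and all probabilities are made up for illustration, with Z taken as a binary confounder.

```python
def backdoor_adjust(p_y_given_xz, p_z):
    """p(y | do(x)): average p(y | x, z) over the marginal p(z)."""
    return sum(p_y_given_xz[z] * p_z[z] for z in p_z)


def naive_estimate(p_y_given_xz, p_z_given_x):
    """p(y | x): average p(y | x, z) over p(z | x), which absorbs the confounding."""
    return sum(p_y_given_xz[z] * p_z_given_x[z] for z in p_z_given_x)


# Made-up numbers for a single text feature x and a binary confounder z.
p_y_given_xz = {0: 0.8, 1: 0.2}   # p(y=1 | x, z)
p_z = {0: 0.5, 1: 0.5}            # marginal p(z) in the population
p_z_given_x = {0: 0.9, 1: 0.1}    # in the training data x co-occurs mostly with z=0

adjusted = backdoor_adjust(p_y_given_xz, p_z)
unadjusted = naive_estimate(p_y_given_xz, p_z_given_x)
```

In this toy setting the unadjusted estimate is inflated because the feature mostly appears alongside `z = 0`. Averaging over the marginal `p_z` removes that dependence, which is the property that matters when the relationship between Z and the text shifts between training and testing.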

Natural Language Processing for programmers: part 3 -- World Writable

#artificialintelligence | Apr-4-2016, 21:35:59 GMT

Previously, I experimented with text generation using context-free grammars, one of the oldest techniques in natural language processing. I'll come back to CFGs in a future post. In this one I'm going to try my hand at classifiers. Automatic classification is the process by which a computer is trained to categorize an item into one or more defined buckets. A common type of classification is no doubt working on your behalf right this moment: spam filtering.

artificial intelligence, classification, text classification, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.37)

5 Text Classification Case Studies Using SciKit Learn

@machinelearnbot | Apr-3-2016, 12:45:05 GMT

News Classification for Startup Intelligence: CB Insights, a startup intelligence data provider, shows an example of classifying news into HR and employee-related categories. CB Insights' assessment of private company health includes tracking of companies' human resources activities. This includes programmatic monitoring of hiring activity as evidenced by job postings, key hires, and departures. They used scikit-learn to help with this work. Human resources classification is a binary classification problem in the sense that the classifier should be able to discriminate news about companies' human resources events from all other news.

machine learning, natural language, text classification case study, (6 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)
