AITopics | Text Classification

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

Learn how to create Text Analytics solutions with Azure ML Templates

#artificialintelligence | Mar-9-2017, 05:50:24 GMT

The Microsoft Azure ML team recently announced the availability of three ML templates in Azure ML Studio, for online fraud detection, retail forecasting, and text classification. These templates demonstrate industry best practices and common building blocks used in an ML solution for a specific domain, from data preparation, data processing, and feature engineering through model training to model deployment (as a web service). The goal of Azure ML templates is to make data scientists more productive and faster at building and deploying their custom ML solutions in the cloud. Templates include a collection of pre-configured Azure ML modules as well as custom R scripts in the Execute R Script modules to enable an end-to-end solution. We'll walk through these templates in detail in this and future webinars.

machine learning, natural language, template, (13 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.53)

Industry: Information Technology (0.61)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.56)

Character-level Convolutional Networks for Text Classification

#artificialintelligence | Feb-25-2017, 18:05:16 GMT

Text classification is one of the common natural language understanding problems. Over the last few decades, machine learning researchers have moved from the simplest "bag of words" model to more sophisticated models for text classification. The bag-of-words model uses only information about which words appear in the text. Adding TF-IDF weighting to the bag of words helps track how relevant each word is to the document. A bag of n-grams makes it possible to use partial information about the structure of the text. Recurrent neural networks, such as LSTMs, can capture dependencies between words even when they are far apart.
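
The first three representations named above can be sketched in a few lines of Python. This is a minimal illustration, not code from the linked article: the whitespace tokenizer, the function names, the two sample sentences, and the particular TF-IDF variant (relative term frequency times the log of inverse document frequency) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_ngrams(tokens, n=1):
    """Count contiguous n-grams; n=1 gives the plain bag of words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tf_idf(docs_tokens):
    """Weight each word by relative term frequency times log inverse document frequency."""
    n_docs = len(docs_tokens)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in docs_tokens for word in set(tokens))
    weighted = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weighted.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weighted


# Hypothetical two-document corpus.
docs = ["the cat sat on the mat", "the dog sat on the log"]
tokens = [tokenize(d) for d in docs]
unigrams = bag_of_ngrams(tokens[0], 1)  # word counts only, order discarded
bigrams = bag_of_ngrams(tokens[0], 2)   # keeps some local word order
weights = tf_idf(tokens)                # words shared by every document get weight 0
```

In this toy corpus, words such as "the" that occur in both documents receive a TF-IDF weight of zero, while "cat" and "dog" keep a positive weight, which is the relevance effect the summary describes.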

machine learning, natural language, text classification, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Multi-Document Summarization via Text Classification

Cao, Ziqiang (The Hong Kong Polytechnic University) | Li, Wenjie (The Hong Kong Polytechnic University) | Li, Sujian (Peking University) | Wei, Furu (Microsoft Research)

AAAI Conferences | Feb-14-2017

Multi-document summarization has reached a bottleneck due to the lack of sufficient training data and of diverse document categories. Text classification makes up for exactly these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that TCSum can achieve state-of-the-art performance without using any hand-crafted features and can capture the variations of summary styles with respect to different text categories.

category, proceedings, summarization, (14 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.05)
North America > United States > New York (0.04)
Indian Ocean > Arabian Gulf (0.04)
(6 more...)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Wikitop: Using Wikipedia Category Network to Generate Topic Trees

Kumar, Saravana (College of Engineering, Guindy) | Rengarajan, Prasath (College of Engineering, Guindy) | Annie, Arockia Xavier (College of Engineering, Guindy)

AAAI Conferences | Feb-14-2017

Automated topic identification is an essential component in various information retrieval and knowledge representation tasks such as automated summary generation, categorization search and document indexing. In this paper, we present the Wikitop system to automatically generate topic trees from the input text by performing hierarchical classification using the Wikipedia Category Network (WCN). Our preliminary results over a collection of 125 articles are encouraging and show potential of a robust methodology for automated topic tree generation.

information retrieval, machine learning, natural language, (18 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > India > Tamil Nadu (0.15)

Industry: Media > News (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.31)

Cross-Domain Sentiment Classification via Topic-Related TrAdaBoost

Huang, Xingchang (Sun Yat-sen University) | Rao, Yanghui (Sun Yat-sen University) | Xie, Haoran (The Education University of Hong Kong) | Wong, Tak-Lam (The Education University of Hong Kong) | Wang, Fu Lee (Caritas Institute of Higher Education)

AAAI Conferences | Feb-14-2017

Cross-domain sentiment classification aims to tag sentiments for a target domain using labeled data from a source domain. Because of the differences between domains, the accuracy of a trained classifier may be very low. In this paper, we propose a boosting-based learning framework named TR-TrAdaBoost for cross-domain sentiment classification. We first explore the topic distribution of documents and then combine it with the unigram TrAdaBoost. The topic distribution captures the domain information of documents, which is valuable for cross-domain sentiment classification. Experimental results indicate that TR-TrAdaBoost represents documents well and boosts the performance and robustness of TrAdaBoost.

natural language, sentiment classification, text classification, (16 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China > Hong Kong (0.17)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Active Discriminative Text Representation Learning

Zhang, Ye (University of Texas at Austin) | Lease, Matthew (University of Texas at Austin) | Wallace, Byron C. (Northeastern University)

AAAI Conferences | Feb-14-2017

We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional AL approaches (e.g., entropy-based uncertainty sampling), which specify higher level objectives. We propose a simple approach for sentence classification that selects instances containing words whose embeddings are likely to be updated with the greatest magnitude, thereby rapidly learning discriminative, task-specific embeddings. We extend this approach to document classification by jointly considering: (1) the expected changes to the constituent word representations; and (2) the model’s current overall uncertainty regarding the instance. The relative emphasis placed on these criteria is governed by a stochastic process that favors selecting instances likely to improve representations at the outset of learning, and then shifts toward general uncertainty sampling as AL progresses. Empirical results show that our method outperforms baseline AL approaches on both sentence and document classification tasks. We also show that, as expected, the method quickly learns discriminative word embeddings. To the best of our knowledge, this is the first work on AL addressing neural models for text classification.
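
For orientation, the traditional baseline that the abstract contrasts itself with, entropy-based uncertainty sampling, might look like the sketch below. It is not the embedding-focused method the paper proposes; the function names and the example probabilities are hypothetical, and a real system would take the class distributions from the current model's predictions over the unlabeled pool.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k=1):
    """Return the indices of the k unlabeled instances with the highest entropy."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical model outputs over three classes for four unlabeled documents.
pool_predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
to_label = select_most_uncertain(pool_predictions, k=2)
```

The selected indices would be sent to an annotator, the model retrained on the enlarged labeled set, and the loop repeated.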

classification, machine learning, natural language, (18 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Semi-Supervised Multi-View Correlation Feature Learning with Application to Webpage Classification

Jing, Xiao-Yuan (Wuhan University; Nanjing University of Posts and Telecommunications) | Wu, Fei (Nanjing University of Posts and Telecommunications) | Dong, Xiwei (Nanjing University of Posts and Telecommunications) | Shan, Shiguang (Chinese Academy of Sciences (CAS)) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)

AAAI Conferences | Feb-14-2017

Webpage classification has attracted a lot of research interest. Webpage data is often multi-view and high-dimensional, and the webpage classification application is usually semi-supervised. Due to these characteristics, using semi-supervised multi-view feature learning (SMFL) techniques to deal with the webpage classification problem has recently received much attention. However, there is still room for improvement in this kind of feature learning technique. How to effectively utilize the correlation information among the multiple views of webpage data is an important research topic. Correlation analysis on multi-view data can facilitate extraction of the complementary information. In this paper, we propose a novel SMFL approach, named semi-supervised multi-view correlation feature learning (SMCFL), for webpage classification. SMCFL seeks a discriminant common space by learning a multi-view shared transformation in a semi-supervised manner. In the discriminant space, the correlation between intra-class samples is maximized, and the correlation between inter-class samples and the global correlation among both labeled and unlabeled samples are minimized simultaneously. We transform the matrix-variable-based nonconvex objective function of SMCFL into a convex quadratic programming problem with one real variable, and can achieve a globally optimal solution. Experiments on widely used datasets demonstrate the effectiveness and efficiency of the proposed approach.

classification, machine learning, natural language, (15 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.29)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Text classification and Naive Bayes

AITopics Original Links | Jan-19-2017, 10:55:07 GMT

Several of the preprocessing steps necessary for indexing as discussed in Chapter 2: detecting a document's encoding (ASCII, Unicode UTF-8 etc; page 2.1.1);

machine learning, natural language, text classification, (14 more...)

AITopics Original Links

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.40)
Asia > China (0.16)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Naive Bayesian Text Classification

AITopics Original Links | Jan-18-2017, 11:20:52 GMT

Paul Graham popularized the term "Bayesian Classification" (or more accurately "Naïve Bayesian Classification") after his "A Plan for Spam" article was published (http://www.paulgraham.com/spam.html). In fact, text classifiers based on naïve Bayesian and other techniques have been around for many years. Companies such as Autonomy and Interwoven incorporate machine-learning techniques to automatically classify documents of all kinds; one such machine-learning technique is naïve Bayesian text classification. Naïve Bayesian text classifiers are fast, accurate, simple, and easy to implement. In this article, I present a complete naïve Bayesian text classifier written in 100 lines of commented, nonobfuscated Perl.
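
To give a feel for how little code such a classifier needs, here is a minimal Python sketch of a multinomial naïve Bayes classifier. It is not the article's Perl program: the class name, the whitespace tokenization, the add-one (Laplace) smoothing, and the four training sentences are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        """Record the words of one labeled document."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest posterior log score."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus summed log likelihoods avoids floating-point underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data.
clf = NaiveBayesClassifier()
clf.train("cheap pills buy now", "spam")
clf.train("win money now", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lunch on monday", "ham")
```

With this toy training set, `clf.classify("buy cheap pills")` returns `"spam"` and `clf.classify("agenda for monday meeting")` returns `"ham"`. The "naïve" part is the assumption that words are independent given the label, which is what lets the score be a simple sum of per-word log probabilities.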

machine learning, natural language, text classification, (14 more...)

AITopics Original Links

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Industry: Health & Medicine > Therapeutic Area (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Supervised Word Mover's Distance

Huang, Gao, Guo, Chuan, Kusner, Matt J., Sun, Yu, Sha, Fei, Weinberger, Kilian Q.

Neural Information Processing Systems | Dec-31-2016

Accurately measuring the similarity between text documents lies at the core of many real-world applications of machine learning. These include web-search ranking, document recommendation, multi-lingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with a linear transformation of the underlying word embedding space and tailored word-specific weights, learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
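
The optimal transport problem mentioned in the abstract can be written out as follows. The notation is an assumption introduced here, not taken from the abstract: d and d' are the normalized bag-of-words histograms of the two documents over a vocabulary of n words, x_i is the embedding of word i, and T_ij is the amount of word i in the first document that is "moved" to word j in the second.

```latex
\mathrm{WMD}(d, d') = \min_{T \ge 0} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij}\, c(i, j)
\quad \text{subject to} \quad
\sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i,
\qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j,
\qquad
c(i, j) = \lVert x_i - x_j \rVert_2 .
```

In the supervised variant described above, the cost would be computed on linearly transformed embeddings (A x_i in place of x_i, for a learned matrix A) and the histogram entries rescaled by learned word-specific weights. The exact parameterization is given in the paper rather than in this summary.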

machine learning, natural language, text classification, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.28)
Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
