AITopics | Text Classification

Collaborating Authors

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

News Overviews Instructional Materials AI-Alerts Classics

A Short Introduction to Using Word2Vec for Text Classification

#artificialintelligenceMar-26-2016, 17:25:26 GMT

Machine learning applications on natural language are an extremely important tool in the data scientist's toolbox. Use cases can include auto-detecting the language of a website, detecting spam in your spam filter, or auto-completing search queries. When you're working with text data, an important use case is text classification, where the data scientist is tasked with creating an algorithm that can figure out what a bit of text is all about (what is the tagline) based on what is written in the document. This can be used in a myriad of examples we see everyday, tagging things such as blog articles, app descriptions, and reviews. In many cases traditional text classification can be difficult to scale, because as the order of the taxonomy count increases, the amount of training required increases as well.

machine learning, natural language, text classification, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.84)

Add feedback

Walmart Kaggle: Trip Type Classification

@machinelearnbotMar-23-2016, 09:30:53 GMT

They took the NYC Data Science Academy 12-week full-time data science bootcamp program from Sep. 23 to Dec. 18, 2015. The post was based on their fourth in-class project (due after the 8th week of the program). Walmart uses trip type classification to segment its shoppers and their store visits to better improve the shopping experience. Walmart's trip types are created from a combination of existing customer insights and purchase history data. The purpose of the Kaggle competition is to use only the purchase data provided to derive Walmart's classification labels.

machine learning, natural language, text classification, (16 more...)

@machinelearnbot

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.43)

Add feedback

Text Classification & Sentiment Analysis tutorial / blog

@machinelearnbotMar-22-2016, 02:23:18 GMT

For a more technical explanation, this and this article can be read. Here you can find a good explanation as well as a list of the mostly used Kernel functions.

machine learning, natural language, text classification, (14 more...)

@machinelearnbot

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > The Hague (0.05)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
(3 more...)

Add feedback

Document Type Classification in Online Digital Libraries

Caragea, Cornelia (University of North Texas) | Wu, Jian (Pennsylvania State University) | Gollapalli, Sujatha Das (Institute for Infocomm Research, A*STAR) | Giles, C. Lee (Pennsylvania State University)

AAAI ConferencesFeb-10-2016

Online digital libraries make it easier for researchers to search for scientific information. They have been proven as powerful resources in many data mining, machine learning and information retrieval applications that require high-quality data. The quality of the data highly depends on the accuracy of classifiers that identify the types of documents that are crawled from the Web, e.g., as research papers, slides, books, etc., for appropriate indexing. These classifiers in turn depend on the choice of the feature representation. We propose novel features that result in high-accuracy classifiers for document type classification. Experimental results on several datasets show that our classifiers outperform models that are employed in current systems.

classification, natural language, text classification, (21 more...)

AAAI Conferences

Twenty-Eighth IAAI Conference

Country:

North America > United States > Texas > Denton County > Denton (0.14)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)

Add feedback

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification.

Moreo Fernández, Alejandro, Esuli, Andrea, Sebastiani, Fabrizio

Journal of Artificial Intelligence ResearchJan-20-2016

Domain Adaptation (DA) techniques aim at enabling machine learning methods learn effective classifiers for a "target'' domain when the only available training data belongs to a different "source'' domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost, and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.

adaptation, dataset, proceedings, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4762

AI Access Foundation

10977

Journal of Artificial Intelligence Research

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > South Korea (0.04)
Asia > Singapore (0.04)
(13 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > Promising Solution (0.66)

Industry:

Media (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Character-level Convolutional Networks for Text Classification

Zhang, Xiang, Zhao, Junbo, LeCun, Yann

Neural Information Processing SystemsDec-31-2015

This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

machine learning, natural language, text classification, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
Asia (0.28)
North America > United States > New York (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data

Zhou, Guangyou (Central China Normal University) | He, Tingting (Central China Normal University) | Zhao, Jun (National Laboratory of Pattern Recognition, CASIA) | Wu, Wensheng (University of Southern California)

AAAI ConferencesJul-15-2015

Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of the source language data and that of the target language data. To address this challenge, previous work in the literature mainly relies on the large amount of bilingual parallel corpora to bridge the language gap. In many real applications, however, it is often the case that we have some partial parallel data but it is an expensive and time-consuming job to acquire large amount of parallel data on different languages. In this paper, we propose a novel subspace learning framework by leveraging the partial parallel data for cross-lingual sentiment classification. The proposed approach is achieved by jointly learning the document-aligned review data and un-aligned data from the source language and the target language via a non-negative matrix factorization framework. We conduct a set of experiments with cross-lingual sentiment classification tasks on multilingual Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.

classification, sentiment classification, target language, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Beijing > Beijing (0.04)
Asia > India (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification (Extended Abstract)

Xia, Rui (Nanjing University of Science and Technology) | Zong, Chengqing (Chinese Academy of Sciences) | Hu, Xuelei (Nanjing University of Science and Technology) | Cambria, Erik (Nanyang Technological University)

AAAI ConferencesJul-15-2015

The domain adaptation problem arises often in the field of sentiment classification. There are two distinct needs in domain adaptation, namely labeling adaptation and instance adaptation. Most of current research focuses on the former one, while neglects the latter one. In this work, we propose a joint approach, named feature ensemble plus sample selection (SS-FE), which takes both types of adaptation into account. A feature ensemble (FE) model is first proposed to learn a new labeling function in a feature re-weighting manner. Furthermore, a PCA-based sample selection (PCA-SS) method is proposed as an aid to FE for instance adaptation. Experimental results show that the proposed SS-FE approach could gain significant improvements, compared to individual FE and PCA-SS, due to its comprehensive consideration of both labeling adaptation and instance adaptation.

adaptation, domain adaptation, sentiment classification, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > Switzerland (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)

Add feedback

Recurrent Convolutional Neural Networks for Text Classification

Lai, Siwei (Chinese Academy of Sciences) | Xu, Liheng (Chinese Academy of Sciences) | Liu, Kang (Chinese Academy of Sciences) | Zhao, Jun (Chinese Academy of Sciences)

AAAI ConferencesMar-6-2015

Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.

machine learning, natural language, text classification, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dataless Text Classification with Descriptive LDA

Chen, Xingyuan (Leshan Normal University) | Xia, Yunqing (Tsinghua University) | Jin, Peng (Leshan Normal University) | Carroll, John (University of Sussex)

AAAI ConferencesMar-6-2015

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.

desclda, machine learning, natural language, (19 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(3 more...)

Add feedback