AITopics | Text Classification

Collaborating Authors

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

News Overviews Instructional Materials AI-Alerts Classics

Text Classification for Azerbaijani Language Using Machine Learning and Embedding

Suleymanov, Umid, Kalejahi, Behnam Kiani, Amrahov, Elkhan, Badirkhanli, Rashid

arXiv.org Machine LearningDec-26-2019

Text classification systems will help to solve the text clustering problem in the Azerbaijani language. There are some text-classification applications for foreign languages, but we tried to build a newly developed system to solve this problem for the Azerbaijani language. Firstly, we tried to find out potential practice areas. The system will be useful in a lot of areas. It will be mostly used in news feed categorization. News websites can automatically categorize news into classes such as sports, business, education, science, etc. The system is also used in sentiment analysis for product reviews. For example, the company shares a photo of a new product on Facebook and the company receives a thousand comments for new products. The systems classify the comments into categories like positive or negative. The system can also be applied in recommended systems, spam filtering, etc. Various machine learning techniques such as Naive Bayes, SVM, Decision Trees have been devised to solve the text classification problem in Azerbaijani language.

category, classification, text classification, (13 more...)

arXiv.org Machine Learning

1912.13362

Country:

Asia > Azerbaijan > Baku Economic Region > Baku (0.05)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report (0.82)

Industry:

Media > News (0.48)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Simultaneous Identification of Tweet Purpose and Position

Iyer, Rahul Radhakrishnan, Pei, Yulong, Sycara, Katia

arXiv.org Machine LearningDec-24-2019

Tweet classification has attracted considerable attention recently. Most of the existing work on tweet classification focuses on topic classification, which classifies tweets into several predefined categories, and sentiment classification, which classifies tweets into positive, negative and neutral. Since tweets are different from conventional text in that they generally are of limited length and contain informal, irregular or new words, so it is difficult to determine user intention to publish a tweet and user attitude towards certain topic. In this paper, we aim to simultaneously classify tweet purpose, i.e., the intention for user to publish a tweet, and position, i.e., supporting, opposing or being neutral to a given topic. By transforming this problem to a multi-label classification problem, a multi-label classification method with post-processing is proposed. Experiments on real-world data sets demonstrate the effectiveness of this method and the results outperform the individual classification methods.

classification, post-processing strategy, tweet, (14 more...)

arXiv.org Machine Learning

2001.00051

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Oregon (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.48)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.50)

Add feedback

TextNAS: A Neural Architecture Search Space tailored for Text Representation

Wang, Yujing, Yang, Yaming, Chen, Yiren, Bai, Jing, Zhang, Ce, Su, Guinan, Kou, Xiaoyu, Tong, Yunhai, Yang, Mao, Zhou, Lidong

arXiv.org Machine LearningDec-23-2019

Learning text representation is crucial for text classification and other language related tasks. There are a diverse set of text representation networks in the literature, and how to find the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior to the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets on text classification and natural language inference tasks. Furthermore, some of the design principles found in the automatic network agree well with human intuition.

architecture, arxiv preprint arxiv, search space, (13 more...)

arXiv.org Machine Learning

1912.10729

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.47)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Tutorial: Analyze sentiment of movie reviews using a pre-trained TensorFlow model - ML.NET

#artificialintelligenceDec-19-2019, 00:05:34 GMT

Once the model is loaded, you can extract its input and output schema. The schemas are displayed for interest and learning only. The input schema is the fixed-length array of integer encoded words. The output schema is a float array of probabilities indicating whether a review's sentiment is negative, or positive . These values sum to 1, as the probability of being positive is the complement of the probability of the sentiment being negative.

analyze sentiment, pre-trained tensorflow model, sentiment, (5 more...)

#artificialintelligence

Industry:

Media > Film (0.40)
Leisure & Entertainment (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Add feedback

A Framework for Explainable Text Classification in Legal Document Review

Mahoney, Christian J., Zhang, Jianping, Huber-Fliflet, Nathaniel, Gronvall, Peter, Zhao, Haozhen

arXiv.org Artificial IntelligenceDec-19-2019

Companies regularly spend millions of dollars producing electronically-stored documents in legal matters. Recently, parties on both sides of the 'legal aisle' are accepting the use of machine learning techniques like text classification to cull massive volumes of data and to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs in legal matters, it also faces a peculiar perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box", little information provided for attorneys to understand why documents are classified as responsive. In recent years, a group of AI and ML researchers have been actively researching Explainable AI, in which actions or decisions are human understandable. In legal document review scenarios, a document can be identified as responsive, if one or more of its text snippets are deemed responsive. In these scenarios, if text classification can be used to locate these snippets, then attorneys could easily evaluate the model's classification decision. When deployed with defined and explainable results, text classification can drastically enhance overall quality and speed of the review process by reducing the review time. Moreover, explainable predictive coding provides lawyers with greater confidence in the results of that supervised learning task. This paper describes a framework for explainable text classification as a valuable tool in legal services: for enhancing the quality and efficiency of legal document review and for assisting in locating responsive snippets within responsive documents. This framework has been implemented in our legal analytics product, which has been used in hundreds of legal matters. We also report our experimental results using the data from an actual legal matter that used this type of document review.

classification, rationale, snippet, (15 more...)

arXiv.org Artificial Intelligence

1912.09501

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Litigation (0.91)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

PySS3: A Python package implementing a novel text classifier with visualization tools for Explainable AI

Burdisso, Sergio G., Errecalde, Marcelo, Montes-y-Gómez, Manuel

arXiv.org Artificial IntelligenceDec-19-2019

A recently introduced text classifier, called SS3, has obtained state-of-the-art performance on the CLEF's eRisk tasks. SS3 was created to deal with risk detection over text streams and therefore not only supports incremental training and classification but also can visually explain its rationale. However, little attention has been paid to the potential use of SS3 as a general classifier. We believe this could be due to the unavailability of an open-source implementation of SS3. In this work, we introduce PySS3, a package that not only implements SS3 but also comes with visualization tools that allow researchers deploying robust, explainable and trusty machine learning models for text classification.

classifier, confidence vector, pyss3, (14 more...)

arXiv.org Artificial Intelligence

1912.09322

Country:

South America > Argentina (0.04)
North America > Mexico > Puebla (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.40)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.35)

Add feedback

Dual-Attention Graph Convolutional Network

Zhang, Xueya, Zhang, Tong, Zhao, Wenting, Cui, Zhen, Yang, Jian

arXiv.org Machine LearningNov-27-2019

Graph convolutional networks (GCNs) have shown the powerful ability in text structure representation and effectively facilitate the task of text classification. However, challenges still exist in adapting GCN on learning discriminative features from texts due to the main issue of graph variants incurred by the textual complexity and diversity. In this paper, we propose a dual-attention GCN to model the structural information of various texts as well as tackle the graph-invariant problem through embedding two types of attention mechanisms, i.e. the connection-attention and hop-attention, into the classic GCN. To encode various connection patterns between neighbour words, connection-attention adaptively imposes different weights specified to neighbourhoods of each word, which captures the short-term dependencies. On the other hand, the hop-attention applies scaled coefficients to different scopes during the graph diffusion process to make the model learn more about the distribution of context, which captures long-term semantics in an adaptive way. Extensive experiments are conducted on five widely used datasets to evaluate our dual-attention GCN, and the achieved state-of-the-art performance verifies the effectiveness of dual-attention mechanisms.

classification, mechanism, text classification, (14 more...)

arXiv.org Machine Learning

1911.12486

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.70)

Add feedback

Word-Class Embeddings for Multiclass Text Classification

Moreo, Alejandro, Esuli, Andrea, Sebastiani, Fabrizio

arXiv.org Machine LearningNov-26-2019

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using four popular neural architectures and six widely used and publicly available datasets for multiclass text classification. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings

classification, proceedings, wce, (17 more...)

arXiv.org Machine Learning

1911.11506

Country:

Europe > Denmark > Capital Region > Copenhagen (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Quebec > Montreal (0.04)
(11 more...)

Genre: Research Report > New Finding (0.93)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

How Do You #relax When You're #stressed? A Content Analysis and Infodemiology Study of Stress-Related Tweets

Doan, Son, Ritchart, Amanda, Perry, Nicholas, Chaparro, Juan D, Conway, Mike

arXiv.org Artificial IntelligenceNov-22-2019

Background: Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Objective: The objective of our study was to understand how people express their feelings of stress and relaxation through Twitter messages. Methods: We first performed a qualitative content analysis of 1326 and 781 tweets containing the keywords "stress" and "relax", respectively. We then investigated the use of machine learning algorithms to automatically classify tweets as stress versus non stress and relaxation versus non relaxation. Finally, we applied these classifiers to sample datasets drawn from 4 cities with the goal of evaluating the extent of any correlation between our automatic classification of tweets and results from public stress surveys. Results: Content analysis showed that the most frequent topic of stress tweets was education, followed by work and social relationships. The most frequent topic of relaxation tweets was rest and vacation, followed by nature and water. When we applied the classifiers to the cities dataset, the proportion of stress tweets in New York and San Diego was substantially higher than that in Los Angeles and San Francisco. Conclusions: This content analysis and infodemiology study revealed that Twitter, when used in conjunction with natural language processing techniques, is a useful data source for understanding stress and stress management strategies, and can potentially supplement infrequently collected survey-based stress data.

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.2196/publichealth.5939

1911.09242

Country:

North America > United States > California > San Diego County > San Diego (0.27)
North America > United States > California > San Francisco County > San Francisco (0.26)
North America > United States > California > Los Angeles County > Los Angeles (0.26)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.34)

Add feedback

Deep Discriminative Fine-Tuning for Cancer Type Classification

Harley, Alena

arXiv.org Machine LearningNov-15-2019

Determining the primary site of origin for metastatic tumors is one of the open problems in cancer care because the efficacy of treatment often depends on the cancer tissue of origin. Classification methods that can leverage tumor genomic data and predict the site of origin are therefore of great value. Because tumor DNA point mutation data is very sparse, only limited accuracy (64.5% for 12 tumor classes) was previously demonstrated by methods that rely on point mutations as features (1). Tumor classification accuracy can be greatly improved (to over 90% for 33 classes) by relying on gene expression data (2). However, this additional data is often not readily available in clinical setting, because point mutations are better profiled and targeted by clinical mutational profiling. Here we sought to develop an accurate deep transfer learning and fine-tuning method for tumor sub-type classification, where predicted class is indicative of the primary site of origin. Our method significantly outperforms the state-of-the-art for tumor classification using DNA point mutations, reducing the error by more than 30% at the same time discriminating over many more classes on The Cancer Genome Atlas (TCGA) dataset. Using our method, we achieve state-of-the-art tumor type classification accuracy of 78.3% for 29 tumor classes relying on DNA point mutations in the tumor only.

classification, mutation, point mutation, (12 more...)

arXiv.org Machine Learning

1911.07654

Country: North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.51)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback