AITopics | Text Classification

Collaborating Authors

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

News Overviews Instructional Materials AI-Alerts Classics

Unsupervised automatic classification of Scanning Electron Microscopy (SEM) images of CD4+ cells with varying extent of HIV virion infection

Wandeto, John M., Dresp-Langley, Birgitta

arXiv.org Artificial IntelligenceApr-30-2019

Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self Organized Map (SOM)-based, fully automatic and unsupervised, classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training permits to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.

classification, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1905.037

Country:

Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.07)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.05)
Europe > Germany > Berlin (0.05)
Africa > Kenya > Nyeri County > Nyeri (0.05)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.43)

Add feedback

Text Classification Algorithms: A Survey

Kowsari, Kamran, Meimandi, Kiana Jafari, Heidarysafa, Mojtaba, Mendu, Sanjana, Barnes, Laura E., Brown, Donald E.

arXiv.org Artificial IntelligenceApr-25-2019

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluations methods. Finally, the limitations of each technique and their application in the real-world problem are discussed.

classification, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/info10040150

1904.08067

Country:

North America > United States > California (1.00)
Asia > Middle East (0.92)
Europe > United Kingdom (0.67)
North America > United States > Massachusetts > Middlesex County (0.46)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)
(2 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Summarizing Event Sequences with Serial Episodes: A Statistical Model and an Application

Mitra, Soumyajit, Sastry, P S

arXiv.org Machine LearningMar-31-2019

In this paper we address the problem of discovering a small set of frequent serial episodes from sequential data so as to adequately characterize or summarize the data. We discuss an algorithm based on the Minimum Description Length (MDL) principle and the algorithm is a slight modification of an earlier method, called CSC-2. We present a novel generative model for sequence data containing prominent pairs of serial episodes and, using this, provide some statistical justification for the algorithm. We believe this is the first instance of such a statistical justification for an MDL based algorithm for summarizing event sequence data. We then present a novel application of this data mining algorithm in text classification. By considering text documents as temporal sequences of words, the data mining algorithm can find a set of characteristic episodes for all the training data as a whole. The words that are part of these characteristic episodes could then be considered the only relevant words for the dictionary thus resulting in a considerably reduced feature vector dimension. We show, through simulation experiments using benchmark data sets, that the discovered frequent episodes can be used to achieve more than four-fold reduction in dictionary size without losing any classification accuracy.

data mining, machine learning, pattern recognition, (22 more...)

arXiv.org Machine Learning

1904.00516

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
(5 more...)

Add feedback

Learning to Weight for Text Classification

Fernández, Alejandro Moreo, Esuli, Andrea, Sebastiani, Fabrizio

arXiv.org Machine LearningMar-28-2019

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although `supervised term weighting' approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach \emph{Learning to Weight} (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.

machine learning, natural language, text classification, (17 more...)

arXiv.org Machine Learning

1903.1209

Country: Europe (0.92)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Text classification using TensorFlow.js: An example of detecting offensive language in browser

#artificialintelligenceMar-16-2019, 20:37:05 GMT

Online communication platforms are increasingly overwhelmed by rude or offensive comments, which can make people give up on discussion altogether. In response to this issue, the Jigsaw team and Google's Counter Abuse technology team collaborated with sites that have shared their comments and moderation decisions to create the Perspective API. Perspective helps online communities host better conversations by, for example, enabling moderators to more quickly identify which comments might violate their community guidelines. Several publishers have also worked on systems that provide feedback to users before they publish comments (e.g. the Coral Talk Plugin). Showing a pre-submit message that a comment might violate community guidelines, or that it will be held for review before publication, has already proven to be an effective tool to encourage users to think more carefully about how their comments might be received.

machine learning, natural language, text classification, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.50)

Add feedback

The 50 Best Free Datasets for Machine Learning Gengo AI

#artificialintelligenceMar-11-2019, 07:40:57 GMT

This article is also available in Japanese and Simplified Chinese. Gengo.ai has assembled a wealth of resources for machine learning and natural language processing activities. In our previous articles, we explained why datasets are such an integral part of machine learning and natural language processing. Without training datasets, machine learning algorithms would have no way of learning how to do text mining, text classification, or categorize products. So far, Gengo.ai has compiled lists of the best open datasets by industry.

machine learning, natural language, text classification, (13 more...)

#artificialintelligence

Country: North America > United States > California (0.15)

Industry:

Education (0.98)
Banking & Finance > Trading (0.31)
Transportation > Ground > Road (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.35)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.30)

Add feedback

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

Borkan, Daniel, Dixon, Lucas, Sorensen, Jeffrey, Thain, Nithum, Vasserman, Lucy

arXiv.org Machine LearningMar-11-2019

Machine learning systems, if not constrained, will compounding existing challenges to fairness in society often learn the simplest associations that can predict the labels, so at large. In this paper, we introduce a suite of threshold-agnostic any incorrect associations present in the training data can produce metrics that provide a nuanced view of this unintended bias, by unintended associations in the final model. Toxicity models specifically considering the various ways that a classifier's score distribution have been shown to capture and reproduce biases common can vary across designated groups. We also introduce a large new in society, for example mis-associating the names of frequently test set of online comments with crowd-sourced annotations for attacked identity groups (such as "gay", and "muslim" etc.) with identity references. We use this to show how our metrics can be toxicity [5, 17]. This unintended model bias could be due to the used to find new and potentially subtle unintended bias in existing demographic composition of the online user pool, the latent or public models.

machine learning, natural language, text classification, (21 more...)

arXiv.org Machine Learning

1903.04561

Country: North America > United States (0.47)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.41)
Information Technology > Communications > Social Media > Crowdsourcing (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.30)

Add feedback

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification

Xiao, Huiru, Liu, Xin, Song, Yangqiu

arXiv.org Machine LearningFeb-25-2019

Hierarchical text classification has many real-world applications. However, labeling a large number of documents is costly. In practice, we can use semi-supervised learning or weakly supervised learning (e.g., dataless classification) to reduce the labeling cost. In this paper, we propose a path cost-sensitive learning algorithm to utilize the structural information and further make use of unlabeled and weakly-labeled data. We use a generative model to leverage the large amount of unlabeled data and introduce path constraints into the learning algorithm to incorporate the structural information of the class hierarchy. The posterior probabilities of both unlabeled and weakly labeled data can be incorporated with path-dependent scores. Since we put a structure-sensitive cost to the learning algorithm to constrain the classification consistent with the class hierarchy and do not need to reconstruct the feature vectors for different structures, we can significantly reduce the computational cost compared to structural output learning. Experimental results on two hierarchical text classification benchmarks show that our approach is not only effective but also efficient to handle the semi-supervised and weakly supervised hierarchical text classification.

classification, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3308558.3313658

1902.09347

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Hong Kong (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)

Add feedback

Fully Convolutional Networks for Text Classification

Anderson, Jacob

arXiv.org Machine LearningFeb-14-2019

In this work I propose a new way of using fully convolutional networks for classification while allowing for input of any size. I additionally propose two modifications on the idea of attention and the benefits and detriments of using the modifications. Finally, I show suboptimal results on the ITAmoji 2018 tweet to emoji task and provide a discussion about why that might be the case as well as a proposed fix to further improve results.

classification, convolution, local attention, (16 more...)

arXiv.org Machine Learning

1902.05575

Country: North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Report on Text Classification using CNN, RNN & HAN – Jatana – Medium

#artificialintelligenceFeb-4-2019, 16:16:00 GMT

I recently joined Jatana.ai as NLP Researcher (Intern) and I was asked to work on the text classification use cases using Deep learning models. In this article I will share my experiences and learnings while experimenting with various neural networks architectures. Text classification was performed on datasets having Danish, Italian, German, English and Turkish languages. One of the widely used Natural Language Processing & Supervised Machine Learning (ML) task in different business problems is "Text Classification", it's an example of Supervised Machine Learning task since a labelled dataset containing text documents and their labels is used for training a classifier. The goal of text classification is to automatically classify the text documents into one or more predefined categories. Text Classification is a very active research area both in academia and industry.

classification, machine learning, natural language, (18 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback