 Text Classification


Text Classification with TensorFlow Estimators

@machinelearnbot

Note: This post was written together with the awesome Julian Eisenschlos and was originally published on the TensorFlow blog. Throughout this post we will show you how to classify text using Estimators in TensorFlow. Welcome to Part 4 of a blog series that introduces TensorFlow Datasets and Estimators. You don't need to read all of the previous material, but take a look if you want to refresh any of the following concepts. Part 1 focused on pre-made Estimators, Part 2 discussed feature columns, and Part 3 showed how to create custom Estimators.


Text classification based on ensemble extreme learning machine

arXiv.org Artificial Intelligence

In this paper, we propose a novel approach based on a cost-sensitive ensemble weighted extreme learning machine, which we call AE1-WELM, and apply it to text classification. AE1-WELM handles both balanced and imbalanced multiclass text classification. Weighted ELM, which assigns different weights to different samples, improves classification accuracy to a certain extent, but it considers only the differences between samples in different categories and ignores the differences between samples within the same category. We measure the importance of each document by its sample information entropy, generate a cost-sensitive matrix and factor based on document importance, and then embed the cost-sensitive weighted ELM seamlessly into the AdaBoost.M1 framework. The vector space model (VSM) text representation produces high-dimensional, sparse features, which increase the burden on ELM. To overcome this problem, we develop a text classification framework combining word vectors and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification.


Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification

arXiv.org Artificial Intelligence

Traditional ELM and its improved versions suffer from the problems of outliers or noise due to overfitting and of imbalance due to the data distribution. We propose a novel hybrid adaptive fuzzy ELM (HA-FELM), which introduces a fuzzy membership function into the traditional ELM method to deal with these problems. We define the fuzzy membership function based not only on the distance between each sample and the center of its class but also on the density among samples, which is based on the quantum harmonic oscillator model. The proposed fuzzy membership function overcomes the shortcomings of the traditional fuzzy membership function and adjusts itself adaptively to the specific distribution of different samples. Experiments show that the proposed HA-FELM performs better than SVM, ELM, and RELM in text classification.
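To make the distance part of the idea concrete, here is a minimal sketch of the classic distance-to-center fuzzy membership, in which samples far from their class center receive a lower weight. It is not the HA-FELM membership function: the density term based on the quantum harmonic oscillator model is not reproduced, and the function name, the formula 1 - d / (r + delta), and the sample points are assumptions made for illustration.

```python
import math


def distance_memberships(samples, delta=1e-6):
    """Fuzzy membership for the samples of ONE class (illustrative only).

    Uses the classic form 1 - d_i / (r + delta), where d_i is the distance
    of sample i to the class center and r is the largest such distance.
    """
    dim = len(samples[0])
    # Class center: the per-dimension mean of all samples in the class.
    center = [sum(s[k] for s in samples) / len(samples) for k in range(dim)]
    dists = [math.dist(s, center) for s in samples]
    radius = max(dists)
    # delta keeps every membership strictly positive.
    return [1.0 - d / (radius + delta) for d in dists]


# Three nearby points and one far-away point in the same class.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0)]
memberships = distance_memberships(points)
```

In this toy class the far-away point ends up with the smallest membership, so it would contribute least to training if the memberships were used as sample weights.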


Machine Learning Helps Humans Perform Text Analysis

#artificialintelligence

To augment that approach, we've found that we can use machine learning to improve the semantic data models as the data set evolves. Our specific use-case is text data in millions of documents. We've found that machine learning facilitates the storage and exploration of data that would otherwise be too vast to support valuable insights. Machine Learning (ML) allows for a model to improve over time given new training data, without requiring more human effort. For example, a common text-classification benchmark task is to train a model on messages for multiple discussion board threads and then later use it to predict what the topic of discussion was (space, computers, religion, etc).
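The benchmark task described above can be sketched on a toy scale. The example below trains a small multinomial Naive Bayes classifier with add-one smoothing on four made-up messages and predicts the topic of a new one. The classifier choice, the function names, and the messages are all assumptions for illustration; the article does not say which model or data it used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs is a list of (text, label) pairs; returns the fitted counts."""
    class_counts = Counter()            # number of documents per topic
    word_counts = defaultdict(Counter)  # word frequencies per topic
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the topic with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical discussion-board messages with their topics.
training = [
    ("the rocket reached orbit around the moon", "space"),
    ("nasa launched a new satellite into orbit", "space"),
    ("my graphics card driver crashed the computer", "computers"),
    ("install more memory to speed up the computer", "computers"),
]
model = train_naive_bayes(training)
topic = predict(model, "the satellite is in orbit")
```

Adding more labeled messages to `training` and retraining is all it takes for the model to pick up new vocabulary, which is the sense in which such a model improves with new training data.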


ClassiNet -- Predicting Missing Features for Short-Text Classification

arXiv.org Artificial Intelligence

The fundamental problem in short-text classification is \emph{feature sparseness} -- the lack of feature overlap between a trained model and a test instance to be classified. We propose \emph{ClassiNet} -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that given $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$, and append those features in the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.
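The edge weights and the feature expansion step can be illustrated with a plain co-occurrence graph, which is the simpler structure that ClassiNets are said to generalize. In this minimal sketch the conditional probabilities are estimated from raw co-occurrence counts rather than from trained feature predictors, and the function names, the threshold, and the toy corpus are assumptions for illustration only.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_edges(instances):
    """Estimate p(v_j | v_i) = count(v_i and v_j) / count(v_i)."""
    feature_counts = Counter()
    pair_counts = Counter()
    for instance in instances:
        features = set(instance)
        feature_counts.update(features)
        # Ordered pairs, because the edges are directed.
        pair_counts.update(permutations(features, 2))
    return {(i, j): n / feature_counts[i] for (i, j), n in pair_counts.items()}


def expand(instance, edges, threshold=0.5):
    """Score features absent from the instance but likely given those present."""
    present = set(instance)
    scores = {}
    for (i, j), p in edges.items():
        if i in present and j not in present and p >= threshold:
            scores[j] = max(scores.get(j, 0.0), p)
    return scores


# Hypothetical short texts, already tokenized into features.
corpus = [
    ["cheap", "flight", "ticket"],
    ["cheap", "hotel", "deal"],
    ["flight", "ticket", "airline"],
    ["flight", "airline", "delay"],
]
edges = cooccurrence_edges(corpus)
extra = expand(["ticket"], edges)
```

Here `extra` holds the features that would be appended to the representation of the short text `["ticket"]`. A ClassiNet would instead derive the edge weights from classifier predictions, which lets it capture implicit co-occurrences that never appear together in the data.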


Do Convolutional Networks Need to Be Deep for Text Classification?

AAAI Conferences

In this work we study the importance of depth in convolutional models for text classification when either character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed perform better than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performance on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).


New machine-assisted text classification on Content Moderator now in public preview

#artificialintelligence

Content Moderator is part of Microsoft Cognitive Services, allowing businesses to use machine-assisted moderation of text, images, and videos to augment human review. The text moderation capability now includes a new machine-learning-based text classification feature, which uses a trained model to identify possibly abusive, derogatory or discriminatory language, such as slang, abbreviated words, and offensive or intentionally misspelled words, for review. In contrast to the existing text moderation service that flags profanity terms, the text classification feature helps detect potentially undesired content that may be deemed inappropriate depending on context. In addition to conveying the likelihood of each category, it may recommend a human review of the content. The text classification feature is in preview and supports the English language.


Multi-Class Text Classification with PySpark – Towards Data Science

#artificialintelligence

Apache Spark is quickly gaining steam both in the headlines and in real-world adoption, mainly because of its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for us to be able to stream and analyze it in real time. In addition, Apache Spark is fast enough to perform exploratory queries without sampling. Many industry experts have provided all the reasons why you should use Spark for Machine Learning. So, here we are now, using the Spark Machine Learning Library, and PySpark in particular, to solve a multi-class text classification problem.



Text Classification: Applications and Use Cases

#artificialintelligence

Text analysis, as a whole, is an emerging field of study. Fields such as Marketing, Product Management, Academia, and Governance are already leveraging the process of analyzing and extracting information from textual data. We discussed the technology behind Text Classification, one of the essential parts of Text Analysis. Text classification, or text categorization, is the activity of labeling natural language texts with relevant categories from a predefined set. In layman's terms, text classification is a process of extracting generic tags from unstructured text.