AITopics | Text Classification

Collaborating Authors

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

News Overviews Instructional Materials AI-Alerts Classics

r/MachineLearning - [D] Text classification on a small dataset

@machinelearnbotMay-14-2018, 19:38:30 GMT

I am trying to perform multiclass text classification (for 24 classes) on a set documents, but I have a very small dataset currently (1200 total examples). The data collection process is a bit tedious in my case, hence the small dataset size. The best result I have achieved till now is 58% accuracy with an SVM model and a single layer CNN model. Is there any other approach I can try other than collecting more data? I have tried oversampling the training set, but it didn't seem to improve the performance.

machine learning, natural language, text classification, (4 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)

Add feedback

Text Classification with TensorFlow Estimators

@machinelearnbotMay-13-2018, 17:05:46 GMT

Note: This post was written together with the awesome Julian Eisenschlos and was originally published on the TensorFlow blog. Throughout this post we will show you how to classify text using Estimators in TensorFlow. Welcome to Part 4 of a blog series that introduces TensorFlow Datasets and Estimators. You don't need to read all of the previous material, but take a look if you want to refresh any of the following concepts. Part 1 focused on pre-made Estimators, Part 2 discussed feature columns, and Part 3 how to create custom Estimators.

machine learning, natural language, text classification, (16 more...)

@machinelearnbot

Industry:

Media > Film (0.31)
Leisure & Entertainment (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.42)

Add feedback

Text classification based on ensemble extreme learning machine

Li, Ming, Xiao, Peilun, Zhang, Ju

arXiv.org Artificial IntelligenceMay-10-2018

In this paper, we propose a novel approach based on cost-sensitive ensemble weighted extreme learning machine; we call this approach AE1-WELM. We apply this approach to text classification. AE1-WELM is an algorithm including balanced and imbalanced multiclassification for text classification. Weighted ELM assigning the different weights to the different samples improves the classification accuracy to a certain extent, but weighted ELM considers the differences between samples in the different categories only and ignores the differences between samples within the same categories. We measure the importance of the documents by the sample information entropy, and generate cost-sensitive matrix and factor based on the document importance, then embed the cost-sensitive weighted ELM into the AdaBoost.M1 framework seamlessly. Vector space model(VSM) text representation produces the high dimensions and sparse features which increase the burden of ELM. To overcome this problem, we develop a text classification framework combining the word vector and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification.

classification, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

1805.06525

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)

Add feedback

Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification

Li, Ming, Xiao, Peilun, Zhang, Ju

arXiv.org Artificial IntelligenceMay-10-2018

In traditional ELM and its improved versions suffer from the problems of outliers or noises due to overfitting and imbalance due to distribution. We propose a novel hybrid adaptive fuzzy ELM(HA-FELM), which introduces a fuzzy membership function to the traditional ELM method to deal with the above problems. We define the fuzzy membership function not only basing on the distance between each sample and the center of the class but also the density among samples which based on the quantum harmonic oscillator model. The proposed fuzzy membership function overcomes the shortcoming of the traditional fuzzy membership function and could make itself adjusted according to the specific distribution of different samples adaptively. Experiments show the proposed HA-FELM can produce better performance than SVM, ELM, and RELM in text classification.

machine learning, natural language, text classification, (15 more...)

arXiv.org Artificial Intelligence

1805.06524

Country: Asia > China (0.33)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.98)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)

Add feedback

Machine Learning Helps Humans Perform Text Analysis

#artificialintelligenceMay-3-2018, 17:32:03 GMT

To augment that approach, we've found that we can use machine learning to improve the semantic data models as the data set evolves. Our specific use-case is text data in millions of documents. We've found that machine learning facilitates the storage and exploration of data that would otherwise be too vast to support valuable insights. Machine Learning (ML) allows for a model to improve over time given new training data, without requiring more human effort. For example, a common text-classification benchmark task is to train a model on messages for multiple discussion board threads and then later use it to predict what the topic of discussion was (space, computers, religion, etc).

help human perform text analysis, machine learning, natural language, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.40)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.36)

Add feedback

ClassiNet -- Predicting Missing Features for Short-Text Classification

Bollegala, Danushka, Atanasov, Vincent, Maehara, Takanori, Kawarabayashi, Ken-ichi

arXiv.org Artificial IntelligenceApr-14-2018

The fundamental problem in short-text classification is \emph{feature sparseness} -- the lack of feature overlap between a trained model and a test instance to be classified. We propose \emph{ClassiNet} -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that given $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$, and append those features in the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

1804.0526

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.88)

Industry:

Banking & Finance (0.67)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
(6 more...)

Add feedback

Do Convolutional Networks Need to Be Deep for Text Classification ?

Le, Hoa T. (Laboratory LORIA) | Cerisara, Christophe (Laboratory LORIA) | Denis, Alexandre (SESAMm)

AAAI ConferencesApr-6-2018

We study in this work the importance of depth in convolutional models for text classification, either when character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).

artificial intelligence, convolutional network, text classification, (1 more...)

AAAI Conferences

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.80)

Add feedback

New machine-assisted text classification on Content Moderator now in public preview

#artificialintelligenceMar-16-2018, 03:33:20 GMT

Content Moderator is part of Microsoft Cognitive Services allowing businesses to use machine assisted moderation of text, images, and videos that augment human review. The text moderation capability now includes a new machine-learning based text classification feature which uses a trained model to identify possible abusive, derogatory or discriminatory language such as slang, abbreviated words, offensive, and intentionally misspelled words for review. In contrast to the existing text moderation service that flags profanity terms, the text classification feature helps detect potentially undesired content that may be deemed as inappropriate depending on context. In addition, to convey the likelihood of each category it may recommend a human review of the content. The text classification feature is in preview and supports the English language.

artificial intelligence, content moderator, natural language, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)

Add feedback

Multi-Class Text Classification with PySpark – Towards Data Science

#artificialintelligenceMar-11-2018, 16:21:59 GMT

Apache Spark is quickly gaining steam both in the headlines and real-world adoption, mainly because of its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for us to be able to stream and analyze it in real time. In addition, Apache Spark is fast enough to perform exploratory queries without sampling. Many industry experts have provided all the reasons why you should use Spark for Machine Learning? So, here we are now, using Spark Machine Learning Library to solve a multi-class text classification problem, in particular, PySpark.

logistic regression, machine learning, natural language, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)

Add feedback

[1802.10200] Brain Tumor Type Classification via Capsule Networks

@machinelearnbotMar-5-2018, 08:45:16 GMT

Which authors of this paper are endorsers? Disable MathJax (What is MathJax?)

artificial intelligence, natural language, text classification, (4 more...)

@machinelearnbot

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.52)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.52)

Add feedback