AITopics | Text Classification


Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.


Online Learning Guide with Text Classification using Vowpal Wabbit (VW)

@machinelearnbot Jan-18-2018, 04:55:14 GMT

A large number of e-commerce and tech companies rely on real-time training and predictions for their products. Google predicts real-time click-through rates for its ads. These predictions feed into its auction mechanism, alongside the advertiser's bid, to decide which ads to show to the user. Stack Overflow uses real-time predictions to automatically tag a question with the correct programming language so that it reaches the right people. An election management team might want to predict real-time sentiment on Twitter to assess the impact of its campaign.

machine learning, natural language, vowpal wabbit, (14 more...)

@machinelearnbot

Industry:

Information Technology > Services (0.55)
Education > Educational Setting > Online (0.47)

Technology:

Information Technology > Communications > Social Media (0.89)
Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.41)


Fine-tuned Language Models for Text Classification

Howard, Jeremy, Ruder, Sebastian

arXiv.org Machine Learning Jan-18-2018

Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the majority of datasets. We open-source our pretrained models and code to enable adoption by the community.

machine learning, natural language, text classification, (14 more...)

arXiv.org Machine Learning

1801.06146

Country: North America (0.46)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Email Spam Filtering: A Python implementation with scikit-learn

@machinelearnbot Jan-17-2018, 23:28:53 GMT

This article was written by ML bot2 on Machine Learning in Action. Text mining (deriving information from text) is a wide field which has gained popularity with the huge volume of text data being generated. Machine learning models have been used to automate a number of applications such as sentiment analysis, document classification, topic classification, text summarization, and machine translation. Spam filtering is a beginner's example of a document classification task: it involves classifying an email as spam or non-spam (a.k.a. ham). The Spam box in your Gmail account is the best example of this.
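As a rough illustration of spam filtering as document classification, here is a minimal sketch in plain Python. The linked article uses scikit-learn; this sketch instead hand-rolls a multinomial naive Bayes classifier with add-one smoothing so that it runs without dependencies. The class name `NaiveBayesSpamFilter`, the whitespace tokenizer, the label "ham" for non-spam, and the four training messages are all assumptions made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical toy classifier: multinomial naive Bayes, add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            # Log prior: fraction of training messages carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training messages, two per class.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap pills win money", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch with the project team", "ham")

print(spam_filter.predict("win a cash prize"))
print(spam_filter.predict("project meeting monday"))
```

A real filter would train on thousands of labelled emails and use a proper tokenizer; the point here is only the shape of the task: count words per class, then pick the class with the higher posterior score.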

natural language, python implementation, text classification, (6 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.91)


Automated Text Classification Using Machine Learning

#artificialintelligence Jan-12-2018, 09:20:02 GMT

Digitization has changed the way we process and analyze information. There is an exponential increase in the online availability of information. Web pages, emails, science journals, e-books, learning content, news, and social media are all full of textual data. The idea is to create, analyze and report information fast. This is where automated text classification steps in.

classification, machine learning, natural language, (14 more...)

#artificialintelligence

Industry: Media (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.79)


A framework for automated rating of online reviews against the underlying topics

@machinelearnbot Jan-2-2018, 09:20:17 GMT

Even though most online review systems offer a star rating in addition to free-text reviews, the rating applies only to the overall review. However, different users may have different preferences in relation to different aspects of a product or a service and may struggle to extract relevant information from a massive amount of consumer reviews available online. In this paper, we present a framework for extracting prevalent topics from online reviews and automatically rating them on a 5-star scale. It consists of five modules, including linguistic pre-processing, topic modelling, text classification, sentiment analysis, and rating. Topic modelling is used to extract prevalent topics, which are then used to classify individual sentences against these topics.

online review, prevalent topic, sentiment, (3 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)


Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Joshi, Bikash, Amini, Massih R., Partalas, Ioannis, Iutzeler, Franck, Maximov, Yury

Neural Information Processing Systems Dec-31-2017

We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.

classification, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Tensorflow Docker Production ready AI product – #WeCoCreate – Medium

#artificialintelligence Dec-29-2017, 04:12:26 GMT

Everyone is talking about training deep learning models and fine-tuning them, but very few talk about deployment and scalability. At BotSupply, we focus not only on building accurate machine learning models, but also on delivering them to clients with greater efficiency. In this article, we will learn to deploy a sentiment analysis model trained on "Character-level Convolutional Networks for Text Classification" (Xiang Zhang, Junbo Zhao, Yann LeCun), which uses character-level ConvNets for text classification. Check out his great blog post on CNN classification. As explained in the above blog about the training process, I am assuming that you have already trained your sentiment analysis model.

machine learning, natural language, text classification, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)


[D] Max-over-time pooling vs no max-pooling for text classification? • r/MachineLearning

@machinelearnbot Dec-26-2017, 12:50:16 GMT

Kim 2014 and Collobert 2011 argue that max-over-time pooling helps pick out the words in a sentence that matter most to its semantics. Then I read a blog post from the Googler Lakshmanan V on text classification. The author argues that spatial invariance isn't wanted because it's important where words are placed in a sentence. Thus he doesn't recommend maxpool. Are there empirical studies that compare the two approaches?
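To make the trade-off in the question concrete, here is a small sketch in plain Python. It is not taken from either paper or from the blog post: the per-word activations in `TOY_ACTIVATIONS` are invented, and each "filter" looks at one word at a time (width 1), which is a simplifying assumption. Real convolutional filters span several words, so some local order survives pooling.

```python
# Hypothetical activations of two width-1 filters for each word.
TOY_ACTIVATIONS = {
    "not": [0.9, 0.1],
    "good": [0.2, 0.8],
    "movie": [0.1, 0.3],
}


def feature_map(tokens):
    """One row per position in the sentence, one column per filter."""
    return [TOY_ACTIVATIONS[token] for token in tokens]


def max_over_time(fmap):
    """Keep each filter's strongest activation, discarding where it occurred."""
    return [max(column) for column in zip(*fmap)]


def flatten(fmap):
    """No pooling: concatenate rows, so position is preserved."""
    return [value for row in fmap for value in row]


map_a = feature_map(["not", "good", "movie"])
map_b = feature_map(["good", "movie", "not"])

pooled_a = max_over_time(map_a)
pooled_b = max_over_time(map_b)

print(pooled_a, pooled_b)                 # identical after pooling
print(flatten(map_a) == flatten(map_b))   # differ without pooling
```

Pooling gives a fixed-size vector regardless of sentence length and keeps the strongest evidence for each filter, while the flattened representation keeps word position at the cost of a length-dependent input. That is the tension the two positions in the thread describe.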

artificial intelligence, natural language, text classification, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.73)


TensorFlow for R

@machinelearnbot Dec-12-2017, 23:00:23 GMT

You'll work with the IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie Database. They're split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews. The sets are kept separate because you should never test a machine-learning model on the same data that you used to train it! Just because a model performs well on its training data doesn't mean it will perform well on data it has never seen; and what you care about is your model's performance on new data (because you already know the labels of your training data – obviously you don't need your model to predict those). For instance, it's possible that your model could end up merely memorizing a mapping between your training samples and their targets, which would be useless for the task of predicting targets for data the model has never seen before.
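The memorization failure described above can be shown with a few lines of plain Python. This is a toy sketch, not code from the TensorFlow for R material: the `Memorizer` class and the eight short "reviews" are made up, and stand in for the 25,000/25,000 IMDB split.

```python
class Memorizer:
    """Hypothetical model that only stores a lookup table of its training data."""

    def __init__(self, default="positive"):
        self.table = {}
        self.default = default

    def fit(self, texts, labels):
        self.table = dict(zip(texts, labels))

    def predict(self, text):
        # Unseen text falls back to a fixed guess: nothing was actually learned.
        return self.table.get(text, self.default)


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


# Invented stand-ins for the training and test halves, each 50% positive.
train_texts = ["great film", "awful plot", "loved it", "boring mess"]
train_labels = ["positive", "negative", "positive", "negative"]
test_texts = ["great acting", "awful film", "boring plot", "loved the cast"]
test_labels = ["positive", "negative", "negative", "positive"]

model = Memorizer()
model.fit(train_texts, train_labels)

train_acc = accuracy(model, train_texts, train_labels)
test_acc = accuracy(model, test_texts, test_labels)
print(train_acc, test_acc)
```

Scoring this model on its own training data reports perfect accuracy, while the held-out set shows it does no better than always guessing one class on a balanced split. Only the second number says anything about new data.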

machine learning, natural language, text classification, (6 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)


On Extending Neural Networks with Loss Ensembles for Text Classification

Hajiabadi, Hamideh, Molla-Aliod, Diego, Monsefi, Reza

arXiv.org Machine Learning Nov-14-2017

Ensemble techniques are powerful approaches that combine several weak learners to build a stronger one. As a meta learning framework, ensemble techniques can easily be applied to many machine learning techniques. In this paper we propose a neural network extended with an ensemble loss function for text classification. The weight of each weak loss function is tuned within the training phase through the gradient propagation optimization method of the neural network. The approach is evaluated on several text classification datasets. We also evaluate its performance in various environments with several degrees of label noise. Experimental results indicate an improvement of the results and strong resilience against label noise in comparison with other methods.
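One way to picture an ensemble loss is as a weighted sum of several per-example losses. The sketch below is an illustration only: the abstract does not say which losses the authors combine or how the weights are constrained, so the three losses, the softmax parameterization of the weights, and every name here are assumptions. It computes the forward value only; in a network the raw weights would be updated by backpropagation along with the other parameters.

```python
import math


def log_loss(y, p):
    """Binary cross-entropy for label y in {0, 1} and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def squared_loss(y, p):
    return (y - p) ** 2


def absolute_loss(y, p):
    return abs(y - p)


# Assumed set of weak losses; chosen for illustration, not from the paper.
LOSSES = [log_loss, squared_loss, absolute_loss]


def softmax(raw):
    """Map unconstrained values to positive weights that sum to one."""
    top = max(raw)
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_loss(y, p, raw_weights):
    """Weighted sum of the weak losses for a single example."""
    weights = softmax(raw_weights)
    return sum(w * loss(y, p) for w, loss in zip(weights, LOSSES))


# Equal raw weights give a plain average of the three losses.
equal_mix = ensemble_loss(1, 0.8, [0.0, 0.0, 0.0])
# A large raw weight makes one loss dominate the mixture.
mostly_squared = ensemble_loss(1, 0.8, [0.0, 50.0, 0.0])
print(equal_mix, mostly_squared)
```

Note that minimizing such a mixture without any constraint beyond the softmax would push all weight onto whichever loss is smallest, so a practical version needs some additional regularization or constraint on the weights.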

machine learning, natural language, text classification, (16 more...)

arXiv.org Machine Learning

1711.0517

Country:

Asia > Middle East > Iran (0.15)
Oceania > Australia > New South Wales (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.92)
