Text Classification


Classification problems with Tensorflow. – Mohammed Rampurawala – Medium

#artificialintelligence

How do we achieve classification in TensorFlow? Classification is the process of determining or predicting the class of given data points by using the labels of existing data points. It falls under supervised learning: the model learns from labeled data points and then uses this learning to classify new observations. The problem can be binary (spam or not spam) or multi-class (Grade A, Grade B, Grade C or Grade D). A classification problem is one where the output variable is a category, such as "green" or "red", or "spam" or "not spam". Classification has applications in many domains, such as medical diagnosis, grading systems, score prediction, etc.
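The core idea of learning from labeled points and classifying new observations can be sketched without any framework. The nearest-centroid toy below is plain Python, not TensorFlow, and uses made-up 2-D feature vectors and class names purely to illustrate the binary case described above:

```python
# Minimal supervised-classification sketch: learn one centroid per class
# from labeled 2-D points, then assign a new point to the nearest centroid.
# Data and labels are invented for illustration.

def fit_centroids(points, labels):
    """Average the training points of each class into a centroid."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Classify a new observation by its nearest class centroid."""
    x, y = point
    return min(centroids,
               key=lambda c: (centroids[c][0] - x) ** 2
                             + (centroids[c][1] - y) ** 2)

# Binary example: "spam" vs "not spam", encoded as 2-D feature vectors.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["not spam", "not spam", "spam", "spam"]
model = fit_centroids(train_x, train_y)
print(predict(model, (0.85, 0.75)))  # -> spam
```

The multi-class case works identically: with four grade labels in `train_y`, the same code learns four centroids instead of two.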


Can automated smoothing significantly improve benchmark time series classification algorithms?

arXiv.org Machine Learning

We assess whether six smoothing algorithms (moving average, exponential smoothing, Gaussian filter, Savitzky-Golay filter, Fourier approximation and a recursive median sieve) can be automatically applied to time series classification problems as a preprocessing step to improve the performance of three benchmark classifiers (1-Nearest Neighbour with Euclidean and Dynamic Time Warping distances, and Rotation Forest). We found no significant improvement over unsmoothed data, even when we set the smoothing parameter through cross-validation. We are not claiming smoothing has no worth. It has an important role in exploratory analysis and helps with specific classification problems where domain knowledge can be exploited. What we observe is that automatic application does not help, and that we cannot explain the improvement of other time series classification algorithms over the baseline classifiers simply as a function of the absence of smoothing.
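Two of the smoothing filters named in the abstract are simple enough to sketch directly. The plain-Python versions below use an illustrative window size and smoothing factor, not the paper's cross-validated settings:

```python
# Two of the six smoothing algorithms from the abstract, in plain Python.
# The window size and alpha here are illustrative choices only.

def moving_average(series, window=3):
    """Trailing-window mean (shorter windows at the start of the series)."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out

def exponential_smoothing(series, alpha=0.5):
    """s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

noisy = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
print(moving_average(noisy))
print(exponential_smoothing(noisy))
```

In the paper's setup, a filter like either of these would be applied to every series before training the benchmark classifier, with the smoothing parameter chosen by cross-validation.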


Compositional coding capsule network with k-means routing for text classification

arXiv.org Machine Learning

Text classification is a challenging problem which aims to identify the category of texts. Recently, Capsule Networks (CapsNets) have been proposed for image classification. It has been shown that CapsNets have several advantages over Convolutional Neural Networks (CNNs), but their validity in the text domain has been less explored. An effective method named deep compositional code learning has been proposed lately; it can greatly reduce the number of word-embedding parameters without any significant sacrifice in performance. In this paper, we introduce the Compositional Coding (CC) mechanism between capsules, and we propose a new routing algorithm based on k-means clustering theory. Experiments conducted on eight challenging text classification datasets show that the proposed method achieves competitive accuracy compared to the state-of-the-art approach, with significantly fewer parameters.
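The clustering primitive the proposed routing builds on is ordinary k-means. The sketch below shows one assign-and-update iteration on scalar values; it is illustrative only — the paper's routing operates on capsule vectors, not scalars:

```python
# One k-means iteration: assign each value to its nearest center,
# then recompute each center as the mean of its assigned values.
# Scalar values for simplicity; the paper routes capsule vectors.

def kmeans_step(values, centers):
    clusters = {i: [] for i in range(len(centers))}
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    # Keep an empty cluster's old center rather than dividing by zero.
    return [sum(c) / len(c) if c else centers[i]
            for i, c in clusters.items()]

print(kmeans_step([1.0, 1.2, 8.0, 8.4], [0.0, 10.0]))  # centers move toward the two groups
```

Routing-by-clustering repeats such steps for a fixed number of iterations, so that higher-level capsules act as cluster centers for the outputs of lower-level capsules.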


Centroid estimation based on symmetric KL divergence for Multinomial text classification problem

arXiv.org Machine Learning

We define a new method to estimate centroids for text classification, based on the symmetric KL divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the new method achieves substantial improvements over traditional classifiers.
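The divergence the method is built on is standard: the symmetric (Jeffreys) KL divergence between two word distributions. A minimal sketch, with made-up distributions and a simple epsilon guard against zero probabilities (the paper may handle zeros differently):

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Jeffreys (symmetric KL) divergence between two distributions:
    KL(p || q) + KL(q || p) = sum_i (p_i - q_i) * log(p_i / q_i).
    eps guards against zero probabilities (one common smoothing choice)."""
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

doc = [0.5, 0.3, 0.2]       # word distribution of a document (made up)
centroid = [0.4, 0.4, 0.2]  # candidate class centroid (made up)
print(symmetric_kl(doc, centroid))
```

The divergence is zero exactly when the two distributions match, positive otherwise, and symmetric in its arguments — which is what makes it usable as a distance-like criterion for fitting centroids.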


Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

arXiv.org Machine Learning

This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification for which we had provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn and the SciPy stack. Here we report on new experiments that we have carried out in order to test PyDCI, in which we use as baselines new high-performing methods that have appeared after DCI was originally proposed. These experiments show that, thanks to a few subtle ways in which we have improved DCI, PyDCI outperforms both JaDCI and the above-mentioned high-performing methods, and delivers the best known results on the two popular benchmarks on which we had tested DCI, i.e., MultiDomainSentiment (a.k.a. MDS -- for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code needed to replicate our experiments, is available at https://github.com/AlexMoreo/pydci .


Multi-class Classification Model Inspired by Quantum Detection Theory

arXiv.org Machine Learning

Machine Learning, which helps identify patterns in raw data, has become very popular. Technological advancement has led to substantial improvements in Machine Learning, which in turn help improve prediction. Current Machine Learning models are based on classical theory, which can be replaced by quantum theory to improve the effectiveness of the model. In previous work, we developed a binary classifier inspired by Quantum Detection Theory. In this extended abstract, our main goal is to develop a multi-class classifier. We generally use the terminology multinomial classification or multi-class classification when we have a classification problem that classifies observations or instances into one of three or more classes.


K-Nearest Neighbors (KNN): Solving Classification Problems

#artificialintelligence

In this tutorial, we are going to use the K-Nearest Neighbors (KNN) algorithm to solve a classification problem. Firstly, what exactly do we mean by classification? Classification across a variable means that results are categorised into particular groups. The KNN algorithm is one of the most basic, yet most commonly used, algorithms for solving classification problems. KNN works by seeking to minimize the distance between the test and training observations, so as to achieve a high classification accuracy.
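The distance-and-vote idea described above fits in a few lines. The sketch below is a plain-Python illustration (the tutorial itself may use a library implementation), with invented 2-D points, class labels "A"/"B", and k = 3:

```python
from collections import Counter

# Minimal KNN classifier: find the k training observations nearest to the
# test point, then take a majority vote over their labels.

def knn_predict(train_x, train_y, point, k=3):
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - point[0]) ** 2
                      + (train_x[i][1] - point[1]) ** 2)
    votes = Counter(train_y[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_x, train_y, (4.5, 5.2)))  # -> B
```

Squared distances are used for ranking, since they order points the same way as Euclidean distances without the square root.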


Artificial Intelligence for Records Management RecordPoint

#artificialintelligence

As we discussed in the previous article, the Top 3 Challenges of Records Management, records management automation is the best way to address these challenges. But what is automation, really? Within these two main categories there are seven types of automation we typically deal with in the records management world. They can use fingerprinting, linguistic analysis, or both as methods of automation. All of them help us to classify content correctly against the file plan, and in some cases we can build relationships between content for even better classification.


Explaining Black-Box Machine Learning Models - Code Part 2: Text classification with LIME

#artificialintelligence

Okay, our model above works, but there are still common words and stop words in our model that LIME picks up on. Ideally, we would want to remove them before modeling and keep only relevant words. This we can accomplish by using additional steps and options in our preprocessing function. It is important to know that whatever preprocessing we do with our text corpus, train and test data have to have the same features. If we were to incorporate all the steps shown below into one function and call it separately on train and test data, we would end up with different words in our dtm and the predict() function wouldn't work any more.
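The usual fix for this feature-mismatch problem is to fit the vocabulary on the training corpus only and reuse it when vectorizing the test corpus. The post works in R; the sketch below shows the same idea in plain Python with a made-up corpus:

```python
# Fix the vocabulary on the training corpus and reuse it for the test
# corpus, so both document-term matrices have identical columns.

def build_vocab(corpus):
    """Vocabulary (word -> column index) from the training corpus only."""
    words = sorted({w for doc in corpus for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def to_dtm(corpus, vocab):
    """Term counts; words absent from the training vocabulary are dropped."""
    rows = []
    for doc in corpus:
        row = [0] * len(vocab)
        for w in doc.lower().split():
            if w in vocab:
                row[vocab[w]] += 1
        rows.append(row)
    return rows

train = ["spam offer now", "meeting notes attached"]
test = ["unseen offer words"]
vocab = build_vocab(train)
print(len(to_dtm(test, vocab)[0]) == len(vocab))  # -> True: same features
```

Because `to_dtm` silently drops out-of-vocabulary words, test documents always map onto exactly the columns the model was trained on, which is what predict() needs.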


Explaining Keras image classification models with lime

#artificialintelligence

Last week I published a blog post about how easy it is to train image classification models with Keras. What I did not show in that post was how to use the model for making predictions. This, I will do here. But predictions alone are boring, so I'm adding explanations for the predictions using the lime package. I have already written a few blog posts (here, here and here) about LIME and have given talks (here and here) about it, too.