 Text Classification


Implementing a CNN for Text Classification in TensorFlow

#artificialintelligence

Another TensorFlow feature you typically want to use is checkpointing: saving the parameters of your model so you can restore them later. Checkpoints can be used to continue training at a later point, or to pick the best parameter settings using early stopping. Checkpoints are created using a Saver object. Before we can train our model we also need to initialize the variables in our graph. The initialize_all_variables function is a convenience function that runs all of the initializers we've defined for our variables. You can also call the initializer of your variables manually, which is useful if you want to initialize your embeddings with pre-trained values, for example. Let's now define a function for a single training step, evaluating the model on a batch of data and updating the model parameters.
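
The checkpoint-and-early-stopping idea can be sketched without TensorFlow. The block below is a minimal illustration, not the article's code: it swaps the Saver object for Python's pickle module and uses a tiny linear model, and every name in it (train_step, train_with_checkpoints, restore, the model-N.pkl file names) is hypothetical.

```python
import os
import pickle
import tempfile


def train_step(w, b, batch, lr=0.05):
    """One gradient-descent step for y ~ w*x + b on a batch of (x, y) pairs."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    loss = sum((w * x + b - y) ** 2 for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b, loss


def evaluate(w, b, data):
    """Mean squared error of the current parameters on held-out data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_with_checkpoints(train, dev, steps=200, every=20, ckpt_dir=None):
    """Train, write a checkpoint every `every` steps, and remember the best one."""
    ckpt_dir = ckpt_dir or tempfile.mkdtemp()
    w, b = 0.0, 0.0  # explicit variable initialization before training
    best_loss, best_path = float("inf"), None
    for step in range(1, steps + 1):
        w, b, _ = train_step(w, b, train)
        if step % every == 0:
            path = os.path.join(ckpt_dir, "model-%d.pkl" % step)
            with open(path, "wb") as f:
                pickle.dump({"w": w, "b": b, "step": step}, f)
            dev_loss = evaluate(w, b, dev)
            if dev_loss < best_loss:  # early stopping: keep the best checkpoint
                best_loss, best_path = dev_loss, path
    return best_path, best_loss


def restore(path):
    """Load the parameters stored in a checkpoint file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Calling train_with_checkpoints on a few (x, y) pairs returns the path of the checkpoint with the lowest held-out loss, and restore(path) gives back its parameters so training or evaluation can resume from there.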



Interactive Semantic Featuring for Text Classification

arXiv.org Machine Learning

In text classification, dictionaries can be used to define human-comprehensible features. We propose an improvement to dictionary features called smoothed dictionary features. These features recognize document contexts instead of n-grams. We describe a principled methodology to solicit dictionary features from a teacher, and present results showing that models built using these human-comprehensible features are competitive with models trained with Bag of Words features.
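
As a rough illustration of a plain dictionary feature (the baseline that the abstract proposes to improve on, not the smoothed variant, whose details are not given here), the sketch below counts how many tokens of a document fall in each named word list. The function name and the example dictionaries are assumptions made for illustration.

```python
import re


def dictionary_features(text, dictionaries):
    """Return, for each named dictionary, how many tokens of `text` it contains.

    `dictionaries` maps a human-readable feature name to a set of lowercase terms.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        name: sum(1 for tok in tokens if tok in terms)
        for name, terms in dictionaries.items()
    }


# Hypothetical dictionaries a teacher might supply.
EXAMPLE_DICTIONARIES = {
    "sports": {"goal", "match", "team"},
    "finance": {"stock", "market", "shares"},
}
```

Each feature is directly readable by a person: a value of 3 for "sports" means three sports-dictionary words occurred in the document.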


Sentiment classification on node level for RNTN and SVN • /r/MachineLearning

@machinelearnbot

I have a question regarding this paper (http://nlp.stanford.edu/ In the paper there are some results on page 7 in Table 1. There are results for All and Root. For the All results they use the results of all nodes of the tree. For Root they use the results at sentence level.
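
The difference between the two columns can be sketched as follows, assuming each parse tree is a nested dict with a gold label, a predicted label, and a list of children. That layout and the function names are hypothetical and are not taken from the paper.

```python
def iter_nodes(tree):
    """Yield every node of a tree, root first."""
    yield tree
    for child in tree.get("children", []):
        yield from iter_nodes(child)


def accuracy(nodes):
    """Fraction of nodes whose predicted label matches the gold label."""
    nodes = list(nodes)
    return sum(n["gold"] == n["pred"] for n in nodes) / len(nodes)


def all_and_root_accuracy(trees):
    """Return (All, Root): accuracy over every node, and over root nodes only."""
    all_nodes = [node for tree in trees for node in iter_nodes(tree)]
    return accuracy(all_nodes), accuracy(trees)
```

Under this reading, Root scores one prediction per sentence, while All also counts every phrase and word node inside each tree.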


Supervised Learning for Document Classification with Scikit-Learn - QuantStart

#artificialintelligence

This is the first article in what will become a set of tutorials on how to carry out natural language document classification, for the purposes of sentiment analysis and, ultimately, automated trade filter or signal generation. This particular article will make use of Support Vector Machines (SVM) to classify text documents into mutually exclusive groups. Since this is the first article written in 2015, I feel it is now time to move on from Python 2.7.x and make use of the latest 3.4.x. Hence all code in this article will be written with 3.4.x in mind. There are a significant number of steps to carry out between viewing a text document on a web site, say, and using its content as an input to an automated trading strategy to generate trade filters or signals. In this particular article we will avoid discussion of how to download multiple articles from external sources and make use of a given dataset that already comes with its own provided labels. This will allow us to concentrate on the implementation of the "classification pipeline", rather than spend a substantial amount of time obtaining and tagging documents. In subsequent articles in this series we will make use of Python libraries, such as Scrapy and BeautifulSoup, to automatically obtain many web-based articles and effectively extract their text-based data from the HTML.


Text Analysis 101; A Basic Understanding for Business Users: Document Classification

@machinelearnbot

This blog was originally posted as part of our Text Analysis 101 blog series. It aims to explain how the classification of text works as part of Natural Language Processing. The automatic classification of documents is an example of how Machine Learning (ML) and Natural Language Processing (NLP) can be leveraged to enable machines to better understand human language. By classifying text, we are aiming to assign one or more classes or categories to a document or piece of text, making it easier to manage and sort the documents. Manually categorizing and grouping text sources can be extremely laborious and time-consuming, especially for publishers, news sites, blogs or anyone who deals with a lot of content.


100 Machine Learning videos you can't find in Google • /r/MachineLearning

#artificialintelligence

Serious answer: I tend to dive deep into a particular algorithm...learning the math better, getting used to different applications of it, etc. So that's where I usually spend my time - along with the advice /u/Jigsus offered...focusing my learning around the kinds of needs I'm working on problem-/data-wise. Sounds like survival analysis, so I try to find as much material focused around that. On the flip side, I haven't done anything like sentiment analysis, so I know next to nothing about Naive Bayes text classification. I tend to read over a rather wide selection of ML and statistics blogs, so I'm not entirely unclear about such things, it's just that I don't spend a copious amount of time other than playing with a toy dataset now and then.


Bulletin April/May 2013

#artificialintelligence

Specifically, the assignment of meaningful tags (annotations) to each unique data granule is best achieved through collaborative participation of data providers, curators and end users to augment and validate the results derived from machine learning (data mining) classification algorithms. The annotations provide curation, provenance and semantic (scientifically meaningful) metadata about the data source and the data object being studied. The design and specification of a unique, meaningful, searchable and scientifically impactful set of tags can be achieved through collaborative (human-plus-machine) annotation efforts and through discovery informatics research. These steps will produce a searchable classification and indexing scheme for the curation, classification, discovery, reuse, interoperability, integration and understanding of digital repositories.


Sentiment Classification Using Negation as a Proxy for Negative Sentiment

AAAI Conferences

We explore the relationship between negated text and negative sentiment in the task of sentiment classification. We propose a novel adjustment factor based on negation occurrences as a proxy for negative sentiment that can be applied to lexicon-based classifiers equipped with a negation detection pre-processing step. We performed an experiment on a multi-domain customer reviews dataset obtaining accuracy improvements over a baseline, and we further improved our results using out-of-domain data to calibrate the adjustment factor. We see future work possibilities in exploring negation detection refinements, and expanding the experiment to a broader spectrum of opinionated discourse, beyond that of customer reviews.
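
The abstract does not give the form of the adjustment factor, so the following is only one possible reading: a lexicon score with a simple fixed-window negation step, minus a penalty proportional to the number of negation words. The negation list, the three-token scope, the alpha weight, and the function name are all assumptions.

```python
NEGATIONS = {"not", "no", "never", "cannot", "n't"}


def lexicon_score(tokens, lexicon, alpha=0.5, scope=3):
    """Sum word polarities, flipping those within `scope` tokens after a negation,
    then subtract `alpha` for every negation word seen (the assumed adjustment)."""
    total = 0.0
    negation_count = 0
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if tok in NEGATIONS:
            negation_count += 1
            remaining = scope
            continue
        polarity = lexicon.get(tok, 0.0)
        total += -polarity if remaining > 0 else polarity
        remaining = max(0, remaining - 1)
    return total - alpha * negation_count
```

With this sketch, a sentence such as "does not work" scores below zero even though none of its words appear in the lexicon, which is the sense in which negation acts as a proxy for negative sentiment.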


Automatic Classification of Poetry by Meter and Rhyme

AAAI Conferences

In this paper, we focus on large scale poetry classification by meter. We repurposed an open source poetry scanning program (the Scandroid by Charles O. Hartman) as a feature extractor. Our machine learning experiments show a useful ability to classify poems by poetic meter. We also made our own rhyme detector using the Carnegie Mellon University Pronouncing Dictionary as our primary source of pronunciation information. Future work will involve classifying rhyme and assembling a graph (or graphs) as part of the Graph Poem Project depicting the interconnected nature of poetry across history, geography, genre, etc.
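
A pronunciation-based rhyme check might look like the sketch below. It is not the authors' detector: it assumes the common definition of a perfect rhyme (identical sounds from the last primary-stressed vowel onward) and uses a small hand-written excerpt in the dictionary's ARPAbet style rather than the full dictionary file.

```python
# Hand-written excerpt in ARPAbet notation; digits on vowels mark stress.
PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "hat": ["HH", "AE1", "T"],
    "dog": ["D", "AO1", "G"],
    "night": ["N", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
}


def rhyme_part(phones):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    # Fall back to the last vowel of any stress level.
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1].isdigit():
            return phones[i:]
    return phones


def rhymes(a, b, pron=PRONUNCIATIONS):
    """True if two different known words share the same rhyme part."""
    if a == b or a not in pron or b not in pron:
        return False
    return rhyme_part(pron[a]) == rhyme_part(pron[b])
```

Words missing from the pronunciation table are treated as non-rhyming here; a fuller detector would need a fallback for them.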