AITopics | Text Classification

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
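As a rough illustration of the kind of classifier the quoted article's title refers to, here is a minimal naive Bayes sketch in plain Python. The class name, the whitespace tokenizer, the add-one smoothing, and the toy spam/ham messages are all assumptions made for this example; none of them come from the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log P(label) plus the sum of log P(word | label)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, for illustration only.
classifier = NaiveBayesClassifier()
classifier.train("win a free prize now", "spam")
classifier.train("free money click now", "spam")
classifier.train("meeting agenda for monday", "ham")
classifier.train("please review the project report", "ham")
```

Calling `classifier.classify("free prize money")` picks the label whose training messages make the words most probable. Working in log space avoids numeric underflow on longer documents.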


Advanced Topics: Classification with Spotfire

@machinelearnbot | Nov-13-2017, 08:15:07 GMT

How can I predict my customer base? In this webinar, we'll answer real data science questions like this using Spotfire and TERR to make smarter decisions. For our next webinar, we'll be managing a hotel's marketing group, using classification methods inside of Spotfire. This is the fourth step in our five-part webinar series called the Building Blocks of Data Science. In this series, we will explore solving real data science questions using Spotfire and TERR.

artificial intelligence, natural language, text classification, (5 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Visualization (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Tensorflow Text Classification - Python Deep Learning - Source Dexter

@machinelearnbot | Oct-23-2017, 04:55:05 GMT

Text classification is the task of assigning the right label to a given piece of text. This text can be a phrase, a sentence, or even a paragraph. Our aim is to take some text as input and assign a label to it. Since we will be using the TensorFlow deep learning library, we can call this the TensorFlow text classification system. This task involves training a neural network with lots of data indicating what a piece of text represents.
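To make the idea of training a network that maps text to a label concrete, here is a small sketch of a single dense layer with a softmax output, trained by gradient descent on bag-of-words counts. It is written in plain Python as a stand-in and does not use TensorFlow; the function names, the hyperparameters, and the toy sports/tech sentences are assumptions made for this example.

```python
import math


def build_vocabulary(texts):
    """Map every distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words count vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def class_scores(x, weights, biases):
    """One linear score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(texts, labels, epochs=200, learning_rate=0.1):
    """Fit the layer with per-example gradient descent on cross-entropy."""
    vocab = build_vocabulary(texts)
    classes = sorted(set(labels))
    weights = [[0.0] * len(vocab) for _ in classes]
    biases = [0.0] * len(classes)
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            x = vectorize(text, vocab)
            probs = softmax(class_scores(x, weights, biases))
            for k, cls in enumerate(classes):
                # Gradient of the loss with respect to the score of class k.
                error = probs[k] - (1.0 if cls == label else 0.0)
                biases[k] -= learning_rate * error
                for j, xj in enumerate(x):
                    if xj:
                        weights[k][j] -= learning_rate * error * xj
    return vocab, classes, weights, biases


def predict(text, model):
    vocab, classes, weights, biases = model
    scores = class_scores(vectorize(text, vocab), weights, biases)
    return classes[scores.index(max(scores))]


# Hypothetical toy data, for illustration only.
texts = ["the team won the match", "a great goal in the game",
         "the new phone has a fast processor",
         "software update improves the laptop"]
labels = ["sports", "sports", "tech", "tech"]
model = train(texts, labels)
```

A deep learning library would add hidden layers, embeddings, and automatic differentiation on top of this, but the training loop has the same shape: predict, measure the error, adjust the weights.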

category, machine learning, natural language, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Intro to text classification with Keras: automatically tagging Stack Overflow posts | Google Cloud Big Data and Machine Learning Blog | Google Cloud Platform

@machinelearnbot | Oct-9-2017, 20:40:53 GMT

Posted by Sara Robinson (Developer Advocate), Josh Gordon (Developer Advocate), and Marianne Linhares Monteiro (DA Intern). As humans, our brains can easily read a piece of text and extract the topic, tone, and sentiment. Up until just a few years ago, teaching a computer to do the same thing required extensive machine learning expertise and access to powerful computing resources. Now, frameworks like TensorFlow are helping to simplify the process of building machine learning models, and making it more accessible to developers with no background in ML. In this post, we'll show you how to build a simple model to predict the tag of a Stack Overflow question.

accuracy, data mining, machine learning, (16 more...)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.41)
Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Unsupervised Text Classification for Natural Language Interactive Narratives

Bellassai, Jenna (Oberlin College) | Gordon, Andrew S. (University of Southern California) | Roemmele, Melissa (University of Southern California) | Cychosz, Margaret (University of California, Berkeley) | Odimegwu, Obiageli (University of Southern California) | Connolly, Olivia (University of Southern California)

AAAI Conferences | Oct-1-2017

Natural language interactive narratives are a variant of traditional branching storylines where player actions are expressed in natural language rather than by selecting among choices. Previous efforts have handled the richness of natural language input using machine learning technologies for text classification, bootstrapping supervised machine learning approaches with human-in-the-loop data acquisition or by using expected player input as fake training data. This paper explores a third alternative, where unsupervised text classifiers are used to automatically route player input to the most appropriate storyline branch. We describe the Data-driven Interactive Narrative Engine (DINE), a web-based tool for authoring and deploying natural language interactive narratives. To compare the performance of different algorithms for unsupervised text classification, we collected thousands of user inputs from hundreds of crowdsourced participants playing 25 different scenarios, and hand-annotated them to create a gold-standard test set. Through comparative evaluations, we identified an unsupervised algorithm for narrative text classification that approaches the performance of supervised text classification algorithms. We discuss how this technology supports authors in the rapid creation and deployment of interactive narrative experiences, with authorial burdens similar to those of traditional branching storylines.
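The abstract does not say which unsupervised algorithm performed best, so the following is only a sketch of the general routing idea: score the player's input against the text the author wrote for each branch and pick the closest one. Bag-of-words cosine similarity, the function names, and the example branches are all assumptions made for this illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(player_input, branches):
    """Return the name of the branch whose text best matches the input."""
    query = bag_of_words(player_input)
    return max(
        sorted(branches),  # sorted so that ties resolve deterministically
        key=lambda name: cosine_similarity(query, bag_of_words(branches[name])),
    )


# Hypothetical storyline branches written by an author.
branches = {
    "open_door": "open the door and walk into the room",
    "run_away": "run away from the house and escape",
    "call_help": "call for help on the phone",
}
```

No labelled player inputs are needed here: the only text the author supplies is the branch content itself, which is the kind of authoring burden the abstract describes.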

artificial intelligence, machine learning, natural language interactive narrative, (1 more...)

Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Active learning in annotating micro-blogs dealing with e-reputation

Cossu, Jean-Valère, Molina-Villegas, Alejandro, Tello-Signoret, Mariana

arXiv.org Artificial Intelligence | Sep-25-2017

Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro blogs dealing with politics has recently attracted researchers in several fields including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper intends to develop a so-called active learning process for automatically annotating French language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinion from two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues about active learning while building a large annotated data set from noise. This noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can be considered as the bearing point to not only improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information.

annotation, machine learning, natural language, (19 more...)

doi: 10.18713/JIMIS-010917-3-2

1706.05349

Country:

Europe > France (0.68)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Government > Voting & Elections (0.67)
Information Technology > Services (0.46)
Government > Regional Government > Europe Government > France Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(2 more...)

Implementing a CNN for Text Classification in Tensorflow

@machinelearnbot | Sep-17-2017, 18:50:17 GMT

Another TensorFlow feature you typically want to use is checkpointing: saving the parameters of your model to restore them later on. Checkpoints can be used to continue training at a later point, or to pick the best parameter settings using early stopping. Checkpoints are created using a Saver object. Before we can train our model we also need to initialize the variables in our graph. The initialize_all_variables function is a convenience function that runs all of the initializers we've defined for our variables. You can also call the initializer of your variables manually. That's useful if you want to initialize your embeddings with pre-trained values, for example. Let's now define a function for a single training step, evaluating the model on a batch of data and updating the model parameters.
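The pattern described above (initialize parameters, run single training steps, save a checkpoint, and use early stopping to keep the best parameters) can be sketched without TensorFlow. The example below is a plain-Python stand-in that fits a line to toy data and writes checkpoints as JSON; it does not use the Saver object, and every function name, file name, and number in it is an assumption made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, params):
    """Write the step number and the parameter values to disk."""
    with open(path, "w") as handle:
        json.dump({"step": step, "params": params}, handle)


def load_checkpoint(path):
    """Restore the step number and parameters saved earlier."""
    with open(path) as handle:
        state = json.load(handle)
    return state["step"], state["params"]


def loss(params, data):
    """Mean squared error of the line y = w * x + b."""
    w, b = params["w"], params["b"]
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_step(params, data, learning_rate):
    """One gradient-descent update; returns the new parameters."""
    w, b = params["w"], params["b"]
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return {"w": w - learning_rate * grad_w, "b": b - learning_rate * grad_b}


def train_with_checkpoints(train_data, dev_data, path, steps=200, patience=5):
    params = {"w": 0.0, "b": 0.0}  # explicit initialization before training
    best_loss, bad_evals = float("inf"), 0
    for step in range(1, steps + 1):
        params = train_step(params, train_data, learning_rate=0.05)
        dev_loss = loss(params, dev_data)
        if dev_loss < best_loss:
            best_loss, bad_evals = dev_loss, 0
            save_checkpoint(path, step, params)  # best parameters so far
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break  # early stopping: no recent improvement
    return load_checkpoint(path)


# Hypothetical toy data drawn from y = 2x + 1.
train_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
dev_data = [(4.0, 9.0), (5.0, 11.0)]
checkpoint_path = os.path.join(tempfile.mkdtemp(), "model.json")
best_step, best_params = train_with_checkpoints(train_data, dev_data, checkpoint_path)
```

Because the function returns whatever was last written to disk, the caller gets the best parameters seen on the held-out data rather than the parameters from the final step.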

machine learning, natural language, text classification, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Joshi, Bikash, Amini, Massih-Reza, Partalas, Ioannis, Iutzeler, Franck, Maximov, Yury

arXiv.org Machine Learning | Sep-14-2017

We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.
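The abstract gives no formulas, so the sketch below only illustrates the two ideas it names: turning each multi-class example into binary-labelled pairs, and sampling twice to shrink the expanded data. The keep probability (inverse class frequency), the number of sampled wrong classes, and the pair encoding (classes ordered, label +1 when the true class comes first) are assumptions made for this example, not the paper's definitions.

```python
import random
from collections import Counter


def double_sample_pairs(examples, classes, adversaries_per_example=2, seed=0):
    """Reduce (features, label) examples to binary-labelled pairs.

    Sampling 1: keep an example with probability inversely proportional to
    the size of its class, so large classes are thinned out (assumption).
    Sampling 2: contrast each kept example with a few randomly drawn wrong
    classes instead of with every other class (assumption).
    """
    rng = random.Random(seed)
    class_sizes = Counter(label for _, label in examples)
    smallest = min(class_sizes.values())
    pairs = []
    for features, label in examples:
        keep_probability = smallest / class_sizes[label]
        if rng.random() >= keep_probability:
            continue
        others = [c for c in classes if c != label]
        k = min(adversaries_per_example, len(others))
        for wrong in rng.sample(others, k):
            # Order the two classes; the binary label records which side
            # of the pair holds the true class.
            if label < wrong:
                pairs.append(((features, label), (features, wrong), +1))
            else:
                pairs.append(((features, wrong), (features, label), -1))
    return pairs


# Hypothetical long-tailed toy data: class "a" dominates.
labels = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
examples = [("doc%d" % i, label) for i, label in enumerate(labels)]
classes = ["a", "b", "c", "d"]
pairs = double_sample_pairs(examples, classes, adversaries_per_example=2, seed=0)
```

Without sampling, these 10 examples and 4 classes would expand to 30 pairs. With the two sampling steps the output is capped at 20, and the rare classes "c" and "d" are always kept.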

machine learning, natural language, text classification, (18 more...)

1701.06511

Country:

Europe > France (0.29)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

An Automated Text Categorization Framework based on Hyperparameter Optimization

Tellez, Eric S., Moctezuma, Daniela, Miranda-Jiménez, Sabino, Graff, Mario

arXiv.org Artificial Intelligence | Sep-14-2017

A great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses; some of them are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task, using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and wide system able to tackle text classification tasks independent of domain and language, namely microTC. It is composed of some easy to implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance in 20 datasets while achieving competitive results in the remaining 10. The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution. Furthermore, it is important to state that our approach allows the usage of the technology even without knowledge of machine learning and natural language processing.
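The combination the abstract describes (simple text transformations, a text representation, a supervised learner, and a search over their settings) can be sketched as follows. This is not microTC's code or configuration space: the three options, the exhaustive grid, the nearest-centroid classifier, and the toy reviews are assumptions chosen to keep the example short.

```python
import itertools
import math
from collections import Counter


def tokens(text, config):
    """Apply the configured transformations, then tokenize."""
    if config["lowercase"]:
        text = text.lower()
    if config["strip_digits"]:
        text = "".join(ch for ch in text if not ch.isdigit())
    q = config["qgram"]
    if q == 0:
        return text.split()  # word tokens
    padded = " " + " ".join(text.split()) + " "
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]  # char q-grams


def cosine(a, b):
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(texts, labels, config):
    """Sum the token counts of every training text per label."""
    centroids = {}
    for text, label in zip(texts, labels):
        centroids.setdefault(label, Counter()).update(tokens(text, config))
    return centroids


def predict(text, centroids, config):
    vec = Counter(tokens(text, config))
    return max(sorted(centroids), key=lambda lab: cosine(vec, centroids[lab]))


def evaluate(config, train, dev):
    """Accuracy on the development set for one configuration."""
    centroids = fit_centroids(train[0], train[1], config)
    hits = sum(predict(t, centroids, config) == y for t, y in zip(*dev))
    return hits / len(dev[0])


def search(train, dev):
    """Try every configuration and keep the most accurate one."""
    space = {"lowercase": [False, True], "strip_digits": [False, True],
             "qgram": [0, 3]}
    best_config, best_accuracy = None, -1.0
    for values in itertools.product(*space.values()):
        config = dict(zip(space, values))
        accuracy = evaluate(config, train, dev)
        if accuracy > best_accuracy:
            best_config, best_accuracy = config, accuracy
    return best_config, best_accuracy


# Hypothetical toy data, for illustration only.
train = (["Great movie, loved it", "What a GREAT film",
          "terrible plot, hated it", "awful film, 2 hours wasted"],
         ["pos", "pos", "neg", "neg"])
dev = (["great film", "hated this awful movie"], ["pos", "neg"])
best_config, best_accuracy = search(train, dev)
```

The point of the design is that the user supplies only labelled text; the choice of preprocessing is made by the search rather than by someone with NLP expertise.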

classifier, machine learning, natural language, (20 more...)

1704.01975

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.46)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.89)

Automated Classification of Benign and Malignant Proliferative Breast Lesions

#artificialintelligence | Aug-29-2017, 09:15:34 GMT

Pathologists must identify precursor lesions as either benign usual ductal hyperplasia (UDH) or malignant ductal carcinoma in situ (DCIS) for diagnosis and treatment of breast biopsies. Most patients with UDH receive no treatment and have minimal or no increased risk of cancer, while patients with DCIS are more likely to be diagnosed with invasive breast cancer [1, 2]. Treatment to reduce DCIS recurrence and invasive carcinoma has notable risks and side effects, given the extensive methods of lumpectomy with radiation, mastectomy, and tamoxifen hormonal treatment [3]. Diagnostic oversights can lead to either untreated cancer or unnecessary radiation treatment and chemotherapy, both of which have detrimental consequences. Thus, accurate diagnosis is critical for patients as well as for hospitals to reduce extraneous treatment costs. However, human pathologists may not always be in concordance as there is no strict set of instructions on how to carry out a diagnosis.

machine learning, natural language, text classification, (16 more...)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.56)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

The Best Metric to Measure Accuracy of Classification Models

#artificialintelligence | Jul-14-2017, 07:55:35 GMT

To understand the implication of translating the probability number, let's understand a few basic concepts relating to evaluating a classification model with the help of an example given below. Since we are now comfortable with the interpretation of the Confusion Matrix, let's look at some popular metrics used for testing the classification models. Since the formula for Sensitivity doesn't contain FP and TN, Sensitivity may give you a biased result, especially for imbalanced classes. In the example of fraud detection, Sensitivity gives you the percentage of correctly predicted frauds from the pool of actual frauds. Precision, in the same example, gives you the percentage of correctly predicted frauds from the pool of total predicted frauds.
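The metrics mentioned in this excerpt follow directly from the four confusion-matrix counts. The sketch below computes them for a fraud-detection example; the function names and the ten toy transactions are assumptions made for illustration and do not come from the article.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, TN, FN) for the given positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Correctly predicted positives out of all actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Correctly predicted positives out of all predicted positives."""
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(tp, fp, tn, fn):
    """All correct predictions out of all predictions."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels for ten transactions.
actual = ["fraud"] * 4 + ["ok"] * 6
predicted = ["fraud", "fraud", "fraud", "ok",
             "fraud", "fraud", "ok", "ok", "ok", "ok"]
tp, fp, tn, fn = confusion_counts(actual, predicted, positive="fraud")
```

Here 3 of the 4 actual frauds are caught (sensitivity 0.75), but only 3 of the 5 transactions flagged as fraud really are fraud (precision 0.6), which shows why the two numbers should be read together.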

classification model, law enforcement, public safety, (21 more...)

Industry: Law Enforcement & Public Safety > Fraud (0.78)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.90)
