AITopics

Country: North America > Canada > Ontario > Toronto (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)

Apostolova, Emilia, Kreek, R. Andrew

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

arXiv.org Machine Learning, Sep-11-2018

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise that is present both in the text-based documents and in the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.

machine learning, natural language, text classification, (14 more...)

1809.04019

Country:

Oceania > Australia (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Law (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
(2 more...)

Stein, Roger A., Jaques, Patricia A., Valiati, Joao F.

An Analysis of Hierarchical Text Classification Using Word Embeddings

arXiv.org Artificial Intelligence, Sep-5-2018

Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their variants is a very promising approach for HTC.

classification, machine learning, natural language, (21 more...)

doi: 10.1016/j.ins.2018.09.001

1809.01771

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.87)

Industry:

Law (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(3 more...)

Qian, Jing, ElSherief, Mai, Belding, Elizabeth, Wang, William Yang

Hierarchical CVAE for Fine-Grained Hate Speech Classification

arXiv.org Artificial Intelligence, Aug-31-2018

Existing work on automated hate speech detection typically focuses on binary classification or on differentiating among a small set of categories. In this paper, we propose a novel method for a fine-grained hate speech classification task, which focuses on differentiating among 40 hate groups in 13 different hate group categories. We first explore the Conditional Variational Autoencoder (CVAE) (Larsen et al., 2016; Sohn et al., 2015) as a discriminative model and then extend it to a hierarchical architecture that uses the additional hate category information for more accurate prediction. Experimentally, we show that incorporating the hate category information during training can significantly improve classification performance and that our proposed model outperforms commonly used discriminative models.

machine learning, natural language, text classification, (20 more...)

1809.00088

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > United States > Alabama > Lee County > Auburn (0.04)

Genre: Research Report (0.84)

Industry: Law Enforcement & Public Safety > Terrorism (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.95)

Parvez, Md Rizwan, Bolukbasi, Tolga, Chang, Kai-Wei, Saligrama, Venkatesh

Building a Robust Text Classifier on a Test-Time Budget

arXiv.org Machine Learning, Aug-29-2018

In this paper, we study a generic learning framework for building a robust text classification model that achieves accuracy comparable to standard full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction task and passes only these words to the classifier for processing. The selector is trained jointly with the classifier and directly learns to cooperate with it. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be combined with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with only a minor loss in accuracy.

classifier, machine learning, natural language, (20 more...)

1808.0827

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > India (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Power Industry > Utilities > Nuclear (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Schultz, Lex Razoux, Loog, Marco, Esfahani, Peyman Mohajerin

Distance Based Source Domain Selection for Sentiment Classification

arXiv.org Machine Learning, Aug-28-2018

Automated sentiment classification (SC) on short text fragments has received increasing attention in recent years. Performing SC on unseen domains with few or no labeled samples can significantly degrade classification performance, because sentiment is expressed differently in the source and target domains. In this study, we aim to mitigate this undesired impact by proposing a methodology based on a predictive measure, which allows us to select an optimal source domain from a set of candidates. The proposed measure is a linear combination of well-known distance functions between probability distributions supported on the source and target domains (e.g. Earth Mover's distance and Kullback-Leibler divergence). The performance of the proposed methodology is validated through an SC case study in which our numerical experiments suggest a significant improvement in the cross-domain classification error in comparison with a randomly selected source domain, for both a naive and an adaptive learning setting. In the case of more heterogeneous datasets, the predictability feature of the proposed model can be used to further select a subset of candidate domains, where the corresponding classifier outperforms the one trained on all available source domains. This observation reinforces the hypothesis that our proposed model may also be deployed as a means of filtering out redundant information during the training phase of SC.
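As a rough illustration of the kind of measure described above, the sketch below ranks candidate source domains by a weighted sum of two distances between smoothed unigram distributions. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpora, the weights `alpha` and `beta`, the smoothing constant `eps`, the direction of the KL divergence (source to target), and the choice of a 0/1 ground metric between words, under which Earth Mover's distance reduces to total variation distance.

```python
import math
from collections import Counter


def unigram_dist(texts, vocab, eps=1e-3):
    """Smoothed unigram distribution over a fixed vocabulary (eps is assumed)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def emd_01(p, q):
    """Earth Mover's distance under a 0/1 ground metric (total variation)."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)


def combined_distance(p, q, alpha=0.5, beta=0.5):
    """Linear combination of the two distances; the weights are hypothetical."""
    return alpha * emd_01(p, q) + beta * kl_divergence(p, q)


def select_source(sources, target_texts):
    """Return the candidate source domain closest to the target, plus all scores."""
    all_texts = [t for texts in sources.values() for t in texts] + list(target_texts)
    vocab = sorted({w for t in all_texts for w in t.lower().split()})
    target = unigram_dist(target_texts, vocab)
    scores = {
        name: combined_distance(unigram_dist(texts, vocab), target)
        for name, texts in sources.items()
    }
    return min(scores, key=scores.get), scores


# Toy corpora, invented for this example.
sources = {
    "books": ["great plot and good characters", "boring story"],
    "electronics": ["battery died fast", "great screen and good battery"],
}
target_texts = ["good battery life", "screen broke fast"]

best, scores = select_source(sources, target_texts)
print(best, scores)
```

In practice the weights of the combination would have to be fitted so that the measure predicts cross-domain error; the fixed values above only show the shape of the computation.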

machine learning, natural language, text classification, (20 more...)

1808.09271

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Xu, Hengru, Li, Shen, Hu, Renfen, Li, Si, Gao, Sheng

From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information

arXiv.org Machine Learning, Aug-26-2018

Dropout is used to avoid overfitting by randomly dropping units from a neural network during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method that integrates global information to improve neural networks for text classification. Unlike the traditional dropout method, in which units are dropped randomly with the same probability, we aim to use explicit instructions based on global information about the dataset to guide the training process. With GI-Dropout, the model is expected to pay more attention to less apparent features or patterns. Experiments demonstrate the effectiveness of dropout with global information on seven text classification tasks, including sentiment analysis and topic classification.
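The contrast between uniform and guided dropout can be sketched as follows. This is standard inverted dropout generalized to a separate drop probability per unit; it is not the paper's implementation. The abstract does not say how GI-Dropout derives its probabilities from global information, so the values in `guided_probs` are hypothetical placeholders, as is the rescaling by `1 / (1 - p)`.

```python
import random


def dropout(values, drop_probs, rng):
    """Inverted dropout with one drop probability per unit.

    A unit is zeroed with probability p; a kept unit is rescaled by
    1 / (1 - p) so that its expected value is unchanged.
    """
    out = []
    for v, p in zip(values, drop_probs):
        if rng.random() < p:
            out.append(0.0)
        else:
            out.append(v / (1.0 - p))
    return out


rng = random.Random(0)
activations = [0.9, 0.4, 0.7]

# Traditional dropout: every unit shares the same probability.
uniform_probs = [0.5, 0.5, 0.5]

# Guided dropout: per-unit probabilities. These numbers are invented; in
# GI-Dropout they would come from global statistics of the dataset.
guided_probs = [0.8, 0.2, 0.5]

print(dropout(activations, uniform_probs, rng))
print(dropout(activations, guided_probs, rng))
```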

machine learning, natural language, text classification, (18 more...)

1808.08149

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > Illinois (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

#artificialintelligence, Aug-21-2018, 07:36:28 GMT

Text Classification with Deep Neural Network in TensorFlow -- Simple Explanation

Implementing text classification with TensorFlow can be simple. One area where text classification can be applied is chatbot text processing and intent resolution. In this post I will describe, step by step, how to build a TensorFlow model for text classification and how classification is done. Please refer to my previous post on a similar topic: Contextual Chatbot with TensorFlow, Node.js and Oracle JET -- Steps How to Install and Get It Working. I would also recommend going through this great post about chatbot implementation: Contextual Chatbots with Tensorflow.

machine learning, natural language, text classification, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

#artificialintelligence, Aug-18-2018, 09:38:31 GMT

Ham or Spam? SMS Text Classification with Machine Learning

The use of mobile phones has skyrocketed in the last decade, opening a new channel for junk promotions from disreputable marketers. People innocently give out their mobile phone numbers while using day-to-day services and are then flooded with spam promotional messages. In this post we will take a look at classifying SMS messages using the Naive Bayes machine learning model, understand why Naive Bayes works well for this use case, and also dive a little into word clouds to visualize this dataset.
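To make the approach concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The four training messages are invented for illustration and are not from the post's dataset; whitespace tokenization and skipping words unseen in training are simplifying assumptions, and the function names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(messages):
    """Count classes and per-class words from (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in messages:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    n_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_messages)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example.
training = [
    ("spam", "win a free prize now"),
    ("spam", "free cash claim now"),
    ("ham", "are we meeting for lunch"),
    ("ham", "call me when you are home"),
]
model = train_nb(training)
print(predict_nb(model, "claim your free prize"))
print(predict_nb(model, "are you home for lunch"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.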

machine learning, natural language, sms text classification, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

#artificialintelligence, Jul-24-2018, 00:52:52 GMT

Step 2.5: Choose a Model | ML Universal Guides | Google Developers

At this point, we have assembled our dataset and gained insights into the key characteristics of our data. Next, based on the metrics we gathered in Step 2, we should think about which classification model we should use. This means asking questions such as, "How do we present the text data to an algorithm that expects numeric input?" (this is called data preprocessing and vectorization), "What type of model should we use?", and "What configuration parameters should we use for our model?" Thanks to decades of research, we have access to a large array of data preprocessing and model configuration options. However, the availability of a very large array of viable options to choose from greatly increases the complexity and the scope of the particular problem at hand.
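One common answer to the first question is to turn each text into a vector of n-gram counts. The sketch below is a minimal plain-Python version under stated assumptions: whitespace tokenization, lowercasing, and unigrams plus bigrams (`n_max=2`) are illustrative choices, and the helper names `ngrams`, `build_vocabulary`, and `vectorize` are hypothetical rather than taken from the guide.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """All word n-grams of length 1..n_max from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(texts, n_max=2):
    """Map every n-gram seen in the training texts to a fixed column index."""
    terms = sorted({g for t in texts for g in ngrams(t, n_max)})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(text, vocab, n_max=2):
    """Count vector for one text; n-grams outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for gram, count in Counter(ngrams(text, n_max)).items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


# Toy texts, invented for this example.
texts = ["the movie was great", "the movie was not great"]
vocab = build_vocabulary(texts)
vectors = [vectorize(t, vocab) for t in texts]
print(len(vocab), vectors)
```

Including bigrams lets the two toy sentences differ on "was great" versus "not great", a distinction that unigram counts alone capture only through the single word "not".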

machine learning, natural language, text classification, (14 more...)

Genre: Workflow (0.79)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.54)