Text Classification


Dialog speech sentiment classification for imbalanced datasets

arXiv.org Artificial Intelligence

Speech is the most common way humans express their feelings, and sentiment analysis is the use of tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Even though this field has seen tremendous advancements in the last two decades, effectively detecting underrepresented sentiments in different kinds of datasets remains challenging. In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection, particularly in the underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.


Lexico-semantic and affective modelling of Spanish poetry: A semi-supervised learning approach

arXiv.org Artificial Intelligence

Text classification has improved substantially in recent years through the use of transformers. However, most research focuses on prose texts, with poetry receiving less attention, especially in Spanish. In this paper, we propose a semi-supervised learning approach for inferring 21 psychological categories evoked by a corpus of 4572 sonnets, along with 10 affective and lexico-semantic multiclass ones. The subset of poems used for training and evaluation includes 270 sonnets. With our approach, we achieve an AUC beyond 0.7 for 76% of the psychological categories, and an AUC over 0.65 for 60% of the multiclass ones. The sonnets are modelled using transformers, through sentence embeddings, along with lexico-semantic and affective features obtained from external lexicons. We find that this approach provides an AUC increase of up to 0.12 compared with using transformers alone.


Not All Negatives are Equal: Label-Aware Contrastive Loss for Fine-grained Text Classification

arXiv.org Artificial Intelligence

Fine-grained classification involves dealing with datasets with a larger number of classes with subtle differences between them. Guiding the model to focus on differentiating dimensions between these commonly confusable classes is key to improving performance on fine-grained tasks. In this work, we analyse the contrastive fine-tuning of pre-trained language models on two fine-grained text classification tasks, emotion classification and sentiment analysis. We adaptively embed class relationships into a contrastive objective function to weigh the positives and negatives differently, and in particular to weight closely confusable negatives more than less similar negative examples. We find that Label-aware Contrastive Loss outperforms previous contrastive methods in the presence of a larger number of classes and/or more confusable classes, and helps models to produce output distributions that are more differentiated.


Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph

arXiv.org Artificial Intelligence

Cross-lingual text classification requires that task-specific training data be available in high-resource source languages, where the task is identical to that of a low-resource target language. However, collecting such training data can be infeasible because of the labeling cost, task characteristics, and privacy concerns. This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries. First, we construct a dictionary-based heterogeneous graph (DHG) from bilingual dictionaries. This opens the possibility of using graph neural networks for cross-lingual transfer. The remaining challenge is the heterogeneity of DHG because multiple languages are considered. To address this challenge, we propose a dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of DHG by two-step aggregations, which are word-level and language-level aggregations. Experimental results demonstrate that our method outperforms pretrained models even though it does not have access to large corpora. Furthermore, it can perform well even though dictionaries contain many incorrect translations. Its robustness allows the usage of a wider range of dictionaries, such as automatically constructed and crowdsourced dictionaries, which are convenient for real-world applications.


President Biden orders review of 9/11 document classification nearly 20 years later

FOX News

Watch 'The Lost Calls of 9/11' Sunday, Sept. 5 at 10 p.m. ET on Fox News. President Biden on Friday signed an executive order calling for the review of classified information related to the terrorist attacks on Sept. 11, 2001, and the ultimate declassification of some documents. The president's move was lauded by families of victims who died on that fateful day nearly 20 years ago, and was seen as a supportive gesture toward many who have long sought the records in hopes of implicating the Saudi government. The order, coming little more than a week before the 20th anniversary of the attacks, is a significant moment in a yearslong tussle between the government and the families over what classified information about the run-up to the attacks could be made public. That conflict was on display last month when some 1,800 relatives, survivors and first responders came out against Biden's participation in 9/11 memorial events if the documents remained classified.


Ship Type Classification

#artificialintelligence

In this blog, we will show our approach to classifying images of ships using supervised models. We use a dataset obtained from Kaggle to perform our analyses. We discuss the data preprocessing steps we went through to reduce the dimensionality of the data and to feed our models the best inputs possible. Ship or vessel detection has a wide range of applications in the areas of maritime safety, fisheries management, marine pollution, defence and maritime security, protection from piracy, illegal migration, etc. Keeping this in mind, a Governmental Maritime and Coastguard Agency is planning to deploy a computer vision based automated system to identify ship type only from the images taken by the survey boats. You have been hired as a consultant to build an efficient model for this project.


Multiclass Text Classification Using Deep Learning

#artificialintelligence

Before we go any further into text classification, we need a way to represent the words in a vocabulary numerically, because most of our ML models require numbers, not text. One way to achieve this goal is one-hot encoding of words, but this is a poor choice: given the structure of one-hot encoded vectors, the similarity between any two different words is always 0. Word2Vec overcomes this difficulty by providing us with a fixed-length (usually much smaller than the vocabulary size) vector representation of words.
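The point about one-hot similarity can be checked with a minimal sketch. The three-word vocabulary and the two-dimensional dense vectors below are hypothetical and hand-picked for illustration; they are not output from a trained Word2Vec model.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary (an assumption for this example).
vocab = ["good", "great", "terrible"]
one_hot_vectors = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Hypothetical dense vectors chosen by hand, NOT trained embeddings.
dense_vectors = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "terrible": [-0.9, 0.1],
}

# Any two different one-hot vectors are orthogonal, so similarity is 0.
sim_one_hot = cosine(one_hot_vectors["good"], one_hot_vectors["great"])

# Dense vectors can place related words closer than unrelated ones.
sim_dense_close = cosine(dense_vectors["good"], dense_vectors["great"])
sim_dense_far = cosine(dense_vectors["good"], dense_vectors["terrible"])

print(sim_one_hot, sim_dense_close, sim_dense_far)
```

With one-hot vectors, "good" is exactly as dissimilar from "great" as it is from "terrible", which is why a dense representation is usually preferred as model input.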


Noisy Channel Language Model Prompting for Few-Shot Text Classification

arXiv.org Artificial Intelligence

We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive models (e.g., direct head tuning): channel prompt tuning is preferred when the number of training examples is small, labels in the training data are imbalanced, or generalization to unseen labels is required.
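As a rough illustration of the channel direction, the sketch below scores each label by the probability of the input given that label and picks the highest-scoring one. It is not the paper's method: a smoothed unigram model (essentially naive Bayes with a uniform label prior) stands in for the language model, and the training examples, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical few-shot training set (an assumption for this example).
train = [
    ("great fun movie", "positive"),
    ("great acting", "positive"),
    ("boring slow movie", "negative"),
    ("slow plot", "negative"),
]

labels = sorted({label for _, label in train})
vocab = sorted({word for text, _ in train for word in text.split()})

# Per-label word counts, used to estimate P(word | label).
counts = {label: Counter() for label in labels}
for text, label in train:
    counts[label].update(text.split())


def channel_log_prob(text, label):
    """log P(text | label) under a unigram model with add-one smoothing.

    Every word in the input contributes a term, so the label has to
    "explain" the whole input. Smoothing is over the training vocabulary;
    a word unseen in training gets the smoothed zero-count estimate.
    """
    total = sum(counts[label].values())
    return sum(
        math.log((counts[label][word] + 1) / (total + len(vocab)))
        for word in text.split()
    )


def channel_predict(text):
    """Pick the label maximizing P(text | label), assuming a uniform prior."""
    return max(labels, key=lambda label: channel_log_prob(text, label))


print(channel_predict("great movie"))
print(channel_predict("slow movie"))
```

A direct model would instead estimate the probability of the label given the input; the channel form reverses the conditioning, so the score depends on how well each candidate label accounts for each input word.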


Malawi News Classification - An NLP Project - Analytics Vidhya

#artificialintelligence

Text classification is common among the applications that we use on a daily basis. For example, email providers use text classification to filter out spam emails from your inbox. Another common use of text classification is in customer care, where sentiment analysis is used to differentiate bad reviews from good reviews. In recent years English-language text classification has come a long way, but training classification models on low-resource languages and texts of varying lengths still poses difficulties. In this Zindi competition, we are provided with news articles written in the Chichewa language and we have to train our model on multi-label classification as there are 19 categories of news.


Best Python Libraries Of 2021 For Natural Language Processing

#artificialintelligence

Natural Language Processing (NLP), a tech wizard, is the part of data science that teaches computers to comprehend human languages. It involves the analysis of data to extract meaningful insights. Of its many uses, the main ones include text mining, text classification, text and sentiment analysis, and speech generation and recognition. Today, we explore seven top Python NLP libraries. Using these libraries will enable one to build end-to-end NLP solutions -- from getting data for one's model to presenting the results. Additionally, one will learn about related concepts such as tokenisation, stemming, semantic reasoning and more.