Text Classification
A Survey of Text Classification Under Class Distribution Shift
Costache, Adriana Valentina, Gheorghe, Silviu Florin, Poesina, Eduard Gabriel, Irofti, Paul, Ionescu, Radu Tudor
The basic underlying assumption of machine learning (ML) models is that the training and test data are sampled from the same distribution. However, in daily practice, this assumption is often broken, i.e., the distribution of the test data changes over time, which hinders the application of conventional ML models. One domain where the distribution shift naturally occurs is text classification, since people always find new topics to discuss. To this end, we survey research articles studying open-set text classification and related tasks. We divide the methods in this area based on the constraints that define the kind of distribution shift and the corresponding problem formulation, i.e., learning with the Universum, zero-shot learning, and open-set learning. We next discuss the predominant mitigation approaches for each problem setup. Finally, we identify several future work directions, aiming to push the boundaries beyond the state of the art. Interestingly, we find that continual learning can solve many of the issues caused by the shifting class distribution. We maintain a list of relevant papers at https://github.com/Eduard6421/Open-Set-Survey.
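To make the open-set setting concrete: one common baseline (a generic technique, not a method proposed by this survey) is to reject a test document as belonging to an unseen class when the classifier's top softmax probability falls below a threshold. The sketch below assumes precomputed logits for three known classes; the labels, the 0.7 threshold, and the `<unknown>` marker are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def open_set_predict(logits, labels, threshold=0.7, unknown="<unknown>"):
    """Return the most probable known label, or `unknown` when the top
    probability is below `threshold` (a simple reject option).
    The threshold value is an illustrative assumption, not tuned."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else unknown

# Hypothetical known classes and logits, for illustration only.
labels = ["sports", "politics", "tech"]
print(open_set_predict([4.0, 0.5, 0.2], labels))  # peaked distribution -> sports
print(open_set_predict([1.0, 0.9, 1.1], labels))  # flat distribution -> <unknown>
```

In practice the threshold would be chosen on held-out data; a closed-set classifier without such a reject option would force every new-topic document into one of the known classes.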
A Hybrid Model for Few-Shot Text Classification Using Transfer and Meta-Learning
Gao, Jia, Lyu, Shuangquan, Liu, Guiran, Zhu, Binrong, Zheng, Hongye, Liao, Xiaoxuan
With the continuous development of natural language processing (NLP) technology, text classification has been widely applied in many fields. However, obtaining labeled data is often expensive and difficult, especially in few-shot learning scenarios. To address this problem, this paper proposes a few-shot text classification model based on transfer learning and meta-learning. The model transfers knowledge from a pre-trained model and uses a meta-learning mechanism to improve its ability to adapt rapidly to tasks with few samples. Through a series of comparative and ablation experiments, we verified the effectiveness of the proposed method. The experimental results show that, in both few-shot and medium-sample settings, the model based on transfer learning and meta-learning significantly outperforms traditional machine learning and deep learning methods. In addition, the ablation experiments analyzed the contribution of each component to model performance and confirmed the key role of transfer learning and meta-learning in improving accuracy. Finally, this paper discusses future research directions and the potential of this method in practical applications.
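The abstract does not specify which meta-learning mechanism is used. As one common example (an assumption, not necessarily the authors' method), a prototypical-network-style episode classifies a query by its distance to per-class mean embeddings computed from a handful of labeled support examples. Here the 2-D vectors are hypothetical stand-ins for embeddings that would come from a pre-trained encoder.

```python
import math

def prototypes(support):
    """Average the embeddings of each class in the support set."""
    sums, counts = {}, {}
    for vec, label in support:
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(protos, key=lambda lab: math.dist(query, protos[lab]))

# Hypothetical 2-D "sentence embeddings"; in a real system these would come
# from a pre-trained encoder (the transfer-learning part).
support = [([1.0, 0.0], "pos"), ([0.9, 0.1], "pos"),
           ([0.0, 1.0], "neg"), ([0.1, 0.8], "neg")]
protos = prototypes(support)
print(classify([0.8, 0.2], protos))  # -> pos
```

Because only the prototypes depend on the support set, a new few-shot task needs no gradient updates at test time; meta-training would instead tune the encoder so that such prototypes separate classes well.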
A Multiplicative Model for Learning Distributed Text-Based Attribute Representations
Ryan Kiros, Richard Zemel, Russ R. Salakhutdinov
In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and industry of a blogger) or representations of authors. We describe a third-order model where word context and attribute vectors interact multiplicatively to predict the next word in a sequence. This leads to the notion of conditional word similarity: how meanings of words change when conditioned on different attributes. We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation.
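The idea of conditional word similarity can be illustrated with a generic factored three-way interaction, in which an attribute vector gates each factor of a word's representation. This is a minimal sketch under stated assumptions: the matrices `W_word`, `W_attr`, and `W_out` are hypothetical toy parameters, the factorization may differ from the paper's exact parametrization, and the paper's model additionally uses the gated representations to predict the next word.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def conditioned_embedding(word, attr, W_word, W_attr, W_out):
    """Factored three-way interaction: project the word and the attribute
    into a shared factor space, multiply elementwise (the attribute gates
    each factor), then map the gated factors back to embedding space."""
    gated = [w * a for w, a in zip(matvec(W_word, word), matvec(W_attr, attr))]
    return matvec(W_out, gated)

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))

# Hypothetical toy parameters: identity projections with two factors.
I2 = [[1.0, 0.0], [0.0, 1.0]]
w1, w2 = [1.0, 1.0], [1.0, -1.0]         # two "words"
attr_a, attr_b = [1.0, 0.0], [1.0, 1.0]  # two "attributes"

for attr in (attr_a, attr_b):
    e1 = conditioned_embedding(w1, attr, I2, I2, I2)
    e2 = conditioned_embedding(w2, attr, I2, I2, I2)
    print(round(cosine(e1, e2), 3))  # 1.0 under attr_a, 0.0 under attr_b
```

The same pair of words is identical under one attribute and orthogonal under the other, which is the kind of attribute-dependent change in meaning the abstract calls conditional word similarity.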
Review for NeurIPS paper: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
Weaknesses: - The technical novelty of the proposed method is somewhat incremental, since it is largely based on the work from [14] with some modifications to the generator and discriminator architectures. The word-level training feedback in the discriminator seems to be the main technical contribution, but it is not ground-breaking, as it extends the auxiliary classifier in a conditional GAN to multiple classes (i.e., …). Specifically, only nouns and adjectives are manually chosen as text-relevant attributes, which convey only a very limited part of the context of general descriptions. Although this may allow fine-grained control of the image content in a limited context, it reduces the capability of aligning the rich context of the text with the image, which is often available in approaches that learn to encode the whole sentence (e.g., …). Although the authors give some justification in Section 3.2.1 for using a heuristic approach, it is not clear that this assumption holds in general. Current comparisons are mostly focused on ManiGAN.
Review for NeurIPS paper: Uncertainty-aware Self-training for Few-shot Text Classification
Weaknesses: My main concerns are about the experiments. While the authors make an effort to perform ablation analysis, I think some important ablations are still missing that would convince me that such a BNN-powered self-training scheme is better than classic ST: (1) The proposed method always uses a smart sample selection strategy, while the classic ST baseline in this paper either does not select samples or selects them uniformly. It is very common for classic ST to select samples based on confidence scores, which can be class-dependent as well. Thus, I feel that the comparison with classic ST is not entirely fair. I would like to see a comparison between UST without Conf and classic ST with confidence-based, class-dependent sample selection, or simply replacing the sample selection part of the full UST with confidence-score-based selection to see what happens; otherwise, I don't see any direct evidence that the BNN-powered "uncertainty-awareness" is better than a simple confidence-score-based baseline.
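The baseline the reviewer asks for, classic self-training with confidence-based, class-dependent sample selection, can be sketched as follows. This is an illustration of that baseline under assumed inputs (a hypothetical teacher's class probabilities for unlabeled texts and an assumed per-class budget `k`), not code from the UST paper.

```python
def select_per_class(probs, k):
    """Class-dependent confidence-based selection for classic self-training.

    probs: one list of class probabilities per unlabeled example.
    Returns {class: indices of the k most confident examples pseudo-labeled
    as that class}, so that no single class dominates the pseudo-labeled set.
    """
    by_class = {}
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # pseudo-label = argmax
        by_class.setdefault(label, []).append((p[label], i))
    return {c: [i for _, i in sorted(items, reverse=True)[:k]]
            for c, items in by_class.items()}

# Hypothetical teacher predictions for five unlabeled texts, two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.95, 0.05], [0.3, 0.7]]
print(select_per_class(probs, k=1))  # {0: [3], 1: [2]}
```

Selecting the top `k` per class rather than the top `k` overall keeps the pseudo-labeled set balanced even when the teacher is systematically more confident on one class.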
Review for NeurIPS paper: Uncertainty-aware Self-training for Few-shot Text Classification
This work presents a novel approach to integrating uncertainty into self-training to obtain strong results on text classification with very few labels. The work compares against a strong set of baselines and includes extensive ablations. The reviewers agreed that the response answered most of their concerns. The work could be improved with more diverse low-resource setups and clearer writing.
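One common way to quantify the model uncertainty that such a scheme integrates is the BALD score computed from several stochastic forward passes, e.g., with dropout kept active at inference (MC dropout). The sketch below computes only this score from hypothetical per-pass probabilities; it assumes BALD as the uncertainty measure and does not reproduce how UST uses the score to select or weight pseudo-labeled samples.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def bald(mc_probs):
    """BALD score: entropy of the mean prediction minus the mean entropy of
    the individual stochastic predictions (e.g., T forward passes with
    dropout kept active). High values mean the passes disagree."""
    T = len(mc_probs)
    mean = [sum(col) / T for col in zip(*mc_probs)]
    return entropy(mean) - sum(entropy(p) for p in mc_probs) / T

agree = [[0.9, 0.1]] * 3             # passes agree -> score near 0
disagree = [[0.9, 0.1], [0.1, 0.9]]  # passes disagree -> high score
print(bald(agree), bald(disagree))
```

Unlike a single softmax confidence, this score separates examples the model is consistently unsure about from examples on which the sampled models disagree.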
On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts
Shahi, Gautam Kishore, Hummel, Oliver
The amount of scholarly texts is consistently increasing; around 2.5 million research articles are published yearly (Rabby et al., 2024). Due to this enormous increase, the classification of (scientific) texts has attracted even more attention in recent years (Bornmann et al., 2021). Classifying the research area of scientific texts requires significant domain knowledge in various complex research fields. Hence, manual classification is challenging and time-consuming for librarians and limits the number of texts that can be classified manually (Zhang et al., 2023). Moreover, due to complex hierarchical classification schemes and their variety, classifying publications is also an unpopular activity among researchers. Prominent examples of classification schemes include the Open Research Knowledge Graph (ORKG) (Auer and Mann, 2019), Microsoft Academic Graph (Wang et al., 2020), the Semantic Scholar Academic Graph (Kinney et al., 2023), ACM computing classification system (Rous, 2012), Dewey Decimal Classification (DDC) (Scott, 1998), and the ACL Anthology (Bird et al., 2008).
Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback
Rønningstad, Egil, Storset, Lilja Charlotte, Mæhlum, Petter, Øvrelid, Lilja, Velldal, Erik
Sentiment analysis of patient feedback from the public health domain can aid decision makers in evaluating the provided services. The current paper focuses on free-text comments in patient surveys about general practitioners and psychiatric healthcare, annotated with four sentence-level polarity classes (positive, negative, mixed, and neutral), while also attempting to alleviate data scarcity by leveraging general-domain sources in the form of reviews. For several different architectures, we compare in-domain and out-of-domain effects, as well as the effects of training joint multi-domain models.
Through the Looking Glass: LLM-Based Analysis of AR/VR Android Applications Privacy Policies
Alghamdi, Abdulaziz, Mohaisen, David
This paper comprehensively analyzes privacy policies in AR/VR applications, leveraging BERT, a state-of-the-art text classification model, to evaluate the clarity and thoroughness of these policies. By comparing the privacy policies of AR/VR applications with those of free and premium websites, this study provides a broad perspective on the current state of privacy practices within the AR/VR industry. Our findings indicate that AR/VR applications generally offer a higher percentage of positive segments than free websites but a lower percentage than premium websites. The analysis of highlighted segments and words revealed that AR/VR applications strategically emphasize critical privacy practices and key terms, which enhances the clarity and effectiveness of their privacy policies.
Reviews: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
UPDATE after reading author rebuttal: I look forward to the changes in the final version of the paper. Detailed comments: 1. Understanding of RNNs for the sentiment classification task, with theoretical analysis backed by empirical observations: This work takes up the sentiment classification task. It identifies a set of fixed points and centers the analysis of the RNNs around them. The RNN states can be described by a 1-dimensional manifold of these fixed points. PCA of the RNN states across examples reveals that training leads RNNs to a lower-dimensional representation. Interestingly, movement along this low-dimensional manifold is minimal in the absence of inputs or in the presence of neutral/uninformative words, whereas the states move more when polarity-bearing words are present, thus showing linear separability effects along this 1-D manifold.
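The behavior described above can be mimicked by a hand-built toy. This is not the trained networks analyzed in the paper, only a 2-D linear RNN constructed to show the same qualitative picture: one direction with eigenvalue 1 forms a line of fixed points that integrates sentiment evidence, while the other direction decays. The word input vectors and the decay rate are hypothetical choices.

```python
def run_rnn(tokens, word_inputs, decay=0.5):
    """Toy 2-D linear RNN, h_t = A h_{t-1} + u(x_t) with A = diag(1, decay).

    Dimension 0 has eigenvalue 1: without input every point on that axis is
    a fixed point (a line attractor), so it integrates sentiment evidence.
    Dimension 1 has eigenvalue decay < 1, so its activity fades.
    """
    h = (0.0, 0.0)
    trajectory = [h]
    for tok in tokens:
        dx, dy = word_inputs.get(tok, (0.0, 0.0))  # neutral words add no input
        h = (h[0] + dx, decay * h[1] + dy)
        trajectory.append(h)
    return trajectory

# Hypothetical input vectors for polarity-bearing words.
word_inputs = {"good": (1.0, 0.2), "great": (1.0, 0.2), "bad": (-1.0, 0.2)}
review = "the plot was good but the ending was bad and the acting great".split()
final = run_rnn(review, word_inputs)[-1]
print("positive" if final[0] > 0 else "negative")  # readout along the line
```

Neutral tokens leave the position along the line unchanged, while each polarity-bearing word moves the state one step along it, so a linear readout of that single coordinate separates positive from negative reviews.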