caragea
MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
Sirbu, Iustin, Popovici, Robert-Adrian, Caragea, Cornelia, Trausan-Matu, Stefan, Rebedea, Traian
We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a pseudo-label weighting module designed for selecting and filtering pseudo-labels based on head agreement and model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques -- heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch -- resulting in a holistic approach that improves robustness and performance in SSL settings. Experimental results on benchmark datasets highlight the superior performance of MultiMatch, i.e., MultiMatch achieves state-of-the-art results on 8 out of 10 setups from 5 natural language processing datasets and ranks first according to the Friedman test among 21 methods. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26%, a critical advantage for real-world text classification tasks. Our code is available on GitHub.
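The selection step described above, which keeps a pseudo-label only when heads agree and are confident, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a fixed `threshold` (the abstract refers to self-adaptive thresholds), and the omission of the difficulty-based weighting are all simplifying assumptions made for illustration.

```python
def argmax(probs):
    """Index of the largest probability in a list."""
    return max(range(len(probs)), key=probs.__getitem__)


def select_pseudo_label(peer_probs, threshold):
    """Hypothetical selection rule for one head.

    peer_probs: list of class-probability vectors produced by the
                *other* heads for a single unlabeled example.
    threshold:  fixed confidence cut-off (an assumption; the abstract
                describes self-adaptive thresholds instead).

    Returns (label, confidence) when all peers predict the same class
    and their mean confidence reaches the threshold, otherwise None.
    """
    labels = [argmax(p) for p in peer_probs]
    if len(set(labels)) != 1:
        return None  # heads disagree: discard the example
    label = labels[0]
    confidence = sum(p[label] for p in peer_probs) / len(peer_probs)
    if confidence < threshold:
        return None  # heads agree but are not confident enough
    return label, confidence
```

For example, `select_pseudo_label([[0.1, 0.9], [0.2, 0.8]], 0.7)` keeps class 1, while a pair of heads that disagree returns `None`.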
Multilingual Target-Stance Extraction
Social media enables data-driven analysis of public opinion on contested issues. Target-Stance Extraction (TSE) is the task of identifying the target discussed in a document and the document's stance towards that target. Many works classify stance towards a given target in a multilingual setting, but all prior work in TSE is English-only. This work introduces the first multilingual TSE benchmark, spanning Catalan, Estonian, French, Italian, Mandarin, and Spanish corpora. It extends the original TSE pipeline to a multilingual setting without requiring a separate model for each language. Our model pipeline achieves a modest F1 score of 12.78, underscoring the increased difficulty of the multilingual task relative to English-only setups and highlighting target prediction as the primary bottleneck. We are also the first to demonstrate the sensitivity of TSE's F1 score to different target verbalizations. Together, these contributions serve as a much-needed baseline for resources, algorithms, and evaluation criteria in multilingual TSE.
LLM-Guided Co-Training for Text Classification
Rahman, Md Mezbaur, Caragea, Cornelia
In this paper, we introduce a novel weighted co-training approach that is guided by Large Language Models (LLMs). Namely, in our co-training approach, we use LLM labels on unlabeled data as target labels and co-train two encoder-only based networks that train each other over multiple iterations: first, all samples are forwarded through each network and historical estimates of each network's confidence in the LLM label are recorded; second, a dynamic importance weight is derived for each sample according to each network's belief in the quality of the LLM label for that sample; finally, the two networks exchange importance weights with each other -- each network back-propagates all samples weighted with the importance weights coming from its peer network and updates its own parameters. By strategically utilizing LLM-generated guidance, our approach significantly outperforms conventional SSL methods, particularly in settings with abundant unlabeled data. Empirical results show that it achieves state-of-the-art performance on 4 out of 5 benchmark datasets and ranks first among 14 compared methods according to the Friedman test. Our results highlight a new direction in semi-supervised learning -- where LLMs serve as knowledge amplifiers, enabling backbone co-training models to achieve state-of-the-art performance efficiently.
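The three steps above (record confidence history, derive a per-sample weight, exchange weights with the peer) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the paper's method: the function names are hypothetical, the weight is taken to be the mean of the recorded confidences, and the loss is a normalized weighted average. The abstract does not specify any of these details.

```python
def record_confidence(history, sample_id, confidence):
    """Step 1: append one network's confidence in the LLM label
    for this sample to that network's history."""
    history.setdefault(sample_id, []).append(confidence)


def importance_weights(history):
    """Step 2 (assumed form): weight each sample by the mean of the
    confidences recorded for it so far."""
    return {sid: sum(confs) / len(confs) for sid, confs in history.items()}


def weighted_loss(per_sample_loss, peer_weights):
    """Step 3: a network averages its own per-sample losses using the
    weights computed by its *peer* network."""
    total_weight = sum(peer_weights[sid] for sid in per_sample_loss)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(peer_weights[sid] * loss
                       for sid, loss in per_sample_loss.items())
    return weighted_sum / total_weight
```

In use, network A would call `importance_weights` on its own history and hand the result to network B, which passes it as `peer_weights` when computing its training loss, and vice versa.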
Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models
Mohan, Jayanth, Chowdhury, Jishnu Ray, Malik, Tomas, Caragea, Cornelia
Keyphrases are the essential topical phrases that summarize a document. Keyphrase generation is a long-standing NLP task for automatically generating keyphrases for a given document. While the task has been comprehensively explored in the past via various models, only a few works perform some preliminary analysis of Large Language Models (LLMs) for the task. Given the impact of LLMs in the field of NLP, it is important to conduct a more thorough examination of their potential for keyphrase generation. In this paper, we attempt to meet this demand with our research agenda. Specifically, we focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task. We systematically investigate the effect of providing task-relevant specialized instructions in the prompt. Moreover, we design task-specific counterparts to self-consistency-style strategies for LLMs and show significant benefits from our proposals over the baselines.
Infrastructure Ombudsman: Mining Future Failure Concerns from Structural Disaster Response
Chowdhury, Md Towhidul Absar, Datta, Soumyajit, Sharma, Naveen, KhudaBukhsh, Ashiqur R.
On January 28, 2022, at 6:39 a.m. EST, the Fern Hollow Bridge in Pittsburgh, Pennsylvania collapsed. Due to the timing of the failure, thankfully, fewer vehicles were on the bridge and only ten people were injured with no fatalities. Pittsburgh, also known as the City of Bridges, was getting ready for a visit from President Biden that day. Biden visited the collapse site and assured federal assistance to rebuild the bridge on the spot. This infrastructural failure, coinciding with a high-profile political visit and a push towards passing the Build Back Better infrastructure bill, attracted considerable media attention to the flailing infrastructural health in the US. As we were sifting through the social web discussions surrounding this issue, broad themes emerged, such as words of compassion for the victims and typical responses in social web political discourse, including political name-calling, conspiracy theories, and partisan mud-slinging. However, apart from these expected social web reactions, we noticed a small minority of interactions that talked about anticipatory failures of other bridges in the US.
Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion
Singh, Smriti, Caragea, Cornelia, Li, Junyi Jessy
Situations and events evoke emotions in humans, but to what extent do they inform the prediction of emotion detection models? Prior work in emotion trigger or cause identification focused on training models to recognize events that trigger an emotion. Instead, this work investigates how well human-annotated emotion triggers correlate with features that models deem salient in their prediction of emotions. First, we introduce a novel dataset, EmoTrigger, consisting of 900 social media posts sourced from three different datasets; these were annotated by experts for emotion triggers with high agreement. Using EmoTrigger, we evaluate the ability of large language models (LLMs) to identify emotion triggers, and conduct a comparative analysis of the features considered important for these tasks between LLMs and fine-tuned models. Our analysis reveals that emotion triggers are largely not considered salient features for emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification
Zou, Henry Peng, Zhou, Yue, Caragea, Cornelia, Caragea, Doina
The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge, we present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting where only a small amount of annotated data is required. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using little labeled data and large amounts of unlabeled data, mimicking the early stage of a disaster. Through integrating effective semi-supervised learning ideas and incorporating TextMixUp, CrisisMatch achieves performance improvement on two disaster datasets of 11.2% on average. Further analyses are also provided for the influence of the amount of labeled data and out-of-domain results.
DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank
Zou, Henry Peng, Zhou, Yue, Zhang, Weizhi, Caragea, Cornelia
During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time. On the other hand, semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others, resulting in substantially negative effects on disaster monitoring and rescue. In this paper, we first study two recent debiasing methods on semi-supervised crisis tweet classification. Then we propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training iteration. Extensive experiments are conducted to compare different debiasing methods' performance and generalization ability in both in-distribution and out-of-distribution settings. The results demonstrate the superior performance of our proposed method. Our code is available at https://github.com/HenryPengZou/DeCrisisMB.
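The memory-bank idea described above, storing pseudo-labels per class and drawing the same number from each class at every iteration, can be sketched as follows. This is a minimal illustration and not the released DeCrisisMB code: the class name, the fixed per-class capacity, and the policy of evicting the oldest entries are assumptions.

```python
import random
from collections import deque


class PseudoLabelMemoryBank:
    """Hypothetical per-class store of pseudo-labeled sample ids."""

    def __init__(self, num_classes, capacity_per_class=64, seed=0):
        self.num_classes = num_classes
        # One bounded queue per class; the oldest entries are dropped
        # once a class exceeds its capacity (assumed eviction policy).
        self.bank = {c: deque(maxlen=capacity_per_class)
                     for c in range(num_classes)}
        self.rng = random.Random(seed)

    def add(self, sample_id, pseudo_label):
        """Store a sample under the class it was pseudo-labeled with."""
        self.bank[pseudo_label].append(sample_id)

    def sample_balanced(self, per_class):
        """Draw up to `per_class` stored samples from every class, so
        frequent classes cannot dominate the pseudo-labeled batch."""
        batch = []
        for c in range(self.num_classes):
            stored = list(self.bank[c])
            k = min(per_class, len(stored))
            batch.extend((sid, c) for sid in self.rng.sample(stored, k))
        return batch
```

With ten stored samples for class 0 and two for class 1, `sample_balanced(2)` returns two samples of each class, which is the equal-sampling behaviour the abstract describes.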
Keyphrase Generation Beyond the Boundaries of Title and Abstract
Garg, Krishna, Chowdhury, Jishnu Ray, Caragea, Cornelia
Keyphrase generation aims at generating important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of the articles to generate keyphrases. In this paper, we comprehensively explore whether integrating additional information from the full text of a given article or from semantically similar articles can help a neural keyphrase generation model. We discover that adding sentences from the full text, particularly in the form of the extractive summary of the article, can significantly improve the generation of both types of keyphrases: those present in the text and those absent from it. Experimental results with three widely used models for keyphrase generation, along with one of the latest transformer models suitable for longer documents, Longformer Encoder-Decoder (LED), validate the observation. We also present a new large-scale scholarly dataset, FullTextKP, for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.
Neural network approximation and estimation of classifiers with classification boundary in a Barron class
Caragea, Andrei, Petersen, Philipp, Voigtlaender, Felix
This article concerns the approximation and statistical estimation of high-dimensional, discontinuous functions by neural networks. More precisely, we study a certain class of target functions for classification problems, such as those encountered when automatically labeling images. For such problems, deep learning methods--based on the training of deep neural networks with gradient-based methods--achieve state-of-the-art performance [32, 34]. The underlying functional relationship of such an (image) classification task is typically extremely high-dimensional. For example, the most widely used image databases used to benchmark classification algorithms are MNIST [35] with 28 × 28 pixels per image, CIFAR-10/CIFAR-100 [31] with 32 × 32 pixels per image and ImageNet [14, 32] which contains high-resolution images that are typically down-sampled to 256 × 256 pixels. Compared to practical applications, these benchmark datasets are relatively low-dimensional. Yet, already for MNIST, the simplest of those databases, the input dimension for the classification function is d = 784. It is well known in classical approximation theory that high-dimensional approximation problems typically suffer from the so-called curse of dimensionality [11, 40].