Text Classification
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Supervised Word Mover's Distance
Gao Huang, Chuan Guo, Matt J. Kusner, Yu Sun, Fei Sha, Kilian Q. Weinberger
Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric.
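The optimal transport formulation can be made concrete with a small sketch. It assumes documents are given as arrays of word embeddings with normalized bag-of-words weights, and it approximates the exact transport problem with entropy-regularized Sinkhorn iterations (a common approximation, swapped in here for illustration rather than taken from the abstract). The optional matrix `A` is a hypothetical linear map applied to the embeddings before distances are computed, standing in for a learned supervised metric; how such a map would be trained is not shown.

```python
import numpy as np

def nbow(counts):
    """Normalized bag-of-words weights: word counts rescaled to sum to 1."""
    w = np.asarray(counts, dtype=float)
    return w / w.sum()

def sinkhorn_wmd(X1, w1, X2, w2, A=None, eps=0.1, n_iter=500):
    """Entropy-regularized approximation of the word mover's distance.

    X1, X2: (n_words, dim) embeddings of the two documents.
    w1, w2: nBOW weights of the words (each sums to 1).
    A: hypothetical linear transform of the embedding space (identity if None).
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    if A is not None:
        A = np.asarray(A, float)
        X1, X2 = X1 @ A.T, X2 @ A.T
    # Transport cost: Euclidean distance between every pair of embedded words.
    C = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    K = np.exp(-C / eps)
    u = np.ones(len(w1))
    for _ in range(n_iter):
        # Alternately rescale so the plan matches the column, then row, marginals.
        v = w2 / (K.T @ u)
        u = w1 / (K @ v)
    T = u[:, None] * K * v[None, :]  # approximate optimal transport plan
    return float((T * C).sum())

# Toy 2-D "embeddings" (hypothetical values chosen for illustration).
doc_a = [[0.0, 0.0], [1.0, 0.0]]
doc_b = [[3.0, 0.0], [4.0, 0.0]]
w = nbow([1, 1])
d_same = sinkhorn_wmd(doc_a, w, doc_a, w)
d_far = sinkhorn_wmd(doc_a, w, doc_b, w)
d_scaled = sinkhorn_wmd(doc_a, w, doc_b, w, A=2.0 * np.eye(2))
```

Here `d_far` is about 3.0, and scaling the space with `A = 2I` doubles it to about 6.0, which shows how a transform of the embedding space reshapes the document metric. Very small `eps` values can underflow `exp(-C / eps)`; practical implementations often work in the log domain.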
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
Mind the Gap: Bridging Prior Shift in Realistic Few-Shot Crop-Type Classification
Reuss, Joana, Gikalo, Ekaterina, Körner, Marco
Real-world agricultural distributions often suffer from severe class imbalance, typically following a long-tailed distribution. Labeled datasets for crop-type classification are inherently scarce and remain costly to obtain. When working with such limited data, training sets are frequently constructed to be artificially balanced -- in particular in the case of few-shot learning -- failing to reflect real-world conditions. This mismatch induces a shift between training and test label distributions, degrading real-world generalization. To address this, we propose Dirichlet Prior Augmentation (DirPA), a novel method that proactively simulates the unknown label distribution skew of the target domain during model training. Specifically, we model the real-world distribution as Dirichlet-distributed random variables, effectively performing a prior augmentation during few-shot learning. Our experiments show that DirPA successfully shifts the decision boundary and stabilizes the training process by acting as a dynamic feature regularizer.
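One way to read "prior augmentation" is sketched below: at each training step a class prior is drawn from a symmetric Dirichlet distribution and its logarithm is added to the logits before the cross-entropy, so every batch is scored under a different simulated label skew. This logit-shift formulation, the concentration parameter `alpha`, and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def dirichlet_prior_augmented_loss(logits, labels, alpha, rng):
    """Cross-entropy under a randomly sampled class prior (hypothetical sketch).

    logits: (batch, n_classes) raw model scores.
    labels: (batch,) integer class labels.
    alpha: concentration of the symmetric Dirichlet; smaller values give
           more strongly skewed simulated priors.
    """
    logits = np.asarray(logits, float)
    n_classes = logits.shape[1]
    # Draw one simulated target-domain label distribution for this step.
    prior = rng.dirichlet(np.full(n_classes, alpha))
    # Shift logits by the log-prior; the small constant guards against log(0).
    adjusted = logits + np.log(prior + 1e-12)
    log_probs = log_softmax(adjusted)
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return float(loss), prior

rng = np.random.default_rng(0)
logits = np.zeros((4, 3))          # an untrained model: uniform scores
labels = np.array([0, 1, 2, 0])
loss, prior = dirichlet_prior_augmented_loss(logits, labels, alpha=1.0, rng=rng)
```

Because a fresh prior is sampled every call, the same batch yields a different loss on each step, which is the randomness a regularizer of this kind relies on.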
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.62)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)
Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification
Jiang, Ye, Wang, Taihang, Liu, Youzheng, Wang, Yimin, Xia, Yuhan, Long, Yunfei
In-context learning (ICL) for text classification, which uses a few input-label demonstrations to describe a task, has demonstrated impressive performance on large language models (LLMs). However, the selection of in-context demonstrations plays a crucial role and can significantly affect LLMs' performance. Most existing demonstration selection methods primarily focus on semantic similarity between test inputs and demonstrations, often overlooking the importance of label distribution alignment. To address this limitation, we propose a two-stage demonstration selection method, TopK + Label Distribution Divergence (L2D), which leverages a fine-tuned BERT-like small language model (SLM) to generate label distributions and calculate their divergence for both test inputs and candidate demonstrations. This enables the selection of demonstrations that are not only semantically similar but also aligned in label distribution with the test input. Extensive experiments across seven text classification benchmarks show that our method consistently outperforms previous demonstration selection strategies. Further analysis reveals a positive correlation between the performance of LLMs and the accuracy of the underlying SLMs used for label distribution estimation.
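The two-stage selection can be sketched as follows. The sketch assumes that sentence embeddings and SLM-predicted label distributions are already available as plain arrays, uses cosine similarity for the TopK stage, and uses KL divergence as the label-distribution divergence; the divergence choice, the function names, and the parameters `top_k` and `n_select` are assumptions for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return float(np.sum(p * np.log(p / q)))

def select_demonstrations(test_emb, test_dist, cand_embs, cand_dists,
                          top_k, n_select):
    """Stage 1: TopK by semantic similarity; Stage 2: rerank by label divergence."""
    cand = np.asarray(cand_embs, float)
    t = np.asarray(test_emb, float)
    # Stage 1: cosine similarity between the test input and every candidate.
    sims = cand @ t / (np.linalg.norm(cand, axis=1) * np.linalg.norm(t))
    shortlist = np.argsort(-sims)[:top_k]
    # Stage 2: prefer candidates whose predicted label distribution is
    # closest to the test input's predicted label distribution.
    divs = [kl_divergence(test_dist, cand_dists[i]) for i in shortlist]
    order = np.argsort(divs, kind="stable")
    return [int(shortlist[j]) for j in order[:n_select]]

# Hypothetical 2-D embeddings and 2-class label distributions.
test_emb, test_dist = [1.0, 0.0], [0.9, 0.1]
cand_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
cand_dists = [[0.1, 0.9], [0.85, 0.15], [0.9, 0.1], [0.5, 0.5]]
picked = select_demonstrations(test_emb, test_dist, cand_embs, cand_dists,
                               top_k=3, n_select=2)
```

Candidate 0 is the most semantically similar but has a very different label distribution, so the rerank moves it behind candidates 1 and 3; candidate 2 matches the label distribution exactly but never reaches the rerank because it fails the similarity stage.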
- Asia > China > Shandong Province > Qingdao (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
CogLTX: Applying BERT to Long Texts
BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text with a sliding window or simplifying transformers, suffer from insufficient long-range attention or require customized CUDA kernels.
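The sliding-window baseline mentioned above can be sketched in a few lines; this is the naive approach the abstract criticizes, not CogLTX itself. The helper names and the `window`/`stride` parameters are illustrative. Because self-attention cost grows with the square of sequence length, splitting a 4096-token text into eight 512-token windows cuts the number of attention cells from 4096² = 16,777,216 to 8 × 512² = 2,097,152, but tokens in different windows can no longer attend to each other.

```python
def sliding_windows(tokens, window, stride):
    """Split a token sequence into (possibly overlapping) fixed-size windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    starts = list(range(0, len(tokens) - window + 1, stride))
    # Add a final window aligned to the end so no trailing tokens are dropped.
    if starts[-1] + window < len(tokens):
        starts.append(len(tokens) - window)
    return [list(tokens[s:s + window]) for s in starts]

def attention_cost(windows):
    """Total number of query-key pairs if each window is attended separately."""
    return sum(len(w) ** 2 for w in windows)

windows = sliding_windows(list(range(10)), window=4, stride=3)
tail = sliding_windows(list(range(11)), window=4, stride=3)
long_doc = sliding_windows(list(range(4096)), window=512, stride=512)
```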
- North America > Canada (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.66)
3acb2a202ae4bea8840224e6fce16fd0-AuthorFeedback.pdf
We thank the reviewers for their insightful and useful feedback! …'s primary concern is the gap between the performance of our BERT + prism model and SOTA.
Other points:
R1: "The authors do not fully describe the various hypotheses they imply..."
R2: Prism layer transforms are fixed; how does this compare to a learned transform? MLM task may lead to all the frequency bands becoming local.
R2: "In fig.5, why is BERT+prism worse for indices outside [200, 300]?"
R3: "precise choice on where to use the prism layer raises some questions..."
R3: "The way of dividing the embeddings into 5 sectors seems a bit naive" We will note this in the paper, and that there is opportunity for future work!
R4: "It would be nice to see ablations where you use high filters on POS tagging and low filters on para-
R4: "As a sanity check, you could try to see what happens if you don't finetune the initial BERT model on The original BERT model achieves an accuracy of 94.6% for POS tagging, 41.8 for dialog acts, 28.9 for topic classification, slightly worse than our model that was trained longer on
R4: "Since Figure 5 demonstrates good performance on long range masked language modeling, LAMBADA
Association via Entropy Reduction
Gamst, Anthony, Wilson, Lawrence
Prior to recent successes using neural networks, term frequency-inverse document frequency (tf-idf) was clearly regarded as the best choice for identifying documents related to a query. We provide a different score, aver, and observe, on a dataset with ground truth marking for association, that aver does better at finding associated pairs than tf-idf. This example involves finding associated vertices in a large graph, which may be an area where neural networks are not currently an obvious best choice. Beyond this one anecdote, we observe that (1) aver has a natural threshold for declaring pairs as unassociated while tf-idf does not, (2) aver can distinguish between pairs of documents for which tf-idf gives a score of 1.0, (3) aver can be applied to larger collections of documents than pairs while tf-idf cannot, and (4) aver is derived from entropy under a simple statistical model while tf-idf is a construction designed to achieve a certain goal, and hence aver may be more "natural." To be fair, we also observe that (1) writing down and computing the aver score for a pair is more complex than for tf-idf and (2) the fact that the aver score is naturally scale-free makes aver scores more complicated to interpret.
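Point (2) can be illustrated on the tf-idf side. The abstract does not define aver, so only tf-idf is sketched here, using raw term counts times log(N/df) and cosine similarity, which is one common tf-idf variant assumed for illustration. Any two documents whose term-count vectors are proportional receive a score of exactly 1.0, even though they are not the same document.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse tf-idf vectors: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    ["graph", "vertex"],
    ["graph", "vertex", "graph", "vertex"],  # same terms, doubled counts
    ["query", "term"],
]
vecs = tfidf_vectors(docs)
same_direction = cosine(vecs[0], vecs[1])  # 1.0 despite different documents
disjoint = cosine(vecs[0], vecs[2])        # 0.0: no shared terms
```

The score saturates at 1.0 for the first pair, so tf-idf cannot rank such pairs against each other; it also has no built-in cutoff separating "associated" from "unassociated" scores, which is point (1).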
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)
An Efficient Classification Model for Cyber Text
Hossen, Md Sakhawat, Borshon, Md. Zashid Iqbal, Badrudduza, A. S. M.
The rise of deep learning methodology and practice in recent years has brought about a severe consequence: an increasing carbon footprint due to the insatiable demand for computational resources and power. The field of text analytics has also undergone a massive transformation under this trend toward a single dominant methodology. In this paper, the original TF-IDF algorithm is modified, and Clement Term Frequency-Inverse Document Frequency (CTF-IDF) is proposed for data preprocessing. This paper primarily discusses the effectiveness of classical machine learning techniques in text analytics with CTF-IDF and a faster IRLBA algorithm for dimensionality reduction. Introducing both of these techniques into the conventional text analytics pipeline ensures a more efficient, faster, and less computationally intensive application than deep learning methodology in terms of carbon footprint, with a minor compromise in accuracy. The experimental results also exhibit a multifold reduction in time complexity and an improvement in model accuracy for the classical machine learning methods discussed further in this paper.
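The shape of such a classical pipeline (term weighting, then dimensionality reduction, then a lightweight classifier) can be sketched with NumPy. Two substitutions are assumptions: standard TF-IDF stands in for CTF-IDF, whose modification the abstract does not describe, and a full `np.linalg.svd` truncated to `k` components stands in for IRLBA, which computes only the leading singular vectors iteratively and is what makes the real pipeline faster. The nearest-centroid classifier and the toy cyber-text documents are hypothetical.

```python
import numpy as np

def term_counts(docs, vocab):
    """Document-term count matrix over a fixed vocabulary."""
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for r, doc in enumerate(docs):
        for t in doc:
            if t in index:  # ignore out-of-vocabulary terms
                tf[r, index[t]] += 1
    return tf

def fit_idf(tf):
    """Inverse document frequency log(N / df) estimated on training data."""
    df = (tf > 0).sum(axis=0)
    return np.log(tf.shape[0] / np.maximum(df, 1))

def fit_projection(X, k):
    """Top-k right singular vectors (stand-in for an IRLBA truncated SVD)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k]

def nearest_centroid(Z_train, labels, z):
    """Predict the class whose mean reduced vector is closest to z."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([Z_train[labels == c].mean(axis=0) for c in classes])
    return int(classes[np.argmin(np.linalg.norm(centroids - z, axis=1))])

docs = [
    ["malware", "trojan", "payload"],
    ["trojan", "malware", "exploit"],
    ["phishing", "email", "link"],
    ["email", "phishing", "credential"],
]
labels = [0, 0, 1, 1]
vocab = sorted({t for d in docs for t in d})
tf = term_counts(docs, vocab)
idf = fit_idf(tf)
X = tf * idf
basis = fit_projection(X, k=2)
Z = X @ basis.T                                  # reduced training features
z_new = (term_counts([["malware", "exploit"]], vocab) * idf) @ basis.T
pred = nearest_centroid(Z, labels, z_new[0])     # expected class 0
```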
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > Bangladesh (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)