AITopics

2305.18324

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.86)

arXiv.org Artificial IntelligenceMay-21-2023

F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks

Gao, Xiangxiang, Zhu, Wei, Gao, Jiasheng, Yin, Congrui

Computational complexity and overthinking problems have become the bottlenecks for pre-training language models (PLMs) with millions or even trillions of parameters. A Flexible-Patience-Based Early Exiting method (F-PABEE) has been proposed to alleviate the problems mentioned above for single-label classification (SLC) and multi-label classification (MLC) tasks. F-PABEE makes predictions at the classifier and will exit early if predicted distributions of cross-layer are consecutively similar. It is more flexible than the previous state-of-the-art (SOTA) early exiting method PABEE because it can simultaneously adjust the similarity score thresholds and the patience parameters. Extensive experiments show that: (1) F-PABEE makes a better speedup-accuracy balance than existing early exiting strategies on both SLC and MLC tasks. (2) F-PABEE achieves faster inference and better performances on different PLMs such as BERT and ALBERT. (3) F-PABEE-JSKD performs best for F-PABEE with different similarity measures.

f-pabee, machine learning, natural language, (17 more...)

2305.11916

Country:

North America > United States > Pennsylvania (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangxi Province > Nanchang (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

arXiv.org Artificial IntelligenceMay-20-2023

Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization

Wu, Ting, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing

Models trained with empirical risk minimization (ERM) are revealed to easily rely on spurious correlations, resulting in poor generalization. Group distributionally robust optimization (group DRO) can alleviate this problem by minimizing the worst-case loss over pre-defined groups. While promising, in practice factors like expensive annotations and privacy preclude the availability of group labels. More crucially, when taking a closer look at the failure modes of out-of-distribution generalization, the typical procedure of reweighting in group DRO loses efficiency. Hinged on the limitations, in this work, we reformulate the group DRO framework by proposing Q-Diversity. Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization. Furthermore, a novel mixing strategy across groups is presented to diversify the under-represented groups. In a series of experiments on both synthetic and real-world text classification tasks, results demonstrate that Q-Diversity can consistently improve worst-case accuracy under different distributional shifts, outperforming state-of-the-art alternatives.

machine learning, natural language, q-diversity, (20 more...)

2305.12123

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.69)

Technology:

Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.35)

arXiv.org Artificial IntelligenceMay-19-2023

Empowering Sentence Encoders with Prompting and Label Retrieval for Zero-shot Text Classification

Hong, Jimin, Park, Jungsoo, Kim, Daeyoung, Choi, Seongjae, Son, Bokyung, Kang, Jaewook

With contrastive pre-training, sentence encoders are generally optimized to locate semantically similar samples closer to each other in their embedding spaces. In this work, we focus on the potential of their embedding spaces to be readily adapted to zero-shot text classification, as semantically distinct samples are already well-separated. Our framework, RaLP (Retrieval augmented Label Prompts for sentence encoder), encodes prompted label candidates with a sentence encoder, then assigns the label whose prompt embedding has the highest similarity with the input text embedding. In order to compensate for the potentially poorly descriptive labels in their original format, RaLP retrieves sentences that are semantically similar to the original label prompt from external corpora and use them as additional pseudo-label prompts. RaLP achieves competitive or stronger performance than much larger baselines on various closed-set classification and multiple-choice QA datasets under zero-shot settings. We show that the retrieval component plays a pivotal role in RaLP's success, and its results are robustly attained regardless of verbalizer variations.

large language model, natural language, text classification, (15 more...)

2212.10391

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.46)
Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Improving Implicit Sentiment Learning via Local Sentiment Aggregation

Yang, Heng, Li, Ke

Aspect-based sentiment classification (ABSC) has revealed the potential dependency of sentiment polarities among different aspects. Our study further explores this phenomenon, positing that adjacent aspects often exhibit similar sentiments, a concept we term "aspect sentiment coherency." We argue that the current research landscape has not fully appreciated the significance of modeling aspect sentiment coherency. To address this gap, we introduce a local sentiment aggregation paradigm (LSA) that facilitates fine-grained sentiment coherency modeling. This approach enables the extraction of implicit sentiments for aspects lacking explicit sentiment descriptions. Leveraging gradient descent, we design a differential-weighted sentiment aggregation window that guides the modeling of aspect sentiment coherency. Experimental results affirm the efficacy of LSA in learning sentiment coherency, as it achieves state-of-the-art performance across three public datasets, thus significantly enhancing existing ABSC models. We have made our code available, providing a ready tool for existing methods to harness the potential of sentiment coherency information.

machine learning, natural language, sentiment coherency, (19 more...)

2110.08604

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
North America > Dominican Republic (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.51)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
(2 more...)

DLUE: Benchmarking Document Language Understanding

Xu, Ruoxi, Lin, Hongyu, Guan, Xinyan, Han, Xianpei, Sun, Yingfei, Sun, Le

Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders the fair comparison and measuring the progress of the field. To benchmark document understanding researches, this paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription. Under the new evaluation framework, we propose \textbf{Document Language Understanding Evaluation} -- \textbf{DLUE}, a new task suite which covers a wide-range of tasks in various forms, domains and document genres. We also systematically evaluate six well-established transformer models on DLUE, and find that due to the lengthy content, complicated underlying structure and dispersed knowledge, document understanding is still far from being solved, and currently there is no neural architecture that dominates all tasks, raising requirements for a universal document understanding architecture.

machine learning, natural language, text classification, (18 more...)

2305.0952

Country:

Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.34)

ContrastNet: A Contrastive Learning Framework for Few-Shot Text Classification

Chen, Junfan, Zhang, Richong, Mao, Yongyi, Xu, Jie

Few-shot text classification has recently been promoted by the meta-learning paradigm which aims to identify target classes with knowledge transferred from source classes with sets of small tasks named episodes. Despite their success, existing works building their meta-learner based on Prototypical Networks are unsatisfactory in learning discriminative text representations between similar classes, which may lead to contradictions during label prediction. In addition, the tasklevel and instance-level overfitting problems in few-shot text classification caused by a few training examples are not sufficiently tackled. In this work, we propose a contrastive learning framework named ContrastNet to tackle both discriminative representation and overfitting problems in few-shot text classification. ContrastNet learns to pull closer text representations belonging to the same class and push away text representations belonging to different classes, while simultaneously introducing unsupervised contrastive regularization at both task-level and instance-level to prevent overfitting. Experiments on 8 few-shot text classification datasets show that ContrastNet outperforms the current state-of-the-art models.

machine learning, natural language, text classification, (13 more...)

2305.09269

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)

UOR: Universal Backdoor Attacks on Pre-trained Language Models

Du, Wei, Li, Peixuan, Li, Boqun, Zhao, Haodong, Liu, Gongshen

Backdoors implanted in pre-trained language models (PLMs) can be transferred to various downstream tasks, which exposes a severe security threat. However, most existing backdoor attacks against PLMs are un-targeted and task-specific. Few targeted and task-agnostic methods use manually pre-defined triggers and output representations, which prevent the attacks from being more effective and general. In this paper, we first summarize the requirements that a more threatening backdoor attack against PLMs should satisfy, and then propose a new backdoor attack method called UOR, which breaks the bottleneck of the previous approach by turning manual selection into automatic optimization. Specifically, we define poisoned supervised contrastive learning which can automatically learn the more uniform and universal output representations of triggers for various PLMs. Moreover, we use gradient search to select appropriate trigger words which can be adaptive to different PLMs and vocabularies. Experiments show that our method can achieve better attack performance on various text classification tasks compared to manual methods. Further, we tested our method on PLMs with different architectures, different usage paradigms, and more difficult tasks, which demonstrated the universality of our method. The source code of UOR will be released with the paper if accepted.

machine learning, natural language, trigger word, (17 more...)

2305.09574

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.34)

arXiv.org Artificial IntelligenceMay-15-2023

Pre-Training to Learn in Context

Gu, Yuxian, Dong, Li, Wei, Furu, Huang, Minlie

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability by pre-training the model on a large collection of "intrinsic tasks" in the general plain-text corpus using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely-used text classification datasets and the Super-NaturalInstrctions benchmark, which contains 100+ NLP tasks formulated to text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters. The code is publicly available at https://github.com/thu-coai/PICL.

large language model, machine learning, natural language, (21 more...)

2305.09137

Country:

South America > Argentina (0.04)
North America > United States > Wisconsin (0.04)
Asia > Middle East > Qatar (0.04)
(6 more...)

Genre: Research Report (0.50)

Industry:

Education (0.68)
Government (0.67)
Energy (0.67)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.89)

Ma, Chunlan, ImaniGooghari, Ayyoob, Ye, Haotian, Asgari, Ehsaneddin, Schütze, Hinrich

Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages

arXiv.org Artificial IntelligenceMay-15-2023

While natural language processing tools have been developed extensively for some of the world's languages, a significant portion of the world's over 7000 languages are still neglected. One reason for this is that evaluation datasets do not yet cover a wide range of languages, including low-resource and endangered ones. We aim to address this issue by creating a text classification dataset encompassing a large number of languages, many of which currently have little to no annotated data available. We leverage parallel translations of the Bible to construct such a dataset by first developing applicable topics and employing a crowdsourcing tool to collect annotated data. By annotating the English side of the data and projecting the labels onto other languages through aligned verses, we generate text classification datasets for more than 1500 languages. We extensively benchmark several existing multilingual language models using our dataset. To facilitate the advancement of research in this area, we will release our dataset and code.

artificial intelligence, natural language, text classification, (18 more...)

2305.08487

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(12 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.88)