Large Language Model
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection
This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Na\"ive Bayes and LightGBM, as baseline methods. We assess the performance of these models across four public datasets, utilizing different numbers of training samples (full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs surpass the performance of the popular baseline techniques, particularly in few-shot scenarios. This adaptability renders LLMs uniquely suited to spam detection tasks, where labeled samples are limited in number and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model that has been specifically adapted and fine-tuned for the purpose of detecting email spam. Our results demonstrate that Spam-T5 surpasses baseline models and other LLMs in the majority of scenarios, particularly when there are a limited number of training samples available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Wang, Junyang, Yan, Ming, Zhang, Yi, Sang, Jitao
With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval by the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks. This is due to the lack of decoder architecture and pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap between the CLIP representations of different modalities and the inability of CLIP to model the offset of this gap, which fails the concept to transfer across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With text-only unsupervised training, Knight achieves State-of-the-Art performance in zero-shot methods for image captioning and video captioning. Our code is available at https://github.com/junyangwang0410/Knight.
Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
Ibrahim, Hazem, Liu, Fengyuan, Asim, Rohail, Battu, Balaraju, Benabderrahmane, Sidahmed, Alhafni, Bashar, Adnan, Wifag, Alhanai, Tuka, AlShebli, Bedoor, Baghdadi, Riyadh, Bรฉlanger, Jocelyn J., Beretta, Elena, Celik, Kemal, Chaqfeh, Moumena, Daqaq, Mohammed F., Bernoussi, Zaynab El, Fougnie, Daryl, de Soto, Borja Garcia, Gandolfi, Alberto, Gyorgy, Andras, Habash, Nizar, Harris, J. Andrew, Kaufman, Aaron, Kirousis, Lefteris, Kocak, Korhan, Lee, Kangsan, Lee, Seungah S., Malik, Samreen, Maniatakos, Michail, Melcher, David, Mourad, Azzam, Park, Minsu, Rasras, Mahmoud, Reuben, Alicja, Zantout, Dania, Gleason, Nancy W., Makovi, Kinga, Rahwan, Talal, Zaki, Yasir
The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses. Further, students' perspectives regarding the use of such tools, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of ChatGPT against students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use. We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection. Finally, we find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of AI into educational frameworks.
Examining Temporalities on Stance Detection towards COVID-19 Vaccination
Mu, Yida, Jin, Mali, Bontcheva, Kalina, Song, Xingyi
Previous studies have highlighted the importance of vaccination as an effective strategy to control the transmission of the COVID-19 virus. It is crucial for policymakers to have a comprehensive understanding of the public's stance towards vaccination on a large scale. However, attitudes towards COVID-19 vaccination, such as pro-vaccine or vaccine hesitancy, have evolved over time on social media. Thus, it is necessary to account for possible temporal shifts when analysing these stances. This study aims to examine the impact of temporal concept drift on stance detection towards COVID-19 vaccination on Twitter. To this end, we evaluate a range of transformer-based models using chronological (split the training, validation and testing sets in the order of time) and random splits (randomly split these three sets) of social media data. Our findings demonstrate significant discrepancies in model performance when comparing random and chronological splits across all monolingual and multilingual datasets. Chronological splits significantly reduce the accuracy of stance classification. Therefore, real-world stance detection approaches need to be further refined to incorporate temporal factors as a key consideration.
World's Top AI Researchers Debate the Technology's Next Steps
KIGALI, Rwanda--Amid growing talk of the promise and peril of artificial intelligence, more than 2,000 researchers and engineers from around the world gathered in Rwanda this week to debate contrasting visions for the technology's future. One vision is to build ever-more-powerful systems such as ChatGPT that aim to exceed human intelligence to boost worker productivity and economic growth. The other is to create more-targeted, small-scale AI solutions to local and global challenges, including tackling climate change, improving healthcare and preserving biodiversity.
Refining the Responses of LLMs by Themselves
In this paper, we propose a simple yet efficient approach based on prompt engineering that leverages the large language model itself to optimize its answers without relying on auxiliary models. We introduce an iterative self-evaluating optimization mechanism, with the potential for improved output quality as iterations progress, removing the need for manual intervention. The experiment's findings indicate that utilizing our response refinement framework on the GPT-3.5 model yields results that are on par with, or even surpass, those generated by the cutting-edge GPT-4 model. Detailed implementation strategies and illustrative examples are provided to demonstrate the superiority of our proposed solution.
Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models
The research creates a professional certification survey to test large language models and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70% correct) in 39% of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the valuable Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the Financial Industry Regulatory Authority (FINRA) Series 6 exam with a 70% grade without preparation. Interestingly, Turbo-GPT3.5 performed well on customer service tasks, suggesting potential applications in human augmentation for chatbots in call centers and routine advice services. The models also score well on sensory and experience-based tests such as wine sommelier, beer taster, emotional quotient, and body language reader. The OpenAI model improvement from Babbage to Turbo resulted in a median 60% better-graded performance in less than a few years. This progress suggests that focusing on the latest model's shortcomings could lead to a highly performant AI capable of mastering the most demanding professional certifications. We open-source the benchmark to expand the range of testable professional skills as the models improve or gain emergent capabilities.
Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning
Sun, Tianxiang, He, Zhengfu, Zhu, Qin, Qiu, Xipeng, Huang, Xuanjing
Prompt tuning is a parameter-efficient approach to adapting pre-trained language models to downstream tasks. Although prompt tuning has been shown to match the performance of full model tuning when training data is sufficient, it tends to struggle in few-shot learning settings. In this paper, we present Multi-task Pre-trained Modular Prompt (MP2) to boost prompt tuning for few-shot learning. MP2 is a set of combinable prompts pre-trained on 38 Chinese tasks. On downstream tasks, the pre-trained prompts are selectively activated and combined, leading to strong compositional generalization to unseen tasks. To bridge the gap between pre-training and fine-tuning, we formulate upstream and downstream tasks into a unified machine reading comprehension task. Extensive experiments under two learning paradigms, i.e., gradient descent and black-box tuning, show that MP2 significantly outperforms prompt tuning, full model tuning, and prior prompt pre-training methods in few-shot settings. In addition, we demonstrate that MP2 can achieve surprisingly fast and strong adaptation to downstream tasks by merely learning 8 parameters to combine the pre-trained modular prompts.
An Overview of AI and Blockchain Integration for Privacy-Preserving
Li, Zongwei, Kong, Dechao, Niu, Yuanzheng, Peng, Hongli, Li, Xiaoqi, Li, Wenkai
With the widespread attention and application of artificial intelligence (AI) and blockchain technologies, privacy protection techniques arising from their integration are of notable significance. In addition to protecting privacy of individuals, these techniques also guarantee security and dependability of data. This paper initially presents an overview of AI and blockchain, summarizing their combination along with derived privacy protection technologies. It then explores specific application scenarios in data encryption, de-identification, multi-tier distributed ledgers, and k-anonymity methods. Moreover, the paper evaluates five critical aspects of AI-blockchain-integration privacy protection systems, including authorization management, access control, data protection, network security, and scalability. Furthermore, it analyzes the deficiencies and their actual cause, offering corresponding suggestions. This research also classifies and summarizes privacy protection techniques based on AI-blockchain application scenarios and technical schemes. In conclusion, this paper outlines the future directions of privacy protection technologies emerging from AI and blockchain integration, including enhancing efficiency and security to achieve a more comprehensive privacy protection of privacy.
DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition
Chan, Chunkit, Liu, Xin, Cheng, Jiayang, Li, Zihan, Song, Yangqiu, Wong, Ginny Y., See, Simon
Implicit Discourse Relation Recognition (IDRR) is a sophisticated and challenging task to recognize the discourse relations between the arguments with the absence of discourse connectives. The sense labels for each discourse relation follow a hierarchical classification scheme in the annotation process (Prasad et al., 2008), forming a hierarchy structure. Most existing works do not well incorporate the hierarchy structure but focus on the syntax features and the prior knowledge of connectives in the manner of pure text classification. We argue that it is more effective to predict the paths inside the hierarchical tree (e.g., "Comparison -> Contrast -> however") rather than flat labels (e.g., Contrast) or connectives (e.g., however). We propose a prompt-based path prediction method to utilize the interactive information and intrinsic senses among the hierarchy in IDRR. This is the first work that injects such structure information into pre-trained language models via prompt tuning, and the performance of our solution shows significant and consistent improvement against competitive baselines.