clickbait
An Interpretable Benchmark for Clickbait Detection and Tactic Attribution
Nofar, Lihi, Portal, Tomer, Elbaz, Aviv, Apartsin, Alexander, Aperstein, Yehudit
The proliferation of clickbait headlines poses significant challenges to the credibility of information and to user trust in digital media. While recent advances in machine learning have improved the detection of manipulative content, the lack of explainability limits the practical adoption of these models. This paper presents a model for explainable clickbait detection that not only identifies clickbait titles but also attributes them to specific linguistic manipulation strategies. We introduce a synthetic dataset generated by systematically augmenting real news headlines using a predefined catalogue of clickbait strategies. This dataset enables controlled experimentation and detailed analysis of model behaviour. We present a two-stage framework for automatic clickbait analysis comprising detection and tactic attribution. In the first stage, we compare a fine-tuned BERT classifier with large language models (LLMs), specifically GPT-4.0 and Gemini 2.4 Flash, under both zero-shot prompting and few-shot prompting enriched with illustrative clickbait headlines and their associated persuasive tactics. In the second stage, a dedicated BERT-based classifier predicts the specific clickbait strategies present in each headline. We share the dataset with the research community at https://github.com/LLM-HITCS25S/ClickbaitTacticsDetection.
The widespread use of clickbait headlines in digital media has become a pervasive challenge, undermining the credibility of information and exploiting user attention through manipulative linguistic techniques. While automated systems for detecting clickbait have improved in recent years, they have focused mainly on binary classification, simply labelling content as clickbait or not. However, effective mitigation of such content requires going beyond detection to understanding how and why certain headlines manipulate readers.
Specifically, it is crucial to evaluate whether current AI models can accurately recognize and distinguish the diverse linguistic styles and persuasive strategies commonly employed in clickbait.
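The two-stage framework described above (detection, then tactic attribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tactic names in TACTICS, the 0.5 thresholds, and the toy scoring functions are hypothetical placeholders standing in for the fine-tuned BERT detector and the BERT-based tactic classifier. Treating stage 2 as multi-label classification with an independent sigmoid per tactic is an assumption about how several strategies per headline might be predicted.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical tactic catalogue; the paper's actual catalogue is not reproduced here.
TACTICS: List[str] = ["curiosity_gap", "exaggeration", "listicle", "direct_address"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def analyze_headline(
    headline: str,
    detector: Callable[[str], float],              # stage 1: one clickbait logit
    attributor: Callable[[str], Sequence[float]],  # stage 2: one logit per tactic
    detect_threshold: float = 0.5,
    tactic_threshold: float = 0.5,
) -> Dict[str, object]:
    p_clickbait = sigmoid(detector(headline))
    if p_clickbait < detect_threshold:
        # Stage 2 runs only on headlines that stage 1 flags as clickbait.
        return {"clickbait": False, "p": p_clickbait, "tactics": []}
    # Multi-label attribution: each tactic is scored independently,
    # so a single headline can be attributed to several tactics.
    probs = [sigmoid(z) for z in attributor(headline)]
    tactics = [t for t, p in zip(TACTICS, probs) if p >= tactic_threshold]
    return {"clickbait": True, "p": p_clickbait, "tactics": tactics}


# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_detector(headline: str) -> float:
    return 3.0 if "won't believe" in headline.lower() else -3.0


def toy_attributor(headline: str) -> List[float]:
    return [2.0, 1.0, -2.0, 1.5]  # fixed logits, one per entry in TACTICS


result = analyze_headline("You won't believe what happened next", toy_detector, toy_attributor)
print(result["tactics"])  # ['curiosity_gap', 'exaggeration', 'direct_address']
```

Gating stage 2 on stage 1 mirrors the pipeline order in the abstract; an alternative design would run attribution on every headline and let the tactic scores inform detection.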
- North America > Haiti (0.14)
- Europe > Romania (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
What Makes You CLIC: Detection of Croatian Clickbait Headlines
Anđelić, Marija, Šipek, Dominik, Majer, Laura, Šnajder, Jan
Online news outlets operate predominantly on an advertising-based revenue model, compelling journalists to create headlines that are often scandalous, intriguing, and provocative; such headlines are commonly referred to as clickbait. Automatic detection of clickbait headlines is essential for preserving information quality and reader trust in digital media and requires both contextual understanding and world knowledge. For this task, particularly in less-resourced languages, it remains unclear whether fine-tuned methods or in-context learning (ICL) yield better results. In this paper, we compile CLIC, a novel dataset for clickbait detection of Croatian news headlines spanning a 20-year period and encompassing mainstream and fringe outlets. We fine-tune the BERTić model on this task and compare its performance to LLM-based ICL methods with prompts in both Croatian and English. Finally, we analyze the linguistic properties of clickbait. We find that nearly half of the analyzed headlines contain clickbait, and that fine-tuned models deliver better results than general-purpose LLMs.
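To make the fine-tuned versus ICL comparison concrete, the sketch below computes the binary F1-score of two systems against the same gold labels. The labels are invented for illustration and are not CLIC data or results; in practice, the ICL predictions would come from parsing the LLM's answer to the Croatian or English prompt, a step not shown here.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (1 = clickbait)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only; not results from CLIC.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
fine_tuned = [1, 0, 1, 0, 0, 0, 1, 0]  # one miss, no false alarms
icl_llm = [1, 1, 1, 1, 1, 0, 0, 0]     # one miss, two false alarms

print(round(binary_f1(gold, fine_tuned), 3), round(binary_f1(gold, icl_llm), 3))  # 0.857 0.667
```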
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Te Ahorré Un Click: A Revised Definition of Clickbait and Detection in Spanish News
Mordecki, Gabriel, Moncecchi, Guillermo, Couto, Javier
We revise the definition of clickbait, which currently lacks consensus, and argue that the creation of a curiosity gap is the key concept that distinguishes clickbait from related phenomena such as sensationalism and headlines that do not deliver what they promise or that diverge from the article. We therefore propose a new definition: clickbait is a technique for generating headlines and teasers that deliberately omit part of the information with the goal of raising the readers' curiosity, capturing their attention and enticing them to click. We introduce a new approach to creating clickbait detection datasets that refines the boundaries of the concept and the annotation criteria, minimizing the subjectivity of annotation decisions as much as possible. Following this approach, we created and release TA1C (for Te Ahorré Un Click, Spanish for Saved You A Click), the first open-source dataset for clickbait detection in Spanish. It consists of 3,500 tweets from 18 well-known media sources, manually annotated with an inter-annotator agreement of 0.825 (Fleiss' κ). We implement strong baselines that achieve an F1-score of 0.84.
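The reported agreement is Fleiss' κ, which compares the observed agreement among a fixed number of annotators per item with the agreement expected by chance from the category marginals. A minimal sketch of the standard formula follows; the counts matrix is a toy example, not TA1C annotations.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of annotators who assigned item i to category j.
    Every item must be rated by the same number of annotators (at least 2)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_cats = len(counts[0])
    # Observed agreement: fraction of agreeing annotator pairs per item, averaged.
    p_bar = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_items
    # Chance agreement from the overall proportion of each category.
    # (Undefined if every rating falls in a single category, since p_e == 1.)
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 items, 3 annotators, categories [clickbait, not clickbait].
toy = [[3, 0], [0, 3], [3, 0], [2, 1]]
print(round(fleiss_kappa(toy), 3))  # 0.625
```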
- South America > Peru (0.14)
- South America > Argentina (0.04)
- Oceania > Palau (0.04)
- Media > News (1.00)
- Marketing (1.00)
- Leisure & Entertainment > Sports (1.00)
BaitRadar: A Multi-Model Clickbait Detection Algorithm Using Deep Learning
Gamage, Bhanuka, Labib, Adnan, Joomun, Aisha, Lim, Chern Hong, Wong, KokSheik
Following the rising popularity of YouTube, an emerging problem on this platform is clickbait, which provokes users into clicking on videos through attractive titles and thumbnails. As a result, users end up watching videos whose content does not match what the title advertises. This study addresses the issue by proposing an algorithm called BaitRadar, which uses a deep learning technique in which six inference models are jointly consulted to make the final classification decision. These models focus on different attributes of the video: title, comments, thumbnail, tags, video statistics and audio transcript. The final classification is obtained by averaging the outputs of the individual models, which provides a robust and accurate output even in situations where some data is missing. The proposed method is tested on 1,400 YouTube videos. On average, a test accuracy of 98% is achieved with an inference time of less than 2 s.
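The fusion step described above, averaging the outputs of several per-attribute models while tolerating missing data, can be sketched as below. This assumes each model emits a clickbait probability in [0, 1] and that a missing attribute simply drops out of the average; the modality keys and the 0.5 threshold are illustrative assumptions, not details confirmed by the abstract.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical keys mirroring the six attributes listed in the abstract.
MODALITIES = ("title", "comments", "thumbnail", "tags", "statistics", "transcript")


def fuse_scores(
    scores: Mapping[str, Optional[float]], threshold: float = 0.5
) -> Tuple[float, bool]:
    """Average the clickbait probabilities of the models that produced one.
    A modality with missing data (absent key or None) is left out."""
    available = [scores[m] for m in MODALITIES if scores.get(m) is not None]
    if not available:
        raise ValueError("no model produced a score")
    mean = sum(available) / len(available)
    return mean, mean >= threshold


# Example: no thumbnail or transcript available for this video.
example = {"title": 0.9, "comments": 0.7, "thumbnail": None,
           "tags": 0.8, "statistics": 0.6}
print(fuse_scores(example))
```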
- North America > United States (0.04)
- Asia > Malaysia (0.04)
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Yu, Jianxing, Wang, Shiqi, Yin, Han, Sun, Zhenlong, Xie, Ruobing, Zhang, Bo, Rao, Yanghui
This paper focuses on detecting clickbait posts on the Web. These posts often use eye-catching disinformation in mixed modalities to mislead users into clicking for profit. Such posts degrade the user experience and are therefore blocked by content providers. To escape detection, malicious creators use tricks such as adding irrelevant non-bait content to bait posts, dressing them up as legitimate to fool the detector. This content is often spuriously correlated with non-bait labels, yet traditional detectors tend to make predictions based on simple co-occurrence rather than grasping the inherent factors that lead to malicious behavior. This spurious bias can easily cause misjudgments. To address this problem, we propose a new debiasing method based on causal inference. We first employ a set of features in multiple modalities to characterize the posts. Because these features are often mixed with unknown biases, we then disentangle three kinds of latent factors from them: the invariant factor, which indicates intrinsic bait intention; the causal factor, which reflects deceptive patterns in a specific scenario; and non-causal noise. By eliminating the noise that causes bias, we can use the invariant and causal factors to build a robust model with good generalization ability. Experiments on three popular datasets show the effectiveness of our approach.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
What Drives Online Popularity: Author, Content or Sharers? Estimating Spread Dynamics with Bayesian Mixture Hawkes
Calderon, Pio, Rizoiu, Marian-Andrei
The spread of content on social media is shaped by intertwining factors on three levels: the source, the content itself, and the pathways of content spread. At the lowest level, the popularity of the sharing user determines the content's eventual reach. However, higher-level factors such as the nature of the online item and the credibility of its source also play crucial roles in determining how widely and rapidly the online item spreads. In this work, we propose the Bayesian Mixture Hawkes (BMH) model to jointly learn the influence of source, content and spread. We formulate the BMH model as a hierarchical mixture model of separable Hawkes processes, accommodating different classes of Hawkes dynamics and the influence of feature sets on these classes. We test the BMH model on two learning tasks, cold-start popularity prediction and temporal profile generalization performance, applying it to two real-world retweet cascade datasets referencing articles from controversial and traditional media publishers. The BMH model outperforms the state-of-the-art models and predictive baselines on both datasets and utilizes cascade- and item-level information better than the alternatives. Lastly, we perform a counterfactual analysis in which we apply the trained publisher-level BMH models to a set of article headlines and show that the effectiveness of headline writing style (neutral, clickbait, inflammatory) varies across publishers. The BMH model unveils differences in style effectiveness between controversial and reputable publishers: clickbait is notably more effective for reputable publishers than for controversial ones, which we link to the latter's overuse of clickbait.
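As background for the separable Hawkes processes that BMH builds on, here is a minimal sketch of a marked Hawkes intensity whose kernel factorizes into a mark term (the sharer's popularity) and a time-decay term. The exponential decay, the parameters kappa, beta and theta, and the expected-size helper are illustrative assumptions; the BMH model's actual kernels and its hierarchical mixture over classes of dynamics are not reproduced here.

```python
import math
from typing import Sequence, Tuple

Event = Tuple[float, float]  # (time, mark); mark = popularity of the sharing user


def kernel(mark: float, dt: float, kappa: float, beta: float, theta: float) -> float:
    # Separable kernel: mark term kappa * m**beta times a time density
    # theta * exp(-theta * dt) that integrates to 1 over dt >= 0.
    return kappa * mark ** beta * theta * math.exp(-theta * dt)


def intensity(t: float, history: Sequence[Event],
              kappa: float, beta: float, theta: float) -> float:
    """Conditional rate of new reshares at time t, given earlier events."""
    return sum(kernel(m, t - ti, kappa, beta, theta) for ti, m in history if ti < t)


def expected_cascade_size(root_mark: float, marks: Sequence[float],
                          kappa: float, beta: float) -> float:
    """Expected total size (root included), assuming descendants' marks follow
    the empirical distribution `marks`."""
    # Branching factor: average number of direct reshares spawned per event.
    n_star = kappa * sum(m ** beta for m in marks) / len(marks)
    if n_star >= 1:
        return math.inf  # supercritical regime: expected size diverges
    # The root spawns kappa * m0**beta children; each child's subtree has
    # expected size 1 / (1 - n_star).
    return 1 + kappa * root_mark ** beta / (1 - n_star)


history = [(0.0, 100.0), (0.5, 20.0)]
print(intensity(1.0, history, kappa=0.05, beta=0.5, theta=1.0))
print(expected_cascade_size(100.0, [20.0, 50.0, 5.0], kappa=0.05, beta=0.5))
```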
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States (0.04)
- Europe > United Kingdom (0.04)
Generating clickbait spoilers with an ensemble of large language models
Woźny, Mateusz, Lango, Mateusz
Clickbait posts are a widespread problem on the web. The generation of spoilers, i.e., short texts that neutralize clickbait by providing information that satisfies the curiosity it induces, is one of the proposed solutions to the problem. Current state-of-the-art methods are based on passage retrieval or question answering approaches and are limited to generating spoilers only in the form of a phrase or a passage. In this work, we propose an ensemble of fine-tuned large language models for clickbait spoiler generation. Our approach is not limited to phrase or passage spoilers, but is also able to generate multipart spoilers that refer to several non-consecutive parts of text. Experimental evaluation demonstrates that the proposed ensemble model outperforms the baselines in terms of BLEU, METEOR and BERTScore metrics.
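The abstract does not specify how the ensemble combines its members' outputs. One common option, shown here purely as an assumed illustration rather than the authors' method, is consensus selection: pick the candidate spoiler with the highest average similarity to the other candidates. Word-overlap F1 stands in for a stronger similarity measure such as BERTScore.

```python
from collections import Counter
from typing import List


def token_f1(a: str, b: str) -> float:
    """Bag-of-words F1 between two strings (case-insensitive)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum((Counter(ta) & Counter(tb)).values())
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)


def select_consensus(candidates: List[str]) -> str:
    """Return the candidate with the highest mean similarity to all others."""
    if len(candidates) == 1:
        return candidates[0]

    def avg_sim(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(token_f1(candidates[i], c) for c in others) / len(others)

    return candidates[max(range(len(candidates)), key=avg_sim)]


# Hypothetical spoilers proposed by four ensemble members for the same post.
spoilers = ["the mayor resigned", "the mayor resigned on Monday",
            "mayor resigned", "a new stadium opened"]
print(select_consensus(spoilers))  # the mayor resigned
```

The outlier candidate shares no words with the others, so it cannot win; among the agreeing candidates, the one closest to all of them is chosen.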
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Poland > Greater Poland Province > Poznań (0.05)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Mitigating Clickbait: An Approach to Spoiler Generation Using Multitask Learning
Pal, Sayantan, Das, Souvik, Srihari, Rohini K.
This study introduces 'clickbait spoiling', a novel technique designed to detect, categorize, and generate spoilers as succinct text responses that counter the curiosity induced by clickbait content. By leveraging a multi-task learning framework, we significantly enhance our model's generalization capabilities, effectively addressing the pervasive issue of clickbait. The crux of our research lies in generating appropriate spoilers, be it a phrase, an extended passage, or multiple non-contiguous parts, depending on the spoiler type required. Our methodology integrates two crucial techniques: a refined spoiler categorization method and a modified version of the question answering (QA) mechanism, incorporated within a multi-task learning paradigm for optimized spoiler extraction from context. Notably, we also fine-tune models capable of handling longer sequences to accommodate the generation of extended spoilers. This research highlights the potential of sophisticated text processing techniques in tackling the omnipresent issue of clickbait, promising an enhanced user experience in the digital realm.
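A multi-task setup like the one described typically trains a shared encoder with a weighted sum of per-task losses. The sketch below combines a spoiler-type classification loss (phrase, passage, multi) with a QA span-extraction loss over start and end positions. The equal weighting alpha = 0.5 and the plain-Python cross-entropy are assumptions for illustration, since the abstract does not give the exact objective.

```python
import math
from typing import Sequence

SPOILER_TYPES = ("phrase", "passage", "multi")


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def multitask_loss(
    type_logits: Sequence[float], type_target: int,    # spoiler-type head
    start_logits: Sequence[float], start_target: int,  # QA head: span start
    end_logits: Sequence[float], end_target: int,      # QA head: span end
    alpha: float = 0.5,                                # hypothetical task weight
) -> float:
    l_type = cross_entropy(type_logits, type_target)
    l_span = 0.5 * (cross_entropy(start_logits, start_target)
                    + cross_entropy(end_logits, end_target))
    return alpha * l_type + (1 - alpha) * l_span


# One training example: true type "passage" (index 1), gold span tokens 2..4.
loss = multitask_loss([0.2, 1.5, -0.3], 1,
                      [0.1, 0.0, 2.0, 0.3, -1.0], 2,
                      [0.0, 0.2, 0.1, 0.4, 1.8], 4)
print(loss)
```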
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Maintaining Journalistic Integrity in the Digital Age: A Comprehensive NLP Framework for Evaluating Online News Content
Bojic, Ljubisa, Prodanovic, Nikola, Samala, Agariadne Dwinggo
The rapid growth of online news platforms has led to an increased need for reliable methods to evaluate the quality and credibility of news articles. This paper proposes a comprehensive framework to analyze online news texts using natural language processing (NLP) techniques, particularly a language model specifically trained for this purpose, alongside other well-established NLP methods. The framework assesses the quality of news articles against ten journalism standards: objectivity; balance and fairness; readability and clarity; sensationalism and clickbait; ethical considerations; public interest and value; source credibility; relevance and timeliness; factual accuracy; and attribution and transparency. By establishing these standards, researchers, media organizations, and readers can better evaluate and understand the content they consume and produce. The proposed method has some limitations, such as potential difficulty in detecting subtle biases and the need to continuously update the language model to keep pace with evolving language patterns.
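One simple way to turn per-standard assessments into an overall verdict is a weighted mean over the ten standards listed above. In this hypothetical sketch, each standard is assumed to have already been scored in [0, 1], with higher meaning better (so a high sensationalism_and_clickbait score means little sensationalism). The scoring models themselves and the equal default weights are assumptions, not part of the described framework.

```python
from typing import Mapping, Optional, Tuple

# The ten standards named in the abstract, as hypothetical dictionary keys.
STANDARDS = (
    "objectivity", "balance_and_fairness", "readability_and_clarity",
    "sensationalism_and_clickbait", "ethical_considerations",
    "public_interest_and_value", "source_credibility",
    "relevance_and_timeliness", "factual_accuracy",
    "attribution_and_transparency",
)


def aggregate(scores: Mapping[str, float],
              weights: Optional[Mapping[str, float]] = None) -> Tuple[float, str]:
    """Weighted mean of per-standard scores in [0, 1] (higher = better),
    plus the name of the weakest standard."""
    missing = [s for s in STANDARDS if s not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    w = dict(weights) if weights is not None else {s: 1.0 for s in STANDARDS}
    total_w = sum(w[s] for s in STANDARDS)
    overall = sum(w[s] * scores[s] for s in STANDARDS) / total_w
    weakest = min(STANDARDS, key=lambda s: scores[s])
    return overall, weakest


# Hypothetical article: solid on most standards but highly sensational.
article = {s: 0.8 for s in STANDARDS}
article["sensationalism_and_clickbait"] = 0.2
overall, weakest = aggregate(article)
print(round(overall, 2), weakest)  # 0.74 sensationalism_and_clickbait
```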
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- North America > United States > New York (0.04)
Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines
Sung, Yoo Yeon, Boyd-Graber, Jordan, Hassan, Naeemul
Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users, and while there has been significant effort to detect false or misleading text, multimodal datasets have received considerably less attention. To complement existing resources, we present the multimodal Video Misleading Headline (VMH) dataset, which consists of videos together with annotations of whether annotators believe the headline is representative of the video's contents. After collecting and annotating this dataset, we analyze multimodal baselines for detecting misleading headlines. Our annotation process also captures why annotators view a video as misleading, allowing us to better understand the interplay between annotators' backgrounds and the content of the videos.
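Turning several annotators' judgments into a single dataset label is often done by majority vote. The sketch below assumes binary judgments (True meaning the annotator found the headline misleading, that is, not representative of the video) and returns the agreement fraction alongside the label. The tie handling is an illustrative choice, not the VMH procedure.

```python
from typing import Optional, Sequence, Tuple


def aggregate_votes(votes: Sequence[bool]) -> Tuple[Optional[bool], float]:
    """Majority vote over annotator judgments (True = misleading).
    Returns (label, fraction of annotators who agree with the label);
    the label is None on a tie so the item can be adjudicated."""
    if not votes:
        raise ValueError("need at least one judgment")
    n_misleading = sum(1 for v in votes if v)
    n_representative = len(votes) - n_misleading
    if n_misleading == n_representative:
        return None, 0.5  # tie: flag for review instead of guessing
    label = n_misleading > n_representative
    return label, max(n_misleading, n_representative) / len(votes)


print(aggregate_votes([True, True, False]))
print(aggregate_votes([True, False]))
```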
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Maryland (0.04)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)