Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024

McCutcheon, Austin, de Oliveira, Thiago E. A., Zheleznov, Aleksandr, Brogly, Chris

Jun-12-2025–arXiv.org Artificial Intelligence

The proliferation of online news enables potential widespread publication of perceived low-quality news headlines/links. As a result, we investigated whether it was possible to automatically distinguish perceived lower-quality news headlines/links from perceived higher-quality headlines/links. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website links/headings from 2018-2024 (28,772,107 per class) with 115 extracted linguistic features. Binary labels for each text were derived from scores based on expert consensus regarding the respective news domain quality. Traditional ensemble methods, particularly the bagging classifier, had strong performance (88.1% accuracy, 88.3% F1, 80/20 train/test split). Fine-tuned DistilBERT achieved the highest accuracy (90.3%, 80/20 train/test split) but required more training time. The results suggest that both NLP features with traditional classifiers and deep learning models can effectively differentiate perceived news headline/link quality, with some trade-off between predictive performance and train time.

accuracy, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

Jun-12-2025

arXiv.org PDF

Add feedback

Country:
- North America > Canada (0.16)

Genre:
- Research Report > New Finding (0.88)

Industry:
- Media > News (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found