Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024
McCutcheon, Austin, de Oliveira, Thiago E. A., Zheleznov, Aleksandr, Brogly, Chris
arXiv.org Artificial Intelligence
The proliferation of online news makes it possible to publish perceived low-quality news headlines/links widely. We therefore investigated whether perceived lower-quality news headlines/links can be automatically distinguished from perceived higher-quality ones. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website links/headlines from 2018-2024 (28,772,107 per class) with 115 extracted linguistic features. Binary labels for each text were derived from scores based on expert consensus regarding the quality of the respective news domain. Traditional ensemble methods, particularly the bagging classifier, performed strongly (88.1% accuracy, 88.3% F1, 80/20 train/test split). Fine-tuned DistilBERT achieved the highest accuracy (90.3%, 80/20 train/test split) but required more training time. The results suggest that both NLP features with traditional classifiers and deep learning models can effectively differentiate perceived news headline/link quality, with some trade-off between predictive performance and training time.
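The data preparation and evaluation steps named in the abstract (domain-level labels, class balancing, an 80/20 split, accuracy and F1) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy domains and scores, the score threshold, and the use of random downsampling for balancing are all assumptions made for illustration, since the abstract does not state how scores were binarized or how the classes were balanced.

```python
import random

def label_by_domain(rows, domain_scores, threshold):
    """Give each headline the binary label of its source domain.

    rows: list of (domain, headline) pairs.
    domain_scores: hypothetical mapping from domain to a quality score.
    threshold: hypothetical cut-off; the abstract does not specify one.
    """
    labeled = []
    for domain, headline in rows:
        score = domain_scores.get(domain)
        if score is None:
            continue  # skip headlines from domains without a score
        labeled.append((headline, 1 if score >= threshold else 0))
    return labeled

def balance(labeled, rng):
    """Randomly downsample the larger class so both classes are equal in size."""
    pos = [row for row in labeled if row[1] == 1]
    neg = [row for row in labeled if row[1] == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)

def train_test_split(items, test_fraction, rng):
    """Shuffle a copy of the items and split off the last test_fraction."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy usage with made-up domains and scores (illustrative only).
rng = random.Random(0)
scores = {"a.example": 0.9, "b.example": 0.2}
rows = [("a.example", f"headline {i}") for i in range(6)]
rows += [("b.example", f"headline {i}") for i in range(4)]
balanced = balance(label_by_domain(rows, scores, threshold=0.5), rng)
train, test = train_test_split(balanced, test_fraction=0.2, rng=rng)
```

In the study itself, a classifier trained on the 115 linguistic features (or a fine-tuned DistilBERT) would produce the predictions passed to the metric function; the sketch stops short of that step.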
Jun-12-2025
- Country:
- North America > Canada > Ontario > Simcoe County > Orillia (0.05)
- Genre:
- Research Report > New Finding (0.88)
- Technology: