webgraph


Dredge Word, Social Media, and Webgraph Networks for Unreliable Website Classification and Identification

Williams, Evan M., Carragher, Peter, Carley, Kathleen M.

arXiv.org Artificial Intelligence

In an attempt to mimic the complex paths through which unreliable content spreads between search engines and social media, we explore the impact of incorporating both webgraph and large-scale social media contexts into website credibility classification and discovery systems. We further explore the usage of what we define as "dredge words" on social media – terms or phrases for which unreliable domains rank highly. Through comprehensive graph neural network ablations, we demonstrate that curriculum-based heterogeneous graph models that leverage context from both webgraphs and social media data outperform homogeneous and single-mode approaches. We further demonstrate that the incorporation of dredge words into our model strongly associates unreliable websites with social media and online commerce platforms. Finally, we show our heterogeneous model greatly outperforms competing systems in the top-k identification of unlabeled unreliable websites. We demonstrate the strong unreliability signals present in the diverse paths that users follow to uncover unreliable content, and we release a novel dataset of dredge words.


A Machine Learning Method to Block Ads Based on Local Browser Behavior

#artificialintelligence

Researchers in Switzerland and the US have devised a new machine learning approach to detecting website advertising material, based on the way such material interacts with the browser rather than on its content or network behavior – two approaches that have proved ineffective in the long term in the face of CNAME cloaking. Dubbed WebGraph, the framework uses a graph-based AI ad-blocking approach that detects promotional content by concentrating on activities so essential to network advertising – including telemetry attempts and local browser storage – that the only effective evasion technique would be to not conduct these activities at all. Though previous approaches have achieved slightly higher detection rates than WebGraph, all of them are prone to evasive techniques, while WebGraph is able to approach 100% integrity in the face of adversarial responses, including more sophisticated hypothesized responses that may emerge in the face of this novel ad-blocking method. The paper is led by two researchers from the Swiss Federal Institute of Technology, in concert with researchers from the University of California, Davis and the University of Iowa. The work is a development from a 2020 research initiative with the Brave browser called AdGraph, which featured two of the researchers from the new paper.