Trstanova, Zofia
Multilingual Disinformation Detection for Digital Advertising
Trstanova, Zofia, Manouzi, Nadir El, Chen, Maryline, da Cunha, Andre L. V., Ivanov, Sergei
Online disinformation and propaganda are more widespread than ever. Independent publishers are funded mostly through digital advertising, and unfortunately this is also true of those publishing disinformation. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we take the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether a page mentions a topic of interest, then estimates the likelihood that the content is malicious, creating a shortlist of publishers to be reviewed by human experts. Our system empowers internal teams to blacklist unsafe content proactively rather than defensively, thus protecting the reputation of the advertisement provider.
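The two-stage flow described in this abstract (a topic gate, then a risk estimate, then a publisher shortlist for human review) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not the authors' implementation: the `Page` type, the `shortlist_publishers` function, the thresholds, and the choice to aggregate page scores by their mean per publisher are all hypothetical. The toy embedding and scorers at the end are keyword-count stand-ins for a real multilingual embedding model and trained classifiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Page:
    publisher: str  # hypothetical field: domain that served the page
    text: str       # hypothetical field: extracted page text


def shortlist_publishers(
    pages: List[Page],
    embed: Callable[[str], Vector],
    topic_score: Callable[[Vector], float],
    risk_score: Callable[[Vector], float],
    topic_threshold: float = 0.5,  # assumed value, not from the paper
    risk_threshold: float = 0.5,   # assumed value, not from the paper
) -> List[Tuple[str, float]]:
    """Return (publisher, mean risk) pairs to hand to human reviewers."""
    risks: Dict[str, List[float]] = defaultdict(list)
    for page in pages:
        vector = embed(page.text)
        # Stage 1: skip pages that do not mention a topic of interest.
        if topic_score(vector) < topic_threshold:
            continue
        # Stage 2: estimate how likely the on-topic content is malicious.
        risks[page.publisher].append(risk_score(vector))
    # Assumed aggregation: average the page-level risks per publisher.
    ranked = [(pub, sum(vals) / len(vals)) for pub, vals in risks.items()]
    flagged = [item for item in ranked if item[1] >= risk_threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Toy stand-ins so the sketch runs without any model or training data.
def toy_embed(text: str) -> Vector:
    return [text.count("vaccine"), text.count("hoax")]


def toy_topic_score(vector: Vector) -> float:
    return 1.0 if vector[0] > 0 else 0.0


def toy_risk_score(vector: Vector) -> float:
    return min(1.0, vector[1] / 2)


pages = [
    Page("a.example", "vaccine hoax hoax"),
    Page("b.example", "vaccine news"),
    Page("c.example", "sports hoax"),
]
shortlist = shortlist_publishers(pages, toy_embed, toy_topic_score, toy_risk_score)
print(shortlist)
```

In this toy run only `a.example` is both on topic and above the risk threshold; `c.example` never reaches the second stage because the topic gate filters it out. In a real system the three callables would be replaced by a multilingual embedding model and two trained classifiers, and the shortlist would go to human experts rather than trigger an automatic block.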
TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Heber, Frederik, Trstanova, Zofia, Leimkuhler, Benedict
The fundamental role of neural networks (NNs) is readily apparent from their widespread use in machine learning applications such as natural language processing [72], social network analysis [26], medical diagnosis [6, 35], vision systems [66], and robotic path planning [44]. The greatest success of these models lies in their flexibility, their ability to represent complex, nonlinear relationships in high-dimensional data sets, and the availability of frameworks that allow NNs to be implemented on rapidly evolving GPU platforms [40, 29]. The industrial appetite for deep learning has led to very rapid expansion of the subject in recent years, although, as pointed out by Dunson [19], the mathematical and theoretical understanding of these methods has at times been swept aside in the rush to advance the methodology. The potential impact of machine learning algorithms on society demands that their exposition and use be subject to the highest standards of clarity, ease of interpretation, and uncertainty quantification. Typical NN training seeks to optimize the parameters of the network (biases and weights) under the constraint that the training data set is well approximated [28, 23].