Collaborating Authors

 ferrara


Defining bias in AI-systems: Biased models are fair models

Lindloff, Chiara, Siegert, Ingo

arXiv.org Artificial Intelligence

The debate around bias in AI systems is central to discussions on algorithmic fairness. However, the term bias often lacks a clear definition, despite frequently being contrasted with fairness, implying that an unbiased model is inherently fair. In this paper, we challenge this assumption and argue that a precise conceptualization of bias is necessary to effectively address fairness concerns. Rather than viewing bias as inherently negative or unfair, we highlight the importance of distinguishing between bias and discrimination. We further explore how this shift in focus can foster a more constructive discourse within academic debates on fairness in AI systems.


GPS Is Vulnerable to Attack. Magnetic Navigation Can Help

WIRED

Far above your head, constellations of satellites are working constantly to provide the positioning, navigation, and timing systems that quietly run modern life. Known as the global navigation satellite system, or GNSS, signals from these satellites provide the foundation for mobile networks, energy grids, the internet, and GPS. And increasingly, their dependability is under threat. GPS signals can be jammed--deliberately drowned out with other powerful radio signals--and spoofed, where erroneous signals are released to fool positioning systems. GPS interference has been documented in Ukraine, the Middle East, and the South China Sea.


IOHunter: Graph Foundation Model to Uncover Online Information Operations

Minici, Marco, Luceri, Luca, Fabbri, Francesco, Ferrara, Emilio

arXiv.org Artificial Intelligence

Social media platforms have become vital spaces for public discourse, serving as modern agorás where a wide range of voices influence societal narratives. However, their open nature also makes them vulnerable to exploitation by malicious actors, including state-sponsored entities, who can conduct information operations (IOs) to manipulate public opinion. The spread of misinformation, false news, and misleading claims threatens democratic processes and societal cohesion, making it crucial to develop methods for the timely detection of inauthentic activity to protect the integrity of online discourse. In this work, we introduce a methodology designed to identify users orchestrating information operations, a.k.a. IO drivers, across various influence campaigns. Our framework, named IOHunter, leverages the combined strengths of Language Models and Graph Neural Networks to improve generalization in supervised, scarcely-supervised, and cross-IO contexts. Our approach achieves state-of-the-art performance across multiple sets of IOs originating from six countries, significantly surpassing existing approaches. This research marks a step toward developing Graph Foundation Models specifically tailored for the task of IO detection on social media platforms.


Generative Memesis: AI Mediates Political Memes in the 2024 USA Presidential Election

Chang, Ho-Chun Herbert, Shaman, Benjamin, Chen, Yung-chun, Zha, Mingyue, Noh, Sean, Wei, Chiyu, Weener, Tracy, Magee, Maya

arXiv.org Artificial Intelligence

Visual content on social media has become increasingly influential in shaping political discourse and civic engagement. Using a dataset of 239,526 Instagram images, deep learning, and LLM-based workflows, we examine the impact of different content types on user engagement during the 2024 US presidential election, with a focus on synthetic visuals. Results show that while synthetic content may not increase engagement alone, it mediates how political information is created through highly effective, often absurd, political memes. We define the notion of generative memesis, where memes are no longer shared person-to-person but mediated by AI through customized, generated images. We also find partisan divergences: Democrats use AI for in-group support whereas Republicans use it for out-group attacks. Non-traditional, left-leaning outlets are the primary creators of political memes; emphasis on different topics largely follows issue ownership.


Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames

Burghardt, Keith, Chen, Kai, Lerman, Kristina

arXiv.org Artificial Intelligence

Adversarial information operations can destabilize societies by undermining fair elections, manipulating public opinions on policies, and promoting scams. Despite their widespread occurrence and potential impacts, our understanding of influence campaigns is limited by manual analysis of messages and subjective interpretation of their observable behavior. In this paper, we explore whether these limitations can be mitigated with large language models (LLMs), using GPT-3.5 as a case study for coordinated campaign annotation. We first use GPT-3.5 to scrutinize 126 identified information operations spanning over a decade. We utilize a number of metrics to quantify the close (if imperfect) agreement between LLM and ground truth descriptions. We next extract coordinated campaigns from two large multilingual datasets from X (formerly Twitter) that respectively discuss the 2022 French election and the 2023 Balikatan Philippine-U.S. military exercise. For each coordinated campaign, we use GPT-3.5 to analyze posts related to a specific concern and extract goals, tactics, and narrative frames, both before and after critical events (such as the date of an election). While GPT-3.5 sometimes disagrees with subjective interpretation, its ability to summarize and interpret demonstrates LLMs' potential to extract higher-order indicators from text to provide a more complete picture of the information campaigns compared to previous methods.


BotArtist: Twitter bot detection Machine Learning model based on Twitter suspension

Shevtsov, Alexander, Antonakaki, Despoina, Lamprou, Ioannis, Pratikakis, Polyvios, Ioannidis, Sotiris

arXiv.org Artificial Intelligence

Twitter, as one of the most popular social networks, offers a means for communication and online discourse, which unfortunately has been the target of bots and fake accounts, leading to the manipulation and spreading of false information. To this end, we gather a challenging, multilingual dataset of social discourse on Twitter, originating from 9M users regarding the recent Russo-Ukrainian war, in order to detect the bot accounts and the conversations involving them. We collect the ground truth for our dataset through the Twitter API suspended accounts collection, containing approximately 343K bot accounts and 8M normal users. Additionally, we use a dataset provided by Botometer-V3 with 1,777 Varol, 483 German accounts, and 1,321 US accounts. Besides the publicly available datasets, we also manage to collect 2 independent datasets around popular discussion topics of the 2022 energy crisis and the 2022 conspiracy discussions. Both of the datasets were labeled according to the Twitter suspension mechanism. We build a novel ML model for bot detection using the state-of-the-art XGBoost model. We combine the model with a high volume of labeled tweets according to the Twitter suspension mechanism ground truth. This requires a limited set of profile features allowing labeling of the dataset in different time periods from the collection, as it is independent of the Twitter API. In comparison with Botometer, our methodology achieves an average 11% higher ROC-AUC score over two real-case scenario datasets.
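
As an illustration of the ROC-AUC metric reported in this abstract, the following is a minimal sketch of how such a score can be computed from binary labels and classifier scores. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. bot), 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute ROC-AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two positive and two negative accounts.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for readability; for datasets with millions of accounts a rank-based computation would be the more practical choice.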


Exposing Influence Campaigns in the Age of LLMs: A Behavioral-Based AI Approach to Detecting State-Sponsored Trolls

Ezzeddine, Fatima, Luceri, Luca, Ayoub, Omran, Sbeity, Ihab, Nogara, Gianluca, Ferrara, Emilio, Giordano, Silvia

arXiv.org Artificial Intelligence

The detection of state-sponsored trolls operating in influence campaigns on social media is a critical and unsolved challenge for the research community, which has significant implications beyond the online realm. To address this challenge, we propose a new AI-based solution that identifies troll accounts solely through behavioral cues associated with their sequences of sharing activity, encompassing both their actions and the feedback they receive from others. Our approach does not incorporate any textual content shared and consists of two steps: First, we leverage an LSTM-based classifier to determine whether account sequences belong to a state-sponsored troll or an organic, legitimate user. Second, we employ the classified sequences to calculate a metric named the "Troll Score", quantifying the degree to which an account exhibits troll-like behavior. To assess the effectiveness of our method, we examine its performance in the context of the 2016 Russian interference campaign during the U.S. Presidential election. Our experiments yield compelling results, demonstrating that our approach can identify account sequences with an AUC close to 99% and accurately differentiate between Russian trolls and organic users with an AUC of 91%. Notably, our behavioral-based approach holds a significant advantage in the ever-evolving landscape, where textual and linguistic properties can be easily mimicked by Large Language Models (LLMs): In contrast to existing language-based techniques, it relies on more challenging-to-replicate behavioral cues, ensuring greater resilience in identifying influence campaigns, especially given the potential increase in the usage of LLMs for generating inauthentic content. Finally, we assessed the generalizability of our solution to various entities driving different information operations and found promising results that will guide future research.
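
The two-step idea in this abstract (classify each activity sequence, then aggregate per account) can be sketched as follows. This is only an illustration under a stated assumption: the abstract does not give the Troll Score formula, so the sketch assumes it is the fraction of an account's sequences that the classifier labels as troll-like. The names `troll_score` and `score_accounts` and the example predictions are hypothetical, and the LSTM classifier itself is not shown.

```python
from typing import Dict, List


def troll_score(sequence_labels: List[int]) -> float:
    """Assumed Troll Score: share of an account's sequences labeled troll (1)."""
    if not sequence_labels:
        raise ValueError("account has no classified sequences")
    return sum(sequence_labels) / len(sequence_labels)


def score_accounts(predictions: Dict[str, List[int]]) -> Dict[str, float]:
    """Map each account to its score from per-sequence classifier outputs."""
    return {account: troll_score(labels) for account, labels in predictions.items()}


# Hypothetical per-sequence predictions (1 = troll-like, 0 = organic).
example_predictions = {
    "account_a": [1, 1, 0, 1],
    "account_b": [0, 0, 0, 1],
}
example_scores = score_accounts(example_predictions)
```

Under this assumption, an account would be flagged by thresholding its score; the threshold value is a separate design choice that the abstract does not specify.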


Multimodal and Explainable Internet Meme Classification

Thakur, Abhinav Kumar, Ilievski, Filip, Sandlin, Hông-Ân, Sourati, Zhivar, Luceri, Luca, Tommasini, Riccardo, Mermoud, Alain

arXiv.org Artificial Intelligence

In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.


Leveraging Social Interactions to Detect Misinformation on Social Media

Fornaciari, Tommaso, Luceri, Luca, Ferrara, Emilio, Hovy, Dirk

arXiv.org Artificial Intelligence

Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using a data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as reliable or unreliable, based on a previous evaluation of the information source. The models identifying unreliable threads usually rely on textual features. But reliability is not just a matter of what is said, but also of by whom and to whom. We additionally leverage network information. Following the homophily principle, we hypothesize that users who interact are generally interested in similar topics and spread similar kinds of news, which in turn are generally reliable or not. We test several methods to learn representations of the social interactions within the cascades, combining them with deep neural language models in a Multi-Input (MI) framework. By keeping track of the sequence of interactions over time, we improve over previous state-of-the-art models.


Ferrara

AAAI Conferences

Information spreading on social media contributes to the formation of collective opinions. Millions of social media users are exposed every day to popular memes -- some generated organically by grassroots activity, others sustained by advertising, information campaigns or more or less transparent coordinated efforts. While most information campaigns are benign, some may have nefarious purposes, including terrorist propaganda, political astroturf, and financial market manipulation. This poses a crucial technological challenge with deep social implications: can we detect whether the spreading of a viral meme is being sustained by a promotional campaign? Here we study trending memes that attract attention either organically or by means of advertisement. We designed a machine learning framework capable of detecting promoted campaigns and separating them from organic ones in their early stages. Using a dataset of millions of posts associated with trending Twitter hashtags, we show that remarkably accurate early detection is possible, achieving a 95% AUC score. Feature selection analysis reveals that network diffusion patterns and content cues are powerful early detection signals.