P, Deepak
News about Global North considered Truthful! The Geo-political Veracity Gradient in Global South News
Mandava, Sujit, P, Deepak, Bhadra, Sahely
While there has been much research into developing AI techniques for fake news detection aided by various benchmark datasets, it has often been pointed out that fake news in different geo-political regions traces different contours. In this work we uncover, through analytical arguments and empirical evidence, the existence of an important characteristic in news originating from the Global South viz., the geo-political veracity gradient. In particular, we show that Global South news about topics from Global North -- such as news from an Indian news agency on US elections -- tend to be less likely to be fake. Observing through the prism of the political economy of fake news creation, we posit that this pattern could be due to the relative lack of monetarily aligned incentives in producing fake news about a different region than the regional remit of the audience. We provide empirical evidence for this from benchmark datasets. We also empirically analyze the consequences of this effect in applying AI-based fake news detection models for fake news AI trained on one region within another regional context. We locate our work within emerging critical scholarship on geo-political biases within AI in general, particularly with AI usage in fake news identification; we hope our insight into the geo-political veracity gradient could help steer fake news AI scholarship towards positively impacting Global South societies.
AI Safety: Necessary, but insufficient and possibly problematic
P, Deepak
This article critically examines the recent hype around AI safety. We first start with noting the nature of the AI safety hype as being dominated by governments and corporations, and contrast it with other avenues within AI research on advancing social good. We consider what 'AI safety' actually means, and outline the dominant concepts that the digital footprint of AI safety aligns with. We posit that AI safety has a nuanced and uneasy relationship with transparency and other allied notions associated with societal good, indicating that it is an insufficient notion if the goal is that of societal good in a broad sense. We note that the AI safety debate has already influenced some regulatory efforts in AI, perhaps in not so desirable directions. We also share our concerns on how AI safety may normalize AI that advances structural harm through providing exploitative and harmful AI with a veneer of safety.
AI and Core Electoral Processes: Mapping the Horizons
P, Deepak, Simoes, Stanley, MacCarthaigh, Muiris
Significant enthusiasm around AI uptake has been witnessed across societies globally. The electoral process -- the time, place and manner of elections within democratic nations -- has been among those very rare sectors in which AI has not penetrated much. Electoral management bodies in many countries have recently started exploring and deliberating over the use of AI in the electoral process. In this paper, we consider five representative avenues within the core electoral process which have potential for AI usage, and map the challenges involved in using AI within them. These five avenues are: voter list maintenance, determining polling booth locations, polling booth protection processes, voter authentication and video monitoring of elections. Within each of these avenues, we lay down the context, illustrate current or potential usage of AI, and discuss extant or potential ramifications of AI usage, and potential directions for mitigating risks while considering AI usage. We believe that the scant current usage of AI within electoral processes provides a very rare opportunity, that of being able to deliberate on the risks and mitigation possibilities, prior to real and widespread AI deployment. This paper is an attempt to map the horizons of risks and opportunities in using AI within the electoral processes and to help shape the debate around the topic.
Gender Bias in Fake News: An Analysis
Sahadevan, Navya, P, Deepak
Data science research into fake news has gathered much momentum in recent years, arguably facilitated by the emergence of large public benchmark datasets. While it has been well-established within media studies that gender bias is an issue that pervades news media, there has been very little exploration into the relationship between gender bias and fake news. In this work, we provide the first empirical analysis of gender bias vis-a-vis fake news, leveraging simple and transparent lexicon-based methods over public benchmark datasets. Our analysis establishes the increased prevalance of gender bias in fake news across three facets viz., abundance, affect and proximal words. The insights from our analysis provide a strong argument that gender bias needs to be an important consideration in research into fake news.
FiSH: Fair Spatial Hotspots
P, Deepak, Sundaram, Sowmya S
Pervasiveness of tracking devices and enhanced availability of spatially located data has deepened interest in using them for various policy interventions, through computational data analysis tasks such as spatial hot spot detection. In this paper, we consider, for the first time to our best knowledge, fairness in detecting spatial hot spots. We motivate the need for ensuring fairness through statistical parity over the collective population covered across chosen hot spots. We then characterize the task of identifying a diverse set of solutions in the noteworthiness-fairness trade-off spectrum, to empower the user to choose a trade-off justified by the policy domain. Being a novel task formulation, we also develop a suite of evaluation metrics for fair hot spots, motivated by the need to evaluate pertinent aspects of the task. We illustrate the computational infeasibility of identifying fair hot spots using naive and/or direct approaches and devise a method, codenamed {\it FiSH}, for efficiently identifying high-quality, fair and diverse sets of spatial hot spots. FiSH traverses the tree-structured search space using heuristics that guide it towards identifying effective and fair sets of spatial hot spots. Through an extensive empirical analysis over a real-world dataset from the domain of human development, we illustrate that FiSH generates high-quality solutions at fast response times.
ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences
Ghosal, Soumya Suvra, P, Deepak, Jurek-Loughrey, Anna
Disinformation is often presented in long textual articles, especially when it relates to domains such as health, often seen in relation to COVID-19. These articles are typically observed to have a number of trustworthy sentences among which core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, following which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence level disinformation scorings to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we illustrate that our method is able to identify core disinformation effectively.
Representativity Fairness in Clustering
P, Deepak, Abraham, Savitha Sam
Incorporating fairness constructs into machine learning algorithms is a topic of much societal importance and recent interest. Clustering, a fundamental task in unsupervised learning that manifests across a number of web data scenarios, has also been subject of attention within fair ML research. In this paper, we develop a novel notion of fairness in clustering, called representativity fairness. Representativity fairness is motivated by the need to alleviate disparity across objects' proximity to their assigned cluster representatives, to aid fairer decision making. We illustrate the importance of representativity fairness in real-world decision making scenarios involving clustering and provide ways of quantifying objects' representativity and fairness over it. We develop a new clustering formulation, RFKM, that targets to optimize for representativity fairness along with clustering quality. Inspired by the $K$-Means framework, RFKM incorporates novel loss terms to formulate an objective function. The RFKM objective and optimization approach guides it towards clustering configurations that yield higher representativity fairness. Through an empirical evaluation over a variety of public datasets, we establish the effectiveness of our method. We illustrate that we are able to significantly improve representativity fairness at only marginal impact to clustering quality.
Local Connectivity in Centroid Clustering
P, Deepak
Clustering is a fundamental task in unsupervised learning, one that targets to group a dataset into clusters of similar objects. There has been recent interest in embedding normative considerations around fairness within clustering formulations. In this paper, we propose 'local connectivity' as a crucial factor in assessing membership desert in centroid clustering. We use local connectivity to refer to the support offered by the local neighborhood of an object towards supporting its membership to the cluster in question. We motivate the need to consider local connectivity of objects in cluster assignment, and provide ways to quantify local connectivity in a given clustering. We then exploit concepts from density-based clustering and devise LOFKM, a clustering method that seeks to deepen local connectivity in clustering outputs, while staying within the framework of centroid clustering. Through an empirical evaluation over real-world datasets, we illustrate that LOFKM achieves notable improvements in local connectivity at reasonable costs to clustering quality, illustrating the effectiveness of the method.
Whither Fair Clustering?
P, Deepak
Within the relatively busy area of fair machine learning that has been dominated by classification fairness research, fairness in clustering has started to see some recent attention. In this position paper, we assess the existing work in fair clustering and observe that there are several directions that are yet to be explored, and postulate that the state-of-the-art in fair clustering has been quite parochial in outlook. We posit that widening the normative principles to target for, characterizing shortfalls where the target cannot be achieved fully, and making use of knowledge of downstream processes can significantly widen the scope of research in fair clustering research. At a time when clustering and unsupervised learning are being increasingly used to make and influence decisions that matter significantly to human lives, we believe that widening the ambit of fair clustering is of immense significance.
Fairness in Clustering with Multiple Sensitive Attributes
Abraham, Savitha Sam, P, Deepak, Sundaram, Sowmya S
A clustering may be considered as fair on pre-specified sensitive attributes if the proportions of sensitive attribute groups in each cluster reflect that in the dataset. In this paper, we consider the task of fair clustering for scenarios involving multiple multi-valued or numeric sensitive attributes. We propose a fair clustering method, FairKM (Fair K-Means), that is inspired by the popular K-Means clustering formulation. We outline a computational notion of fairness which is used along with a cluster coherence objective, to yield the FairKM clustering method. We empirically evaluate our approach, wherein we quantify both the quality and fairness of clusters, over real-world datasets. Our experimental evaluation illustrates that the clusters generated by FairKM fare significantly better on both clustering quality and fair representation of sensitive attribute groups compared to the clusters from a state-of-the-art baseline fair clustering method.