Goto

Collaborating Authors

 Bandyapadhyay, Sayan


Fair Summarization: Bridging Quality and Diversity in Extractive Summaries

arXiv.org Artificial Intelligence

Fairness in multi-document summarization of user-generated content remains a critical challenge in natural language processing (NLP). Existing summarization methods often fail to ensure equitable representation across different social groups, leading to biased outputs. In this paper, we introduce two novel methods for fair extractive summarization: FairExtract, a clustering-based approach, and FairGPT, which leverages GPT-3.5-turbo with fairness constraints. We evaluate these methods using Divsumm summarization dataset of White-aligned, Hispanic, and African-American dialect tweets and compare them against relevant baselines. The results obtained using a comprehensive set of summarization quality metrics such as SUPERT, BLANC, SummaQA, BARTScore, and UniEval, as well as a fairness metric F, demonstrate that FairExtract and FairGPT achieve superior fairness while maintaining competitive summarization quality. Additionally, we introduce composite metrics (e.g., SUPERT+F, BLANC+F) that integrate quality and fairness into a single evaluation framework, offering a more nuanced understanding of the trade-offs between these objectives. This work highlights the importance of fairness in summarization and sets a benchmark for future research in fairness-aware NLP models.


A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering

arXiv.org Artificial Intelligence

Clustering is a fundamental task in theoretical computer science and machine learning aimed at dividing a set of data items into several groups or clusters, such that each group contains similar data items. Typically, the similarity between data items is measured using a metric distance function. Clustering is often modeled as an optimization problem where the objective is to minimize a global cost function that reflects the quality of the clusters; this function varies depending on the application. Among the many cost functions studied for clustering, the most popular are k-median, k-means, and k-center. These objectives generally aim to minimize the variance within the clusters, serving as a proxy for grouping similar data items In this work, we study clustering problems with fairness constraints, commonly known as fair clustering problems. Fair clustering emerged as one of the most active research areas in algorithms motivated by the recent trend of research on fairness in artificial intelligence. In a seminal work, Chierichetti et al. [18] introduced a fair clustering problem, where given a set R of red points, a set B of blue points, and an integer balance parameter t 1, a clustering is said to be balanced if, in every cluster, the number of red points is at least 1/t times the number of blue points and at most t times the number of blue points.