
Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization

Niu, Shuaicheng, Chen, Guohao, Chen, Deyu, Zhang, Yifan, Wu, Jiaxiang, Wen, Zhiquan, Chen, Yaofo, Zhao, Peilin, Miao, Chunyan, Tan, Mingkui

arXiv.org Artificial Intelligence

Test-time adaptation (TTA) may fail to improve, or may even harm, model performance when test data exhibit: 1) mixed distribution shifts, 2) small batch sizes, or 3) online imbalanced label distribution shifts. This is often a key obstacle preventing existing TTA methods from being deployed in the real world. In this paper, we investigate the reasons for this instability and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases, in which the model collapses into trivial solutions by assigning the same class label to all samples. By digging into this, we find that, during the collapse process: 1) the model gradients often undergo an initial explosion followed by rapid degradation, suggesting that certain noisy test samples with large gradients may disrupt adaptation; and 2) the model representations tend to exhibit high correlations and classification bias. To address this, we first propose a sharpness-aware and reliable entropy minimization method, called SAR, which stabilizes TTA in two ways: 1) removing partial noisy samples with large gradients, and 2) encouraging model weights to reach a flat minimum so that the model is robust to the remaining noisy samples. Building on SAR, we further introduce SAR^2, which prevents representation collapse with two regularizers: 1) a redundancy regularizer to reduce inter-dimensional correlations among centroid-invariant features; and 2) an inequity regularizer to maximize the prediction entropy of a prototype centroid, thereby penalizing representations biased toward any specific class. Promising results demonstrate that our methods perform more stably than prior methods and are computationally efficient under the above wild test scenarios.
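
The two ingredients of SAR lend themselves to a compact sketch. The PyTorch fragment below is a minimal illustration, not the authors' released implementation: `model`, `x`, and `optimizer` are assumed to be an adapted classifier, a test batch, and its optimizer; the threshold `e0_frac` and perturbation radius `rho` are placeholder values; and prediction entropy is used as a stand-in criterion for identifying large-gradient samples.

```python
# Illustrative sketch of reliable, sharpness-aware entropy minimization.
import math
import torch

def entropy(logits):
    # Per-sample prediction entropy H(p) = -sum_c p_c log p_c.
    p = logits.softmax(dim=1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

def sar_step(model, x, optimizer, rho=0.05, e0_frac=0.4):
    logits = model(x)
    e0 = e0_frac * math.log(logits.shape[1])   # threshold scaled by ln(num_classes)

    # 1) Reliability filter: drop noisy samples whose entropy (a proxy for
    #    gradient magnitude) is large.
    keep = entropy(logits) < e0
    if not keep.any():
        return
    entropy(logits[keep]).mean().backward()

    # 2) Sharpness-aware step: perturb weights toward the local worst case ...
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        gnorm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        eps = {p: rho * p.grad / gnorm
               for p in model.parameters() if p.grad is not None}
        for p, e in eps.items():
            p.add_(e)
    model.zero_grad()

    # ... and minimize entropy at the perturbed point, which favors flat minima.
    entropy(model(x[keep])).mean().backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)                          # restore weights before updating
    optimizer.step()
    optimizer.zero_grad()
```

The second backward pass per batch is the price of the SAM-style flatness step; the reliability filter costs nothing extra, since the entropies come from the forward pass already needed for the loss.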


Uncertainty-aware Predict-Then-Optimize Framework for Equitable Post-Disaster Power Restoration

Jiang, Lin, Yu, Dahai, Xu, Rongchao, Tang, Tian, Wang, Guang

arXiv.org Artificial Intelligence

The increasing frequency of extreme weather events, such as hurricanes, highlights the urgent need for efficient and equitable power system restoration. Many electricity providers make restoration decisions primarily based on the volume of power restoration requests from each region. However, our data-driven analysis reveals significant disparities in request submission volume, as disadvantaged communities tend to submit fewer restoration requests. This disparity makes current restoration solutions inequitable, leaving these communities vulnerable to extended power outages. To address this, we propose an equity-aware power restoration strategy that balances restoration efficiency and equity across communities. However, achieving this goal is challenging for two reasons: the difficulty of predicting repair durations under dataset heteroscedasticity, and the tendency of reinforcement learning agents to favor low-uncertainty actions, which can undermine equity. To overcome these challenges, we design a predict-then-optimize framework called EPOPR with two key components: (1) Equity-Conformalized Quantile Regression for uncertainty-aware repair duration prediction, and (2) Spatial-Temporal Attentional RL, which adapts to varying uncertainty levels across regions for equitable decision-making. Experimental results show that EPOPR reduces the average power outage duration by 3.60% and decreases inequity between different communities by 14.19% compared to state-of-the-art baselines.
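
Conformalized quantile regression (CQR) is the generic technique underlying the first component. The sketch below shows plain CQR with scikit-learn's quantile gradient boosting, assuming a held-out calibration split; it omits the paper's equity-aware modifications, and all variable names are placeholders.

```python
# Minimal sketch of conformalized quantile regression (CQR), not the paper's
# Equity-Conformalized variant.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_cqr(X_train, y_train, X_cal, y_cal, alpha=0.1):
    # Fit lower and upper quantile regressors at alpha/2 and 1 - alpha/2.
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

    # Conformity score: how far each calibration label falls outside the band.
    scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
    n = len(y_cal)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

    def predict_interval(X):
        # Widen the quantile band by q for finite-sample coverage >= 1 - alpha.
        return lo.predict(X) - q, hi.predict(X) + q
    return predict_interval
```

The appeal of CQR under heteroscedasticity is that the band width is learned per input by the quantile regressors, while the conformal correction q restores a distribution-free coverage guarantee.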


Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach

Farha, Ramisa, Olukoya, Joshua O.

arXiv.org Artificial Intelligence

This study employs a robust analytical framework to uncover patterns in survival outcomes among breast cancer patients from diverse racial and geographical backgrounds. Using the SEER 2021 dataset, we analyze breast cancer survival outcomes to identify and understand disparities. Our approach integrates exploratory data analysis (EDA), through which we identify key variables that influence survival rates, with survival analysis techniques, including the Kaplan-Meier estimator, the log-rank test, and the Cox Proportional Hazards model, to determine how survival rates vary across racial groups and geographic locations. Model validation and interpretation are undertaken to ensure the reliability of our findings, which are documented comprehensively to inform policymakers and healthcare professionals. The outcome of this paper is a detailed statistical analysis that not only highlights disparities in breast cancer treatment and care but also serves as a foundational tool for developing targeted interventions to address these inequalities effectively. Through this research, we aim to contribute to global efforts to improve breast cancer outcomes and reduce treatment disparities.
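
The pipeline described above maps directly onto standard tooling. Here is a minimal sketch with the lifelines library, assuming a hypothetical SEER-style dataframe with columns "time" (survival months), "event" (1 = death observed), and "race"; the file name is invented.

```python
# Kaplan-Meier curves, log-rank test, and Cox PH model with lifelines.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import multivariate_logrank_test

df = pd.read_csv("seer_breast_cancer.csv")  # hypothetical file name

# Kaplan-Meier survival curve per racial group.
kmf = KaplanMeierFitter()
for race, grp in df.groupby("race"):
    kmf.fit(grp["time"], grp["event"], label=race)
    print(race, "median survival:", kmf.median_survival_time_)

# Log-rank test: do the survival curves differ across groups?
result = multivariate_logrank_test(df["time"], df["race"], df["event"])
print("log-rank p-value:", result.p_value)

# Cox proportional hazards model with race as a (dummy-coded) covariate.
covariates = pd.get_dummies(df[["time", "event", "race"]],
                            columns=["race"], drop_first=True, dtype=int)
cph = CoxPHFitter()
cph.fit(covariates, duration_col="time", event_col="event")
cph.print_summary()  # hazard ratios per group vs. the reference category
```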


Universal Narrative Model: an Author-centric Storytelling Framework for Generative AI

Gerba, Hank

arXiv.org Artificial Intelligence

In their survey of authoring tools for computational narrative, Kybartas and Bidarra note that "we believe that creating a standard model of computational narrative could allow different systems to interact with the same narrative, without being restricted by incompatible models and definitions. Furthermore, such a model would also facilitate research into the generation of specific story components, e.g., allowing for multiple generators and even authors to collaborate on a given narrative" [Kybartas and Bidarra 2017]. This paper proposes such a standard: the Universal Narrative Model (UNM). We foresee that generative AI will enable a new paradigm of storytelling technologies and processes: from assisting a writer of linear media (novels, film, television, etc.) by allowing them to test out scenes and characters before committing them to a script, all the way through to real-time storytelling systems in videogames that respond to a player's agency, and countless use cases in between [Peng et al. 2024]. The UNM is designed to serve any use case in which coherent narrative structure is a consideration and in which authorial intent and direction are privileged. In the last five years, a robust body of research has demonstrated a wide variety of potential uses for computational narrative systems powered by generative AI, and some limited commercial deployments already exist [Yang et al. 2024; Hu et al. 2024]. With such promise, however, comes a series of challenges: technical, narrative, and ethical. The goal of the Entertainment Technology Center's "Universal Narrative Model" project was to produce the UNM as an open standard. The ultimate directive of the project was to privilege, above all else, author-centric design and functionality, setting the stage for generative workflows that extend an author's narrative intent and creativity rather than eclipse or replace it.


Do Tutors Learn from Equity Training and Can Generative AI Assess It?

Thomas, Danielle R., Borchers, Conrad, Kakarla, Sanjit, Lin, Jionghao, Bhushan, Shambhavi, Guo, Boyuan, Gatz, Erin, Koedinger, Kenneth R.

arXiv.org Artificial Intelligence

Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, and this work assesses their use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains, with increases from pretest to posttest in tutors' self-reported confidence in their knowledge of responding to middle school students experiencing possible inequities. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that few-shot learning using GPT-4o is the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.
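
As a rough illustration of few-shot grading with GPT-4o, the sketch below uses the OpenAI chat API. The rubric wording, few-shot examples, and output format are invented placeholders, not the prompts released with the paper.

```python
# Hedged sketch of few-shot LLM grading of tutor responses.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = [
    {"role": "system", "content":
     "You grade tutor responses for equitable practice. Answer 1 if the "
     "response follows the rubric, else 0, then a one-sentence rationale."},
    # Few-shot examples: one graded response per label (placeholders).
    {"role": "user", "content": "Tutor response: 'Maybe math just isn't for you.'"},
    {"role": "assistant", "content": "0 - dismisses the student instead of normalizing struggle."},
    {"role": "user", "content": "Tutor response: 'Many students find this hard at first; "
                                "let's try a smaller example together.'"},
    {"role": "assistant", "content": "1 - normalizes difficulty and offers concrete support."},
]

def grade(tutor_response: str) -> str:
    msgs = FEW_SHOT + [{"role": "user", "content": f"Tutor response: '{tutor_response}'"}]
    out = client.chat.completions.create(model="gpt-4o", messages=msgs, temperature=0)
    return out.choices[0].message.content
```

Setting temperature to 0 and fixing the answer format keeps the grades comparable across responses, which matters when LLM scores are compared against human rubric annotations.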


AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces

Gowaikar, Shreeyash, Berard, Hugo, Mushkani, Rashid, Marchand, Emmanuel Beaudry, Ammar, Toumadher, Koseki, Shin

arXiv.org Artificial Intelligence

Moreover, the failure to acknowledge the socio-cultural context within which data is produced can introduce biases into datasets. For example, algorithms trained on datasets devoid of the historical context of segregation may inadvertently perpetuate biases against certain minority groups [12]. Furthermore, the identities of workers involved in annotations are frequently overlooked, leading to a lack of diversity in viewpoints captured within datasets. This bias is compounded by the common practice of aggregating annotations through majority voting [5]. However, concerns persist regarding the transparency and context of data collection methodologies, especially when sourced through crowdsourcing platforms. Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators, leading to algorithms that fail to represent diverse views and perpetuate biases against certain groups. To address these limitations, we propose a methodology involving a co-design model that actively engages stakeholders at key stages.
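
The majority-voting issue mentioned above is easy to see concretely. This toy sketch (with hypothetical labels, unrelated to the paper's dataset) contrasts the collapsed majority label with the full annotation distribution.

```python
# Majority voting collapses each item's annotations to one label, discarding
# minority viewpoints; keeping the label distribution preserves disagreement.
from collections import Counter

annotations = {  # hypothetical item -> per-annotator quality ratings
    "plaza_01": ["safe", "safe", "unsafe", "safe", "unsafe"],
}

for item, labels in annotations.items():
    majority = Counter(labels).most_common(1)[0][0]
    distribution = {l: c / len(labels) for l, c in Counter(labels).items()}
    print(item, "majority:", majority)          # 'safe' -- dissent vanishes
    print(item, "distribution:", distribution)  # {'safe': 0.6, 'unsafe': 0.4}
```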


Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

Shang, Shuning, Meng, Xuran, Cao, Yuan, Zou, Difan

arXiv.org Machine Learning

Benign overfitting refers to the phenomenon in which over-parameterized neural networks fit the training data perfectly yet still generalize well to unseen data. While this has been widely investigated theoretically, existing works are limited to two-layer networks with fixed output layers, where only the hidden weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully trainable layers, which is closer to practice. Our results show that the initialization scaling of the output layer is crucial to the training dynamics. With large scales, training behaves similarly to the fixed-output setting: the hidden layer grows rapidly while the output layer remains largely unchanged. In contrast, small scales result in more complex layer interactions: the hidden layer initially grows to a specific ratio relative to the output layer, after which both layers grow jointly and maintain that ratio throughout training. Furthermore, in both settings, we provide nearly matching upper and lower bounds on the test errors, identifying the sharp conditions on the initialization scaling and signal-to-noise ratio (SNR) under which benign overfitting can or cannot be achieved. Numerical experiments back up the theoretical results.
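
The studied setting can be sketched in a few lines of PyTorch. The architecture below is an illustrative stand-in, not the paper's exact model, with the hypothetical parameter out_init_scale playing the role of the output-layer initialization scaling the analysis varies.

```python
# Two-layer ReLU CNN with a fully trainable, explicitly scaled output layer.
import torch
import torch.nn as nn

class TwoLayerReLUCNN(nn.Module):
    def __init__(self, in_channels=3, width=64, num_classes=2, out_init_scale=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.out = nn.Linear(width, num_classes)  # trainable output layer
        with torch.no_grad():
            self.out.weight.mul_(out_init_scale)  # the scaling under study
            self.out.bias.zero_()

    def forward(self, x):
        h = torch.relu(self.conv(x))   # hidden ReLU convolutional layer
        h = h.mean(dim=(2, 3))         # global average pooling
        return self.out(h)

# Large out_init_scale: training resembles the fixed-output-layer regime;
# small out_init_scale: both layers grow jointly at a fixed ratio.
model = TwoLayerReLUCNN(out_init_scale=0.01)
```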


Mitigating the Risk of Health Inequity Exacerbated by Large Language Models

Ji, Yuelyu, Ma, Wenhe, Sivarajkumar, Sonish, Zhang, Hang, Sadhu, Eugene Mathew, Li, Zhuochun, Wu, Xizhi, Visweswaran, Shyam, Wang, Yanshan

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non-decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM-based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.
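
While the paper's framework is more involved, the underlying detection idea can be sketched as a counterfactual probe: hold the clinical question fixed, vary one non-decisive attribute, and flag divergent answers. Everything below (the query, the attributes, and the ask wrapper) is a hypothetical illustration, not EquityGuard's implementation.

```python
# Counterfactual perturbation probe for non-decisive attributes.
ATTRIBUTES = ["", "The patient is homeless. ", "The patient is unemployed. "]
BASE_QUERY = ("A 55-year-old presents with chest pain radiating to the left arm. "
              "What is the recommended next step?")

def probe(ask):
    # `ask` is any callable str -> str wrapping the LLM under test.
    answers = {attr or "baseline": ask(attr + BASE_QUERY) for attr in ATTRIBUTES}
    # A non-decisive attribute should not change the clinical recommendation;
    # any divergence from the baseline flags a potential inequity.
    return {k: v for k, v in answers.items() if v != answers["baseline"]}
```

In practice, exact string comparison would be replaced by a semantic similarity check, since LLM outputs vary in surface form even when the recommendation is unchanged.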


5 questions schools and universities should ask before they purchase AI tech products

AIHub

Every few years, an emerging technology shows up at the doorstep of schools and universities promising to transform education. The latest arrivals are technologies and apps that include, or are powered by, generative artificial intelligence, also known as GenAI. These technologies are sold on the potential they hold for education. For example, Khan Academy's founder opened his 2023 TED Talk by arguing that "we're at the cusp of using AI for probably the biggest positive transformation that education has ever seen." As optimistic as these visions of the future may be, the realities of educational technology over the past few decades have not lived up to their promises.


Towards Socially and Environmentally Responsible AI

Li, Pengfei, Liu, Yejia, Yang, Jianyi, Ren, Shaolei

arXiv.org Artificial Intelligence

The sharply increasing sizes of artificial intelligence (AI) models come with significant energy consumption and environmental footprints, which can disproportionately impact certain (often marginalized) regions and hence create environmental inequity concerns. Moreover, social inequity concerns have also emerged, as AI computing resources may not be equitably distributed across the globe, and users from disadvantaged regions with severe resource constraints can consistently experience inferior model performance. Importantly, inequity concerns that encompass both social and environmental dimensions remain unexplored and have increasingly hindered responsible AI. In this paper, we leverage the spatial flexibility of AI inference workloads and propose equitable geographical load balancing (GLB) to fairly balance AI's regional social and environmental costs. Concretely, to penalize disproportionately high social and environmental costs, we introduce $L_q$ norms as novel regularization terms in the optimization objective for GLB decisions. Our empirical results based on real-world AI inference traces demonstrate that while existing GLB algorithms result in disproportionately large social and environmental costs in certain regions, our proposed equitable GLB can fairly balance AI's negative social and environmental costs across all regions.
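
The shape of such an objective can be illustrated with a toy convex program. The formulation below is an assumed simplification (linear per-region costs, a single demand constraint), not the paper's model; all numbers are invented.

```python
# Toy L_q-regularized geographical load balancing with cvxpy.
import cvxpy as cp
import numpy as np

num_regions, demand = 4, 100.0
unit_cost = np.array([1.0, 1.2, 0.8, 1.5])  # per-unit social+environmental cost
capacity = np.array([60.0, 50.0, 40.0, 70.0])

x = cp.Variable(num_regions, nonneg=True)    # inference load per region
regional_cost = cp.multiply(unit_cost, x)

# q > 1 makes the penalty dominated by the worst-off region, so the optimizer
# trades a little total cost for a more even regional cost distribution.
q, lam = 4, 0.5
objective = cp.sum(regional_cost) + lam * cp.pnorm(regional_cost, q)

prob = cp.Problem(cp.Minimize(objective),
                  [cp.sum(x) == demand, x <= capacity])
prob.solve()
print("load split:", np.round(x.value, 1))
```

With lam = 0 this reduces to cost-greedy load balancing, which piles load onto the cheapest regions; increasing lam or q shifts the solution toward equalizing regional costs.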