spillover
Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems
Shekhar, Prashant, Howard, Caroline
Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or temporal carryover, making the randomization design itself a statistical decision. We formulate this problem as robust design selection over uncertain exposure mechanisms. Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. For theoretical justification, the paper develops a geometry-aware guarantee, stating that design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. We also prove finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat. Empirically, the same selector gives different recommendations across samples from public datasets. It selects user-randomization on Criteo ads with dimensionless robust risk 1.295, switchbacks on Open Bandit-bts/men with risk 2.105, and cluster-randomization on KuaiRand with risk 2.240. The Open Bandit case stresses known but uneven logging support, with propensities from 0.00006 to 0.594 and a 5.17% IPS effective-sample share. Overall, the paper contributes an interference-aware experiment design framework based on mechanism-robust design decisions, where the output is either a justified design choice or an uncertainty shortlist.
Stochastic Discount Factors with Cross-Asset Spillovers
The central objective of empirical asset pricing is to identify firm-level signals that explain the cross-section of expected stock returns--whether through exposure to risk factors or persistent mispricing. The dominant paradigm, grounded in the assumption of self-predictability, asserts that a firm's own characteristics forecast its own returns (see, e.g., Cochrane (2011); Harvey et al. (2016)). Complementing this view is a growing literature on cross-predictability--the idea that the characteristics or returns of one asset can help forecast the returns of others (see, e.g., Lo and MacKinlay (1990); Hou (2007); Cohen and Frazzini (2008); Cohen and Lou (2012); Huang et al. (2021, 2022)). A key mechanism underpinning this phenomenon is the presence of lead-lag effects, whereby price movements or information from one firm precede and predict those of related firms. Such effects can stem from staggered information diffusion, peer influence within industries, supply chain linkages, or correlated trading by institutional investors that induces price pressure across related assets. Despite recent methodological advances in modeling cross-stock predictability, several foundational questions remain unresolved. Chief among them is how a mean-variance investor can analytically integrate multiple predictive signals when returns are interconnected across assets. Equally crucial is developing a framework that jointly captures both the relevance of individual signals and the structure of return spillovers--enhancing portfolio performance while preserving interpretability .
Decomposition of Spillover Effects Under Misspecification:Pseudo-true Estimands and a Local--Global Extension
Applied work with interference typically models outcomes as functions of own treatment and a low-dimensional exposure mapping of others' treatments, even when that mapping may be misspecified. This raises a basic question: what policy object are exposure-based estimands implicitly targeting, and how should we interpret their direct and spillover components relative to the underlying policy question? We take as primitive the marginal policy effect, defined as the effect of a small change in the treatment probability under the actual experimental design, and show that any researcher-chosen exposure mapping induces a unique pseudo-true outcome model. This model is the best approximation to the underlying potential outcomes that depends only on the user-chosen exposure. Utilizing that representation, the marginal policy effect admits a canonical decomposition into exposure-based direct and spillover effects, and each component provides its optimal approximation to the corresponding oracle objects that would be available if interference were fully known. We then focus on a setting that nests important empirical and theoretical applications in which both local network spillovers and global spillovers, such as market equilibrium, operate. There, the marginal policy effect further decomposes asymptotically into direct, local, and global channels. An important implication is that many existing methods are more robust than previously understood once we reinterpret their targets as channel-specific components of this pseudo-true policy estimand. Simulations and a semi-synthetic experiment calibrated to a large cash-transfer experiment show that these components can be recovered in realistic experimental designs.
No Free Lunch in Language Model Bias Mitigation? Targeted Bias Reduction Can Exacerbate Unmitigated LLM Biases
Chand, Shireen, Baca, Faith, Ferrara, Emilio
Large Language Models (LLMs) inherit societal biases from their training data, potentially leading to harmful or unfair outputs. While various techniques aim to mitigate these biases, their effects are often evaluated only along the dimension of the bias being targeted. This work investigates the cross-category consequences of targeted bias mitigation. We study four bias mitigation techniques applied across ten models from seven model families, and we explore racial, religious, profession- and gender-related biases. We measure the impact of debiasing on model coherence and stereotypical preference using the StereoSet benchmark. Our results consistently show that while targeted mitigation can sometimes reduce bias in the intended dimension, it frequently leads to unintended and often negative consequences in others, such as increasing model bias and decreasing general coherence. These findings underscore the critical need for robust, multi-dimensional evaluation tools when examining and developing bias mitigation strategies to avoid inadvertently shifting or worsening bias along untargeted axes.
Unbiased Platform-Level Causal Estimation for Search Systems: A Competitive Isolation PSM-DID Framework
Song, Ying, Wang, Yijing, Yang, Hui, Jin, Weihan, Xiong, Jun, Zhou, Congyi, Zhu, Jialin, Gao, Xiang, Chen, Rong, Deng, HuaGuang, Dai, Ying, Xiao, Fei, Tang, Haihong, Zheng, Bo, Zhang, KaiFu
Evaluating platform-level interventions in search-based two-sided marketplaces is fundamentally challenged by systemic effects such as spillovers and network interference. While widely used for causal inference, the PSM (Propensity Score Matching) - DID (Difference-in-Differences) framework remains susceptible to selection bias and cross-unit interference from unaccounted spillovers. In this paper, we introduced Competitive Isolation PSM-DID, a novel causal framework that integrates propensity score matching with competitive isolation to enable platform-level effect measurement (e.g., order volume, GMV) instead of item-level metrics in search systems. Our approach provides theoretically guaranteed unbiased estimation under mutual exclusion conditions, with an open dataset released to support reproducible research on marketplace interference (github.com/xxxx). Extensive experiments demonstrate significant reductions in interference effects and estimation variance compared to baseline methods. Successful deployment in a large-scale marketplace confirms the framework's practical utility for platform-level causal inference.
Dynamic spillovers and investment strategies across artificial intelligence ETFs, artificial intelligence tokens, and green markets
Shao, Ying-Hui, Yang, Yan-Hong, Zhou, Wei-Xing
This paper investigates the risk spillovers among AI ETFs, AI tokens, and green markets using the R2 decomposition method. We reveal several key insights. First, the overall transmission connectedness index (TCI) closely aligns with the contemporaneous TCI, while the lagged TCI is significantly lower. Second, AI ETFs and clean energy act as risk transmitters, whereas AI tokens and green bond function as risk receivers. Third, AI tokens are difficult to hedge and provide limited hedging ability compared to AI ETFs and green assets. However, multivariate portfolios effectively reduce AI tokens investment risk. Among them, the minimum correlation portfolio outperforms the minimum variance and minimum connectedness portfolios.
The AI Double Standard: Humans Judge All AIs for the Actions of One
Manoli, Aikaterina, Pauketat, Janet V. T., Anthis, Jacy Reese
Robots and other artificial intelligence (AI) systems are widely perceived as moral agents responsible for their actions. As AI proliferates, these perceptions may become entangled via the moral spillover of attitudes towards one AI to attitudes towards other AIs. We tested how the seemingly harmful and immoral actions of an AI or human agent spill over to attitudes towards other AIs or humans in two preregistered experiments. In Study 1 (N = 720), we established the moral spillover effect in human-AI interaction by showing that immoral actions increased attributions of negative moral agency (i.e., acting immorally) and decreased attributions of positive moral agency (i.e., acting morally) and moral patiency (i.e., deserving moral concern) to both the agent (a chatbot or human assistant) and the group to which they belong (all chatbot or human assistants). There was no significant difference in the spillover effects between the AI and human contexts. In Study 2 (N = 684), we tested whether spillover persisted when the agent was individuated with a name and described as an AI or human, rather than specifically as a chatbot or personal assistant. We found that spillover persisted in the AI context but not in the human context, possibly because AIs were perceived as more homogeneous due to their outgroup status relative to humans. This asymmetry suggests a double standard whereby AIs are judged more harshly than humans when one agent morally transgresses. With the proliferation of diverse, autonomous AI systems, HCI research and design should account for the fact that experiences with one AI could easily generalize to perceptions of all AIs and negative HCI outcomes, such as reduced trust.
Asymmetries in Financial Spillovers
Huber, Florian, Klieber, Karin, Marcellino, Massimiliano, Onorante, Luca, Pfarrhofer, Michael
Financial shocks, such as the one observed during the global financial crisis, exhibit important domestic and international consequences on macroeconomic aggregates (see, e.g., Dovern and van Roye, 2014; Ciccarelli et al., 2016; Prieto et al., 2016; Gerba et al., 2024). Policymakers in central banks and governmental institutions, who aim to smooth business cycles and thus alleviate the negative effects of adverse financial disruptions, need to understand how such shocks impact the economy and propagate internationally to implement policies in a forward-looking manner. The recent literature provides plenty of evidence on the domestic and international effects of US financial shocks (see Balke, 2000; Gilchrist and Zakrajลกek, 2012; Cesa-Bianchi and Sokol, 2022). These papers find that financial shocks exert powerful effects on domestic output but also that US-based shocks spill over to foreign economies and trigger declines in international economic activity. Such effects might be subject to time variation (Abbate et al., 2016).
Leveraging heterogeneous spillover effects in maximizing contextual bandit rewards
Faruk, Ahmed Sayeed, Zheleva, Elena
Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of these bandit algorithms is to learn the best arm (i.e., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. However, current approaches ignore potential spillover between interacting users, where the action of one user can impact the actions and rewards of other users. Moreover, spillover may vary for different people based on their preferences and the closeness of ties to other users. This leads to heterogeneity in the spillover effects, i.e., the extent to which the action of one user can impact the action of another. Here, we propose a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. By experimenting on several real-world datasets using prominent linear and non-linear contextual bandit algorithms, we observe that our proposed method leads to significantly higher rewards than existing solutions that ignore spillover.
Can predictive models be used for causal inference?
Pichler, Maximilian, Hartig, Florian
Supervised machine learning (ML) and deep learning (DL) algorithms excel at predictive tasks, but it is commonly assumed that they often do so by exploiting non-causal correlations, which may limit both interpretability and generalizability. Here, we show that this trade-off between explanation and prediction is not as deep and fundamental as expected. Whereas ML and DL algorithms will indeed tend to use non-causal features for prediction when fed indiscriminately with all data, it is possible to constrain the learning process of any ML and DL algorithm by selecting features according to Pearl's backdoor adjustment criterion. In such a situation, some algorithms, in particular deep neural networks, can provide near unbiased effect estimates under feature collinearity. Remaining biases are explained by the specific algorithmic structures as well as hyperparameter choice. Consequently, optimal hyperparameter settings are different when tuned for prediction or inference, confirming the general expectation of a trade-off between prediction and explanation. However, the effect of this trade-off is small compared to the effect of a causally constrained feature selection. Thus, once the causal relationship between the features is accounted for, the difference between prediction and explanation may be much smaller than commonly assumed. We also show that such causally constrained models generalize better to new data with altered collinearity structures, suggesting generalization failure may often be due to a lack of causal learning. Our results not only provide a perspective for using ML for inference of (causal) effects but also help to improve the generalizability of fitted ML and DL models to new data.