Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature.
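As one illustration of the idea (not the specific tests used in this work, which the abstract does not spell out), a label-shift assumption says that $P(X \mid Y)$ is the same in every domain $D$, which is equivalent to the conditional independence $X \perp D \mid Y$. A minimal sketch of a generic stratified permutation test for that hypothesis, assuming a scalar feature `x`, a discrete label `y`, and a binary domain indicator `d`; the function name `cond_perm_test` and the mean-difference statistic are hypothetical choices for illustration:

```python
import numpy as np

def cond_perm_test(x, d, y, n_perm=2000, seed=0):
    """Hypothetical sketch: permutation test of H0: X independent of D given discrete Y.

    x: (n,) scalar feature; d: (n,) binary domain labels (0/1); y: (n,) discrete labels.
    Statistic: sum over Y-strata of |mean(x | d=1, y) - mean(x | d=0, y)|.
    Under H0, D is exchangeable within each stratum, so we permute D inside strata.
    """
    rng = np.random.default_rng(seed)
    strata = [np.flatnonzero(y == v) for v in np.unique(y)]

    def stat(dd):
        s = 0.0
        for idx in strata:
            xs, ds = x[idx], dd[idx]
            if ds.min() == ds.max():  # stratum seen in only one domain: no contrast
                continue
            s += abs(xs[ds == 1].mean() - xs[ds == 0].mean())
        return s

    observed = stat(d)
    perm = d.copy()
    count = 0
    for _ in range(n_perm):
        for idx in strata:
            perm[idx] = rng.permutation(d[idx])  # shuffle domains within the stratum
        count += stat(perm) >= observed
    return (count + 1) / (n_perm + 1)  # add-one p-value
```

A small p-value is evidence that $P(X \mid Y)$ differs across domains, i.e. that the shift is more than pure label shift; a large one is merely consistent with label shift.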
Free Energy Mixer
Standard attention stores keys/values losslessly but reads them via a per-head convex average, which blocks channel-wise selection. We propose the Free Energy Mixer (FEM): a free-energy (log-sum-exp) read that applies a value-driven, per-channel log-linear tilt to a fast prior over indices (e.g., the query/key distribution in standard attention). Unlike methods that attempt to improve and enrich the $(q,k)$ scoring distribution, FEM treats it as a prior and yields a value-aware posterior read, moving smoothly from averaging to per-channel selection as the learnable inverse temperature increases, while preserving parallelism and the original asymptotic complexity ($O(T^2)$ for softmax; $O(T)$ for linearizable variants). We instantiate a two-level gated FEM that is plug-and-play with standard and linear attention, linear RNNs, and SSMs. It consistently outperforms strong baselines on NLP, vision, and time-series tasks at matched parameter budgets.
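The core of a free-energy read can be sketched for a single query, assuming the form $F_c = \frac{1}{\beta}\log \sum_j p_j \exp(\beta v_{j,c})$ with $p = \mathrm{softmax}(\text{scores})$ as the prior. This is an illustrative assumption, not the paper's two-level gated FEM; the names `free_energy_read` and `logsumexp` here are hypothetical, and $\beta$ is a fixed scalar rather than a learned parameter:

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def free_energy_read(scores, values, beta):
    """Hypothetical sketch of a per-channel free-energy read over T indices.

    scores: (T,) prior logits over indices (e.g. q.k / sqrt(d) for one query)
    values: (T, C) value vectors
    beta:   inverse temperature > 0
    Returns (C,): (1/beta) * log sum_j softmax(scores)_j * exp(beta * values[j, c])
    """
    log_prior = scores - logsumexp(scores)        # log-softmax prior, shape (T,)
    tilted = log_prior[:, None] + beta * values   # value-driven tilt per channel, (T, C)
    return logsumexp(tilted, axis=0) / beta       # one log-sum-exp per channel
```

For small $\beta$ the read approaches the ordinary attention average $\sum_j p_j v_{j,c}$; for large $\beta$ it approaches $\max_j v_{j,c}$ separately in each channel, so different channels can effectively select different indices, which a single convex average cannot do. By Jensen's inequality the read always lies between those two extremes.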
A second order regret bound for NormalHedge
Freund, Yoav, Harvey, Nicholas J. A., Portella, Victor S., Qi, Yabing, Wang, Yu-Xiang
We consider the problem of prediction with expert advice for "easy" sequences. We show that a variant of NormalHedge enjoys a second-order $\epsilon$-quantile regret bound of $O\big(\sqrt{V_T \log(V_T/\epsilon)}\big)$ when $V_T > \log N$, where $V_T$ is the cumulative second moment of instantaneous per-expert regret averaged with respect to a natural distribution determined by the algorithm. The algorithm is motivated by a continuous-time limit using stochastic differential equations. The discrete-time analysis uses self-concordance techniques.
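For orientation, here is a minimal sketch of the original NormalHedge weighting (Chaudhuri, Freund, and Hsu, 2009), not the second-order variant analyzed here. Each expert's weight is proportional to $\frac{[R_i]_+}{c}\exp\big([R_i]_+^2/(2c)\big)$, where $c > 0$ solves $\frac{1}{N}\sum_i \exp\big([R_i]_+^2/(2c)\big) = e$, and play is uniform when no expert has positive regret. The function names are hypothetical, and $c$ is found by plain bisection:

```python
import numpy as np

def normalhedge_weights(regrets, iters=100):
    """Original-NormalHedge weights from cumulative per-expert regrets (sketch)."""
    r = np.maximum(regrets, 0.0)
    n = len(r)
    if not np.any(r > 0):
        return np.full(n, 1.0 / n)  # no positive regret: play uniformly

    def avg_potential(c):  # decreasing in c
        return np.mean(np.exp(r ** 2 / (2 * c)))

    rmax2 = r.max() ** 2
    lo = rmax2 / (2 * (1 + np.log(n)))  # guarantees avg_potential(lo) >= e
    hi = rmax2 / 2                      # guarantees avg_potential(hi) <= e
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if avg_potential(mid) > np.e:
            lo = mid
        else:
            hi = mid
    c = hi
    w = (r / c) * np.exp(r ** 2 / (2 * c))  # zero weight for non-positive regret
    return w / w.sum()

def run_normalhedge(losses):
    """losses: (T, N) expert losses. Returns (algorithm loss, final regrets)."""
    T, n = losses.shape
    regrets = np.zeros(n)
    alg_loss = 0.0
    for t in range(T):
        p = normalhedge_weights(regrets)
        l_alg = p @ losses[t]           # expected loss of the mixture
        alg_loss += l_alg
        regrets += l_alg - losses[t]    # instantaneous regret to each expert
    return alg_loss, regrets
```

On an "easy" sequence where one expert is always perfect, the sketch concentrates all weight on that expert after a single round.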
Collapsing Bandits and Their Application to Public Health Interventions
Neither (i) nor (ii) is known for general restless multi-armed bandits (RMABs). Therefore, to capture the scheduling problems addressed in this work, we introduce a new subclass of RMABs, Collapsing Bandits, distinguished by the following feature: when an arm is played, the agent fully observes its state, "collapsing" any uncertainty, but when an arm is passive, no observation is made and uncertainty evolves.
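The "collapsing" feature can be made concrete with a belief-state update for a single arm. This is a minimal sketch under simplifying assumptions: a finite-state Markov chain with one known row-stochastic transition matrix `P` (in general the dynamics may depend on whether the arm is played), and a hypothetical function name `step_belief`:

```python
import numpy as np

def step_belief(belief, P, played, observed_state=None):
    """Advance one arm's belief over its states by one round (sketch).

    belief: (S,) probability of the arm being in each state this round
    P:      (S, S) row-stochastic transition matrix (assumed known, action-independent)
    played: if True, the current state is observed and uncertainty collapses
    """
    if played:
        b = np.zeros_like(belief)
        b[observed_state] = 1.0  # full observation: point mass on the observed state
    else:
        b = belief               # passive: no observation, belief unchanged this round
    return b @ P                 # propagate through the Markov chain to the next round
```

Repeated passive steps drift the belief toward the chain's stationary distribution, while a single play resets it to a point mass; this is the uncertainty that a scheduling policy trades off when deciding which arms to play.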