Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature.
Privately Learning Decision Lists and a Differentially Private Winnow
We give new differentially private algorithms for the classic problems of learning decision lists and large-margin halfspaces in the PAC and online models. In the PAC model, we give a computationally efficient algorithm for learning decision lists with minimal sample overhead over the best non-private algorithms. In the online model, we give a private analog of the influential Winnow algorithm for learning halfspaces with mistake bound polylogarithmic in the dimension and inverse polynomial in the margin. As an application, we describe how to privately learn decision lists in the online model, qualitatively matching state-of-the-art non-private guarantees.
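For orientation, the classic non-private Winnow that the private analog builds on can be sketched as follows. This is the textbook multiplicative-update rule for Boolean features, not the differentially private algorithm of the paper; the function names, the promotion factor `alpha = 2`, and the toy target (a disjunction of the first two of eight features) are illustrative assumptions.

```python
from itertools import product


def winnow_train(examples, n_features, alpha=2.0):
    """Classic (non-private) Winnow for 0/1 features and 0/1 labels.

    Returns the learned weights and the number of online mistakes.
    """
    weights = [1.0] * n_features
    threshold = float(n_features)
    mistakes = 0
    for x, y in examples:
        score = sum(w for w, xi in zip(weights, x) if xi)
        pred = 1 if score >= threshold else 0
        if pred != y:
            mistakes += 1
            # Promote active weights on a false negative, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, mistakes


def winnow_predict(weights, x):
    """Predict 1 when the weighted sum of active features reaches the threshold."""
    threshold = float(len(weights))
    score = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if score >= threshold else 0


# Toy target (an assumption for illustration): x[0] OR x[1] over 8 Boolean features.
n = 8
data = [(x, 1 if (x[0] or x[1]) else 0) for x in product([0, 1], repeat=n)]
weights, mistakes = winnow_train(data * 50, n_features=n)
```

The mistake count stays small relative to the number of features because only the weights of active features change, and they change multiplicatively; this is the behavior the polylogarithmic dependence on dimension refers to.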
BFTS: Thompson Sampling with Bayesian Additive Regression Trees
Deng, Ruizhe, Chakraborty, Bibhas, Chen, Ran, Tan, Yan Shuo
Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
Information Geometry of Absorbing Markov-Chain and Discriminative Random Walks
Discriminative Random Walks (DRWs) are a simple yet powerful tool for semi-supervised node classification, but their theoretical foundations remain fragmentary. We revisit DRWs through the lens of information geometry, treating the family of class-specific hitting-time laws on an absorbing Markov chain as a statistical manifold. Starting from a log-linear edge-weight model, we derive closed-form expressions for the hitting-time probability mass function, its full moment hierarchy, and the observed Fisher information. The Fisher matrix of each seed node turns out to be rank-one; taking the quotient by its null space yields a low-dimensional, globally flat manifold that captures all identifiable directions of the model. Leveraging the geometry, we introduce a sensitivity score for unlabeled nodes that bounds, and in one-dimensional cases attains, the maximal first-order change in DRW betweenness under unit Fisher perturbations. The score can lead to principled strategies for active label acquisition, edge re-weighting, and explanation.
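As a generic illustration of the object being studied, the hitting-time probability mass function of an absorbing Markov chain can be computed numerically from the standard identity $P(T = t) = e_s^\top Q^{t-1} r$, where $Q$ holds transient-to-transient transition probabilities and $r$ the one-step absorption probabilities. The sketch below is not the paper's closed-form derivation or its log-linear edge-weight model; the function name and the two-state toy chain are assumptions.

```python
def hitting_time_pmf(Q, r, start, max_t):
    """Return [P(T = 1), ..., P(T = max_t)] for absorption time T from `start`.

    Q[i][j]: probability of moving from transient state i to transient state j.
    r[i]:    probability of being absorbed in one step from transient state i.
    """
    n = len(Q)
    # dist[i] is the probability of being in transient state i, not yet absorbed.
    dist = [0.0] * n
    dist[start] = 1.0
    pmf = []
    for _ in range(max_t):
        # Probability of being absorbed on this step.
        pmf.append(sum(dist[i] * r[i] for i in range(n)))
        # Advance the surviving probability mass by one transient step.
        dist = [sum(dist[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return pmf


# Toy chain (an assumption): two transient states, each absorbed with probability 0.5.
Q = [[0.0, 0.5], [0.5, 0.0]]
r = [0.5, 0.5]
pmf = hitting_time_pmf(Q, r, start=0, max_t=10)
```

In this toy chain the absorption time is geometric with parameter 0.5, so the returned probabilities halve at each step.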
Supplementary Material: Improving Transferability of Representations via Augmentation-Aware Self-Supervision
A Trade-off between augmentation invariance and awareness
To support this, we compute the cosine similarity between representations from augmented and original samples, i.e., $\mathrm{CS} = \mathbb{E}_{x \sim \mathcal{D},\, t \sim \mathcal{T}}[\mathrm{sim}(g(f(t(x))),\, g(f(x)))]$. For linear evaluation benchmarks, we randomly choose validation samples in the training split for each dataset when the validation split is not officially provided. Note that the pretraining setups are the same as they officially used for ImageNet pretraining described in [2, 5, 30]. When incorporating our AugSelf into the methods, we use $\lambda = 1.0$ and $\mathcal{A}_{\text{AugSelf}} = \{\text{crop}, \text{color}\}$, unless otherwise stated. Other hyperparameters are the same as the ImageNet100 setup described in Section F.1.
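The cosine-similarity statistic CS can be estimated as a sample mean over paired representations. The sketch below assumes the representations of the original and augmented samples have already been computed and are given as plain lists of floats; the function names and the toy vectors are hypothetical and do not come from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def mean_cosine_similarity(original_reprs, augmented_reprs):
    """Monte Carlo estimate of CS over paired (original, augmented) representations."""
    sims = [cosine_similarity(a, o) for a, o in zip(augmented_reprs, original_reprs)]
    return sum(sims) / len(sims)


# Toy representations (assumed): the first pair is parallel, the second orthogonal.
original_reprs = [[1.0, 0.0], [0.0, 2.0]]
augmented_reprs = [[2.0, 0.0], [2.0, 0.0]]
cs = mean_cosine_similarity(original_reprs, augmented_reprs)
```

A value of CS near 1 indicates representations that are largely invariant to the augmentations, while lower values indicate that augmentation information is retained.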