Statistical Learning
Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data Supplemental Materials Anonymous Author(s) Affiliation Address email
A.1 Proof of Proposition 12 Proposition 1 The historical contrastive instance discrimination (HCID) can be modelled as a3 maximum likelihood problem optimized via Expectation Maximization.4 Maximum likelihood (ML) is a concept to describe the theoretic insights of clustering algorithms.6 PN n=1 Z(kn) = 1), and the last step of derivation13 employs Jensen's inequality [6, 7, 4]. Z(kn) log p(xq,kn; ฮธE) (5) Expectation step focuses on estimating the posterior probability p(kn; xq,ฮธE). We first gener-17 ate keys by a historical encoder: kt mn = Et m(xt), and xt Xtgt. Then, We calculate18 p(kn; xq,ฮธE) = p(kt mn; xq,ฮธE) = 1 (xq,kt mn), where 1 (xq,kt mn) = 1 if both belong to the19 positive pair; otherwise, 1 (xq,kt mn) = 0.20 Please note the notation "t m" shows that the k is encoded by a historical encoder.21
Optimal Transport for Treatment Effect Estimation
Estimating conditional average treatment effect from observational data is highly challenging due to the existence of treatment selection bias. Prevalent methods mitigate this issue by aligning distributions of different treatment groups in the latent space. However, there are two critical problems that these methods fail to address: (1) mini-batch sampling effects (MSE), which causes misalignment in non-ideal mini-batches with outcome imbalance and outliers; (2) unobserved confounder effects (UCE), which results in inaccurate discrepancy calculation due to the neglect of unobserved confounders. To tackle these problems, we propose a principled approach named Entire Space CounterFactual Regression (ESCFR), which is a new take on optimal transport in the context of causality. Specifically, based on the framework of stochastic optimal transport, we propose a relaxed masspreserving regularizer to address the MSE issue and design a proximal factual outcome regularizer to handle the UCE issue. Extensive experiments demonstrate that our proposed ESCFR can successfully tackle the treatment selection bias and achieve significantly better performance than state-of-the-art methods.
Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent
Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.