Supplemental Material of "Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information"

Neural Information Processing Systems 

A.1 Proof of Theorem 1

From Conditions 1 and 2 in Theorem 1, by the consistency property, we have p(x S, Q = M S, (A.11) where the notation " " stands for a transposed vector/matrix. Since P is invertible from Condition 3 in Theorem 1, from equation (A.11), R is given as the solution of the simultaneous linear equation From equation (A.12), since we have QP P = S from equation (A.11), and the first column of S is given as (p(u

A.2 Proof of Theorem 2

From Conditions 4 and 5 of Theorem 2, by the consistency property, we have p(y The equation (B.5) is the condition in which the first column of (Θ Once we obtain the estimator R as the solution of the optimization problem (B.6), the estimator of u = (p(u Similarly, we can estimate the causal risk difference as the difference between the second and third components of û.

S, (B.17) thus, we have S. Because it means that S is a solution of the following minimization problem 1 The equations (B.19) and (B.20) are the conditions in which the first row of P

As can be seen immediately, for any zero Θ of the estimating equation (B.28), any row-permuted matrix ΠΘ is also a solution of the same estimating equation, where Π is a permutation matrix. Therefore, we find the row-permuted matrix ΠΘ that achieves the smallest loss and adopt it as the estimator of S. Once we obtain the estimator Θ as the solution of the optimization problem (B.21), the estimator of u = (p(u

B.3 Asymptotic normality

Following Yuan and Jennrich [5], we show the asymptotic normality of the estimators from Algorithm 2.

In this section, we investigate further properties of our proposed estimators through numerical experiments in addition to those in Section 5. Letting X, Y, Z, W, and U be discrete variables, we consider the causal diagrams shown in Figure 1, where the joint probabilities of (X, Y, Z, W, U) are given according to Table C.1.
As seen from Table C.2, the sample means of p(u In addition, outliers would occur when it is difficult to judge from the observed data whether Condition 6 holds.
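
The row-permutation step described in the proof of Theorem 2 (among all row-permuted matrices ΠΘ, adopt the one with the smallest loss) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `best_row_permutation`, the loss used here (squared Frobenius distance to a hypothetical reference matrix), and all numerical values are assumptions made for the example. The loss actually used in the paper is defined by its estimating equations, which are not reproduced here.

```python
from itertools import permutations

import numpy as np


def best_row_permutation(theta, loss):
    """Search all row orderings of `theta` and keep the one with the smallest loss.

    `loss` is any callable mapping a matrix to a float (hypothetical here).
    Returns (permuted matrix, permutation as a tuple of row indices, loss value).
    """
    k = theta.shape[0]
    best = None
    for perm in permutations(range(k)):
        candidate = theta[list(perm), :]  # equivalent to applying a permutation matrix on the left
        value = loss(candidate)
        if best is None or value < best[2]:
            best = (candidate, perm, value)
    return best


# Hypothetical reference matrix used only to define an illustrative loss.
reference = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.2, 0.2, 0.6]])

# A row-shuffled, slightly perturbed copy plays the role of the estimate.
theta_hat = reference[[2, 0, 1], :] + 0.01


def frobenius_loss(m):
    # Assumed loss for illustration: squared Frobenius distance to `reference`.
    return float(np.sum((m - reference) ** 2))


matrix, perm, value = best_row_permutation(theta_hat, frobenius_loss)
print(perm, value)
```

The search is exhaustive, so its cost grows factorially with the number of rows; it is practical only when the number of levels of the latent variable is small.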