sufficiency
It Just Takes Two: Scaling Amortized Inference to Large Sets
Wehenkel, Antoine, Kagan, Michael, Heinrich, Lukas, Pollard, Chris
Neural posterior estimation has emerged as a powerful tool for amortized inference, with growing adoption across scientific and applied domains. In many of these applications, the conditioning variable is a set of observations whose elements depend not only on the target but also on unknown factors shared across the set. Optimal inference therefore requires treating the set jointly, which in turn requires training the estimator at the deployment set size -- a regime where memory and compute quickly become prohibitive. We introduce a simple, theoretically grounded strategy that decouples representation learning from posterior modeling. Our method trains a mean-pool Deep Set on sets of size at most two, producing an encoder that generalizes to arbitrary set sizes. The inference head is then finetuned on pre-aggregated embeddings, making training cost essentially independent of the deployment set size N. Across scalar, image, multi-view 3D, molecular, and high-dimensional conditional generation benchmarks with N in the thousands, our approach matches or outperforms standard baselines at a fraction of the compute.
A Sufficient-Statistic Reduction of the Information Bottleneck to a Low-Dimensional Problem
We show that if the conditional distribution p(C | T) factors through a sufficient statistic ฯ(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for (ฯ(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter \b{eta}, and the optimal representations up to pullback through ฯ. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than the ambient dimension of the source. This identifies an exact structural condition under which the generic IB problem becomes tractable, and gives a formal bridge between the discrete and linear-Gaussian regimes. We then show that the classical Gaussian IB solution of Chechik, Globerson, Tishby and Weiss is an immediate corollary of this reduction, and we state a nonlinear-Gaussian generalisation. A small numerical example illustrates the practical consequence: when a low-dimensional sufficient statistic is available, the exact IB curve can be computed on the reduced problem at a cost determined by the statistic rather than by the ambient source dimension.
Identification and Estimation of Joint Probabilitiesof Potential Outcomes in Observational Studies with Covariate Information
However, because they are not identifiable without any assumptions, various assumptions have been utilized to evaluate the joint probabilities of potential outcomes, e.g., the assumption of monotonicity (Pearl, 2009; Tian and Pearl, 2000), the independence between potential outcomes (Robins and Richardson, 2011), the condition of gain equality (Li and Pearl, 2019), and the specific functional relationshipsbetween cause and effect (Pearl, 2009). Unlike existing identification conditions, in order to evaluate the joint probabilities of potential outcomeswithoutsuch assumptions,this paper proposestwo types of novel identification conditions using covariate information. In addition, when the joint probabilities of potential outcomes are identifiable through the proposed conditions, the estimation problem of the joint probabilities of potential outcomes reduces to that of singular models and thus they can not be evaluated by standard statistical estimation methods. To solve the problem,this paper proposes a new statisticalestimationmethod based on the augmented Lagrangianmethod and shows the asymptoticnormality of the proposed estimators. Given space constraints, the proofs, the details on the statistical estimationmethod, some numerical experiments, and the case study are provided in the supplementary material.