Multistage Conditional Compositional Optimization
Şen, Buse, Hu, Yifan, Kuhn, Daniel
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
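The naïve nested-sampling approach the abstract mentions can be sketched on a two-nest toy objective $\mathbb{E}_X[f(\mathbb{E}[g(Y)\mid X])]$. The toy distributions ($X \sim \mathcal{N}(0,1)$, $Y \mid X \sim \mathcal{N}(X,1)$) and the function name below are illustrative assumptions, not the paper's setting; the point is that the cost is the product `n_outer * n_inner`, which multiplies up with every additional nest:

```python
import random

def nested_estimate(f, g, n_outer=1000, n_inner=100, seed=0):
    """Naive nested Monte Carlo for E_X[ f( E[ g(Y) | X ] ) ].

    Toy model (an assumption for illustration): X ~ N(0, 1) and
    Y | X ~ N(X, 1). The sample cost is n_outer * n_inner, so stacking
    more nests multiplies the cost -- the exponential blow-up the
    abstract refers to.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        x = rng.gauss(0.0, 1.0)
        # inner Monte Carlo estimate of the conditional expectation
        inner = sum(g(rng.gauss(x, 1.0)) for _ in range(n_inner)) / n_inner
        total += f(inner)
    return total / n_outer

# With f(z) = z**2 (a nonlinear outer cost) and g(y) = y, the true
# value is E[(E[Y|X])^2] = E[X^2] = 1, so the estimate should be near 1.
est = nested_estimate(lambda z: z * z, lambda y: y)
```

Multilevel Monte Carlo replaces this single expensive estimator with a telescoping sum of coupled estimators at increasing inner-sample sizes, which is how the polynomial scenario complexity is obtained.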
Orthogonal Learner for Estimating Heterogeneous Long-Term Treatment Effects
Ma, Haorui, Frauen, Dennis, Melnychuk, Valentyn, Feuerriegel, Stefan
Estimation of heterogeneous long-term treatment effects (HLTEs) is widely used for personalized decision-making in marketing, economics, and medicine, where short-term randomized experiments are often combined with long-term observational data. However, HLTE estimation is challenging due to limited overlap in treatment or in observing long-term outcomes for certain subpopulations, which can lead to unstable HLTE estimates with large finite-sample variance. To address this challenge, we introduce the LT-O-Learners (Long-Term Orthogonal Learners), a set of novel orthogonal learners for HLTE estimation. The learners are designed for the canonical HLTE setting that combines a short-term randomized dataset $\mathcal{D}_1$ with a long-term historical dataset $\mathcal{D}_2$. The key idea of our LT-O-Learners is to retarget the learning objective by introducing custom overlap weights that downweight samples with low overlap in treatment or in long-term observation. We show that the retargeted loss is equivalent to the weighted oracle loss and satisfies Neyman-orthogonality, which means our learners are robust to errors in the nuisance estimation. We further provide a general error bound for the LT-O-Learners and give the conditions under which the quasi-oracle rate can be achieved. Finally, our LT-O-Learners are model-agnostic and can thus be instantiated with arbitrary machine learning models. We conduct empirical evaluations on synthetic and semi-synthetic benchmarks to confirm the theoretical properties of our LT-O-Learners, especially their robustness in low-overlap settings. To the best of our knowledge, ours are the first orthogonal learners for HLTE estimation that are robust to the low overlap that is common in long-term outcomes.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
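The retargeting idea can be illustrated with a classical overlap-weight choice from the causal-inference literature, $w(x) = e(x)(1 - e(x))$, extended by an observation-probability factor. All names below are hypothetical and this is not the paper's actual weight construction, only a sketch of why such weights shrink the influence of low-overlap samples:

```python
def overlap_weight(e_x, p_x):
    """Hypothetical overlap weight for illustration only.

    e_x : treatment propensity e(x); the factor e(x)(1 - e(x)) is
          largest at 0.5 and vanishes near 0 or 1, downweighting units
          with poor treatment overlap.
    p_x : probability that the long-term outcome is observed for x;
          units unlikely to be observed long-term are also downweighted.
    """
    return e_x * (1.0 - e_x) * p_x

# A unit with balanced propensity and well-observed outcomes receives
# a much larger weight than a low-overlap unit:
w_good = overlap_weight(0.5, 0.9)   # 0.5 * 0.5 * 0.9 = 0.225
w_bad = overlap_weight(0.02, 0.1)   # 0.02 * 0.98 * 0.1 = 0.00196
```

Because low-overlap units contribute little to the retargeted loss, inverse-propensity terms in the orthogonal loss stay bounded, which is what keeps the finite-sample variance under control.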
40bb79c081828bebdc39d65a82367246-Supplemental-Conference.pdf
Table 1: Linear network

| Layer # | Name | Layer | In shape | Out shape |
|---|---|---|---|---|
| 1 | — | Flatten() | (3, 32, 32) | 3072 |
| 2 | fc1 | nn.Linear(3072, 200) | 3072 | 200 |
| 3 | fc2 | nn.Linear(200, 1) | 200 | 1 |

We consider simple linear networks, fully-connected networks, and convolutional networks in this appendix.

Fully-connected network. We conduct further experiments on several different fully-connected networks with 4 hidden layers and various activation functions. Our subset is smaller because of the computational limitations of calculating the Gram matrix. Experiments show that the properties along the GD trajectory (e.g., ...). Figure 4 illustrates the positive correlation between the sharpness and the A-norm, and the relationship between the loss, D_2(t), and R_2(t) along the trajectory.
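The linear network in Table 1 can be shape-checked with a small NumPy sketch. The weights are random placeholders and torch is deliberately avoided; only the layer shapes are taken from the table:

```python
import numpy as np

# Sketch of the Table 1 "linear network": Flatten -> fc1 -> fc2.
# Weights are random placeholders; only the shapes match the table.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))   # one CIFAR-sized input

flat = x.reshape(-1)                   # Flatten(): (3, 32, 32) -> 3072
W1, b1 = rng.standard_normal((200, 3072)), np.zeros(200)
h = W1 @ flat + b1                     # fc1: nn.Linear(3072, 200)
W2, b2 = rng.standard_normal((1, 200)), np.zeros(1)
out = W2 @ h + b2                      # fc2: nn.Linear(200, 1)

print(flat.shape, h.shape, out.shape)  # (3072,) (200,) (1,)
```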