A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms

Neural Information Processing Systems

However, its application to Q-learning has been limited due to the presence of the max-operator, which makes the associated ODE model a complex nonlinear system. In contrast, the associated ODE of TD learning for policy evaluation is a linear system, whose asymptotic stability is much easier to analyze in general.
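The contrast drawn in the excerpt above can be checked numerically with a minimal sketch. It assumes a hypothetical 2-state, 2-action MDP (the arrays `P`, `R`, `PI` and the discount `GAMMA` are made up for illustration) and a simplified mean-field right-hand side without state-action weighting, so it is not the paper's exact ODE model. It only shows that the policy-evaluation vector field is affine in Q while the Q-learning field, because of the max, is not.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP (illustration only): P[s][a][s2] and R[s][a].
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]
PI = [[0.5, 0.5], [0.5, 0.5]]  # fixed policy PI[s][a] for policy evaluation


def td_field(Q):
    """Policy-evaluation field: R + gamma * E_pi[Q(s', a')] - Q (affine in Q)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for s in range(2):
        for a in range(2):
            nxt = sum(
                P[s][a][s2] * sum(PI[s2][a2] * Q[s2][a2] for a2 in range(2))
                for s2 in range(2)
            )
            out[s][a] = R[s][a] + GAMMA * nxt - Q[s][a]
    return out


def q_field(Q):
    """Q-learning field: R + gamma * E[max_a' Q(s', a')] - Q (nonlinear in Q)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for s in range(2):
        for a in range(2):
            nxt = sum(P[s][a][s2] * max(Q[s2]) for s2 in range(2))
            out[s][a] = R[s][a] + GAMMA * nxt - Q[s][a]
    return out


def mix(A, B, w):
    """Convex combination w*A + (1-w)*B of two 2x2 tables."""
    return [[w * A[s][a] + (1 - w) * B[s][a] for a in range(2)] for s in range(2)]


def affine_gap(field, Q1, Q2, w=0.5):
    """Largest entry of |field(mix(Q1,Q2)) - mix(field(Q1), field(Q2))|.

    The gap is zero (up to rounding) exactly when the field is affine
    along the segment between Q1 and Q2.
    """
    lhs = field(mix(Q1, Q2, w))
    rhs = mix(field(Q1), field(Q2), w)
    return max(abs(lhs[s][a] - rhs[s][a]) for s in range(2) for a in range(2))


Q1 = [[1.0, 0.0], [0.0, 1.0]]
Q2 = [[0.0, 1.0], [1.0, 0.0]]
td_gap = affine_gap(td_field, Q1, Q2)
q_gap = affine_gap(q_field, Q1, Q2)
print("TD gap:", td_gap, "Q-learning gap:", q_gap)
```

The two test tables are chosen so that the greedy action differs between them in every state, which is where the max breaks linearity; for tables that share the same greedy actions the Q-learning gap would also vanish.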






On Inductive Biases for Heterogeneous Treatment Effect Estimation Appendix

Neural Information Processing Systems

Here, we present a detailed overview of existing model-agnostic "meta-learner" strategies for CATE ... Unfortunately, good performance on estimation of the POs is not sufficient. ... Note that, as we discuss in section C.2, we fixed all hyperparameters throughout all experiments as tuning ... Input: Testing data X, Trained FlexTENet flex; for i 1: flex.n_layers ... We retrieve the data from https://jenniferhill7.wixsite.com/acic-2016/competition ... "D" we change only the response surface of the treated to ... As stated in the main text, we fixed equivalent hyperparameters across all methods within any experiments to not conflate hyperparameter tuning with the value of the different strategies. ... B (D.3), present additional results on PO estimation (D.4), and then move to analyzing the learned ... We also consider the effect of using our approaches as first-stage (nuisance) estimators for two-step learners (D.6).