Reviews: Optimization over Continuous and Multi-dimensional Decisions with Observational Data

Neural Information Processing Systems 

The paper proposes an algorithm that learns good decision-making policies over a continuous set of decisions using only observational data. The problem is well-motivated, but the paper can be substantially strengthened.

Quality: Ok. The paper clearly motivates a weakness of direct regression (e.g., predicting expected cost from context and decision). The regression model may have different uncertainty for different decisions, so it is useful to include empirical estimates of the variance and bias of the regression when selecting decisions. The paper would be more informative if it highlighted several choices of regression model, each with a different V (variance) and B (bias), and showed how lambda_1 and lambda_2 are tuned to pick low-cost decisions with high probability.
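
To make the role of lambda_1 and lambda_2 concrete, here is a minimal sketch of a penalized selection rule over a finite set of candidate decisions. The additive form of the penalty (square root of V, with B entering linearly), the function name `pick_decision`, and all of the numbers are illustrative assumptions and are not taken from the paper.

```python
import math

def pick_decision(mu_hat, var_hat, bias_hat, lam1, lam2):
    """Return the index of the candidate decision with the lowest penalized cost.

    mu_hat, var_hat, bias_hat are equal-length lists over candidate decisions:
    predicted cost, variance estimate V, and bias estimate B (all hypothetical
    outputs of some regression model).
    """
    # Assumed objective: predicted cost + lam1 * sqrt(V) + lam2 * B.
    scores = [
        m + lam1 * math.sqrt(v) + lam2 * b
        for m, v, b in zip(mu_hat, var_hat, bias_hat)
    ]
    return min(range(len(scores)), key=scores.__getitem__)

# Toy values: the third decision has the lowest predicted cost,
# but also the least reliable estimate.
mu = [1.0, 0.7, 0.5]
V = [0.01, 0.04, 1.0]
B = [0.0, 0.1, 0.5]

unpenalized = pick_decision(mu, V, B, lam1=0.0, lam2=0.0)
penalized = pick_decision(mu, V, B, lam1=1.0, lam2=1.0)
print(unpenalized, penalized)
```

With both weights at zero the rule reduces to direct regression and picks the decision with the lowest predicted cost. With positive weights it moves to a decision whose estimate is better supported by the data. Reporting this kind of shift across several regression models is the comparison suggested above.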