Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Motivated by the practical problem of designing a security deployment strategy to protect targets from an adversary, the author(s) model and study the problem as a Stackelberg game. The main result is that the defender can efficiently learn the adversary's payoffs by carefully deploying resources and observing the adversary's attacks. Clearly, this setting may not be viable when the cost the defender incurs from a successful attack is large (such as a terrorist attack), but it is perhaps a reasonable strategy in other cases, such as drug smuggling. Specifically, the paper gives a probably approximately optimal algorithm that finds an optimal defender strategy by learning from a number of attacks that is polynomial in the number of targets and the encoding length of the problem.
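To make the setting in this review concrete, here is a minimal sketch of the observation the defender learns from: the attacker's best response to a coverage vector. It is not the paper's algorithm. The function name, the payoff parameters, and the linear expected-utility model are assumptions for illustration only.

```python
from typing import Sequence


def attacker_best_response(
    coverage: Sequence[float],
    reward_uncovered: Sequence[float],
    penalty_covered: Sequence[float],
) -> int:
    """Return the index of the target a utility-maximizing attacker hits.

    Assumed model (hypothetical, for illustration): target i is protected
    with probability coverage[i]; the attacker gets reward_uncovered[i] if
    the target is unprotected and penalty_covered[i] if it is protected.
    """
    utilities = [
        (1.0 - c) * r + c * p
        for c, r, p in zip(coverage, reward_uncovered, penalty_covered)
    ]
    # Ties are broken in favor of the lowest index.
    return max(range(len(utilities)), key=utilities.__getitem__)


# The defender does not know the payoffs; it only sees which target is hit.
rewards = [10.0, 4.0, 2.0]
penalties = [-5.0, -1.0, -2.0]
observed = attacker_best_response([0.5, 0.25, 0.0], rewards, penalties)
```

Each deployment reveals one such observation, so varying the coverage and recording the attacked target is what lets a defender narrow down the unknown payoffs.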




Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper provides an FPTAS for stochastic network design in bidirected trees in time O(n^8/ε^6). The authors achieve this via Dynamic Programming. Since this algorithm is pretty slow, they provide a more efficient algorithm and give empirical results on that. I'm not sure about the relevance of the Stochastic Network Design Problem at NIPS, but given that it generalizes the Influence Maximization problem, there should be interest.



Appendix A Approximation Error Analysis In this section, we provide a complete proof of Theorem 1, quantifying the effect of function embedding of constraints in dual

Neural Information Processing Systems

The proof is an adaptation from the standard LP for state-value functions to the case of Q-LP (De Farias and Van Roy, 2003). The effect of full-rank basis embedding in the example in Section 3.1 can be justified straightforwardly. The algorithm can be generalized to undiscounted MDPs with γ = 1 and finite-horizon MDPs. A similar argument to that of Section 3.3 for discounted MDPs can be applied to ... MDPs are strictly more general than multi-armed and contextual bandits. Karampatziakis et al. (2019) considers ... The estimator in Karampatziakis et al. (2019) is derived from empirical likelihood with reverse ... Computationally, the estimator in Karampatziakis et al. (2019) requires an extra statistics, i.e., (max ... Unfortunately, the reverse KL-divergence does not satisfy the conditions in Assumption 1.


CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems

One of the major barriers hindering the application of reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).
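As a rough illustration of what OPE asks for, the sketch below estimates a new policy's value from data logged under a different policy. It uses ordinary importance sampling in a one-step (bandit) setting, which is a simpler, swapped-in technique and not the CoinDICE method. The function name and data layout are assumptions.

```python
from typing import Dict, List, Tuple


def importance_sampling_estimate(
    logged: List[Tuple[int, float, float]],
    target_policy: Dict[int, float],
) -> float:
    """Estimate the target policy's expected reward from logged data.

    Each logged record is (action, behavior_prob, reward), where
    behavior_prob is the probability the logging policy gave to the action
    (assumed known and positive). target_policy maps action -> probability.
    """
    total = 0.0
    for action, behavior_prob, reward in logged:
        # Reweight each reward by how much more (or less) likely the
        # target policy is to take the logged action.
        total += target_policy[action] / behavior_prob * reward
    return total / len(logged)


# Logged under a uniform policy over two actions; action 0 always pays 1.
logged_data = [(0, 0.5, 1.0), (1, 0.5, 0.0), (0, 0.5, 1.0), (1, 0.5, 0.0)]
always_zero = {0: 1.0, 1: 0.0}
estimate = importance_sampling_estimate(logged_data, always_zero)
```

A point estimate like this says nothing about its own uncertainty, which is the gap that confidence interval estimation for OPE is meant to address.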