Reinforcement Learning
131f383b434fdf48079bff1e44e2d9a5-AuthorFeedback.pdf
See Table 1 for the average running time per problem instance. Note that the implementations of Z3 and OR-tools are in C++, while NeuRewriter and the RL baselines are in Python. Still, we can observe that our approach achieves a better balance between time efficiency and result quality. For expression simplification and job scheduling, NeuRewriter is even more time-efficient than Z3 and OR-tools. The region-picker π_ω is parameterized by a Q-function and is similar in spirit to soft-Q learning [2].
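To illustrate what a Q-function-parameterized picker in the spirit of soft-Q learning might look like, the following minimal sketch turns per-region Q-values into a sampling distribution with a softmax. The function names `softmax_policy` and `pick_region`, the `temperature` parameter, and the use of plain Python lists are assumptions made for illustration only; they are not taken from the NeuRewriter implementation.

```python
import math
import random


def softmax_policy(q_values, temperature=1.0):
    """Convert Q-values into probabilities proportional to exp(Q / temperature).

    Hypothetical helper: the exact form used by the authors is not given
    in the text above.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_region(q_values, temperature=1.0, rng=random):
    """Sample a region index according to the softmax over its Q-values."""
    probs = softmax_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Under this sketch, regions with higher Q-values are sampled more often, while lower-valued regions keep a nonzero probability, which is what distinguishes a soft policy from a greedy argmax.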
Robust Deep Reinforcement Learning through Adversarial Loss
Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy Worst-Case Reward (GWC) to measure the attack-agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks.