Appendix
Neural Information Processing Systems
We have explained the robustness interpretation of the dual regularization as a perturbation of the Bellman differences. In this section, we elaborate on the robustness interpretation of the primal regularization. For simplicity, we also consider $f_1(\cdot) = (\cdot)^2$. Therefore, with $n$ samples $\{(s_i, a_i)\}_{i=1}^{n}$ from $d^{\mathcal{D}}$, we have

$$\alpha_Q\, \mathbb{E}_{(s,a)\sim d^{\mathcal{D}}}\left[f_1(Q(s,a))\right] = \frac{\alpha_Q}{n} \sum_{i=1}^{n} Q(s_i, a_i)^2.$$

For different regularizations, the perturbations lie in different dual spaces. Similarly, we can derive the unconstrained dual form by eliminating the primal variable under a particular primal regularization $\alpha_Q\, \mathbb{E}_{d^{\mathcal{D}}}[f_1(Q)]$.

We use a $10 \times 10$ grid environment in which an agent can move left, right, up, or down. The observations are the $(x, y)$ coordinates of the agent's location.
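The grid environment and the empirical primal regularization term can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The start cell (0, 0), the action encoding (0 to 3 for left/right/up/down, with "up" increasing y), clipping at the walls, the uniformly random behavior policy, and the names `GridWorld`, `collect`, and `primal_regularizer` are all hypothetical. Rewards are omitted because the text does not specify them. The regularizer uses $f_1(x) = x^2$ and replaces the expectation under $d^{\mathcal{D}}$ with an average over the $n$ collected samples.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[int, int]
Transition = Tuple[State, int, State]

# Hypothetical action encoding: the text only says left/right/up/down.
ACTIONS: Dict[int, Tuple[int, int]] = {
    0: (-1, 0),  # left
    1: (1, 0),   # right
    2: (0, 1),   # up (assumed to increase y)
    3: (0, -1),  # down
}


class GridWorld:
    """Minimal size x size grid whose observation is the agent's (x, y) cell.

    The start cell and the wall behavior (a move off the grid leaves the
    agent in place) are assumptions, not details from the text.
    """

    def __init__(self, size: int = 10, start: State = (0, 0)) -> None:
        self.size = size
        self.start = start
        self.pos = start

    def reset(self) -> State:
        self.pos = self.start
        return self.pos

    def step(self, action: int) -> State:
        dx, dy = ACTIONS[action]
        x, y = self.pos
        # Clip so the agent stays inside [0, size - 1] in both coordinates.
        nx = min(max(x + dx, 0), self.size - 1)
        ny = min(max(y + dy, 0), self.size - 1)
        self.pos = (nx, ny)
        return self.pos


def collect(env: GridWorld, n: int, seed: int = 0) -> List[Transition]:
    """Roll out n (s, a, s') transitions under a hypothetical uniform behavior policy."""
    rng = random.Random(seed)
    data: List[Transition] = []
    s = env.reset()
    for _ in range(n):
        a = rng.randrange(len(ACTIONS))
        s_next = env.step(a)
        data.append((s, a, s_next))
        s = s_next
    return data


def primal_regularizer(
    q: Dict[Tuple[State, int], float], data: List[Transition], alpha_q: float
) -> float:
    """Empirical alpha_Q * E_{d^D}[f1(Q(s, a))] with f1(x) = x**2."""
    n = len(data)
    total = sum(q[(s, a)] ** 2 for s, a, _ in data)
    return alpha_q * total / n


if __name__ == "__main__":
    env = GridWorld()
    print(env.step(0))  # moving left from (0, 0) is clipped to the wall
    print(env.step(1))  # moving right reaches (1, 0)
    data = collect(GridWorld(), n=100)
    # Hypothetical Q-table with every entry equal to 1, so the term equals alpha_q.
    q = {((x, y), a): 1.0 for x in range(10) for y in range(10) for a in ACTIONS}
    print(primal_regularizer(q, data, alpha_q=0.5))
```

With a constant table $Q \equiv 1$ the regularizer reduces to $\alpha_Q$, which is a quick sanity check on the averaging over $n$ samples.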
Feb-8-2026, 07:36:47 GMT