–Neural Information Processing Systems
The best-known work on reward shaping is the potential-based reward shaping (PBRS) method [12], which was the first to show that policy invariance is guaranteed when the shaping reward function takes the form of a difference of potential values.
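To make the "difference of potential values" concrete, here is a minimal Python sketch. It assumes the standard discounted form F(s, s') = gamma * Phi(s') - Phi(s). The discount factor `gamma`, the potential `phi`, and both function names are illustrative assumptions and do not come from the excerpt above. The sketch also computes the discounted sum of shaping rewards along a trajectory, which telescopes to gamma^T * Phi(s_T) - Phi(s_0). That value depends only on the start and end states, which is the intuition behind policy invariance.

```python
from typing import Callable, Hashable, Sequence

State = Hashable


def shaping_reward(phi: Callable[[State], float],
                   s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping term: gamma * phi(s_next) - phi(s).

    `phi` is a hypothetical user-supplied potential function over states.
    """
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_return(phi: Callable[[State], float],
                              states: Sequence[State],
                              gamma: float) -> float:
    """Discounted sum of shaping rewards along a trajectory of states.

    The sum telescopes to gamma**T * phi(states[-1]) - phi(states[0]),
    where T = len(states) - 1.
    """
    total = 0.0
    for t in range(len(states) - 1):
        total += gamma ** t * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


if __name__ == "__main__":
    # Illustrative potential: negative distance to a goal located at state 10.
    phi = lambda s: -abs(10 - s)
    trajectory = [0, 1, 2, 3]
    gamma = 0.9
    print(discounted_shaping_return(phi, trajectory, gamma))
    print(gamma ** 3 * phi(trajectory[-1]) - phi(trajectory[0]))
```

The two printed values should agree up to floating-point error, whichever intermediate states the trajectory visits.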
Feb-9-2026, 23:58:17 GMT
- Country:
  - North America > Canada
  - Asia > China
    - Zhejiang Province > Hangzhou (0.04)
    - Tianjin Province > Tianjin (0.04)
- Technology: