A Additional numerical experiments
–Neural Information Processing Systems
In this section, we present additional numerical experiments. To add randomness to the environment, we let the state transitions be stochastic. The optimal policy encourages the agent to take the special jump and reach the terminal state. Under the target policy, the agent reaches the terminal state as quickly as possible but avoids taking the special jump. We assume that the agent is unaware of both the attacker's presence and its manipulations.
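To make the setup concrete, the following is a minimal sketch of an environment of this kind, not the configuration used in the experiments. The chain layout, the constants `N_STATES`, `JUMP_STATE`, and `SLIP_PROB`, and the function names are all hypothetical, chosen only to illustrate random transitions, a special jump to the terminal state, and a target policy that avoids that jump.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_STATES = 6        # chain of states 0..5; the last state is terminal
JUMP_STATE = 1      # the only state where the special jump has an effect
SLIP_PROB = 0.1     # probability that a transition is replaced by a random one
STEP, JUMP = 0, 1   # action labels


def step(state, action, rng):
    """Return (next_state, done) for one stochastic transition."""
    terminal = N_STATES - 1
    if rng.random() < SLIP_PROB:
        # Random transition: land on a uniformly chosen state.
        next_state = rng.randrange(N_STATES)
    elif action == JUMP and state == JUMP_STATE:
        # Special jump: go straight to the terminal state.
        next_state = terminal
    else:
        # Ordinary move: advance one state along the chain.
        next_state = min(state + 1, terminal)
    return next_state, next_state == terminal


def optimal_policy(state):
    """Take the special jump where it is available, otherwise step forward."""
    return JUMP if state == JUMP_STATE else STEP


def target_policy(state):
    """Head for the terminal state but never select the special jump."""
    return STEP


def rollout(policy, seed=0, max_steps=1000):
    """Run one episode; report final state, whether JUMP was ever selected, and steps."""
    rng = random.Random(seed)
    state, used_jump, steps, done = 0, False, 0, False
    while not done and steps < max_steps:
        action = policy(state)
        used_jump = used_jump or action == JUMP
        state, done = step(state, action, rng)
        steps += 1
    return state, used_jump, steps
```

Calling `rollout(optimal_policy)` and `rollout(target_policy)` shows the behavioural difference the attacker would try to induce: both policies head for the terminal state, but only the first ever selects the jump. The agent code has no reference to an attacker, which mirrors the assumption that the agent is unaware of any manipulation.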
Nov-14-2025, 08:04:05 GMT