Agents
Appendix 571 In this appendix, we provide more details about the four experiments and some scenario examples
Autoregressive sampling is used to create a traffic snapshot. We train a scenario generation model TrafficGen with mixed data. The detailed hyperparameters are shown in Table 4. Figure 7: Dynamics of the generated traffic scenarios. The first column is the original case. The middle columns show the generated scenarios at different timesteps.
Safety through feedback in Constrained RL
This feedback can be system generated or elicited from a human observing the training process. Previous approaches have not been able to scale to complex environments and are constrained to receiving feedback at the state level which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus, reducing the burden on the evaluator.