
To this end, we performed additional manipulations of animal task⁶ parameters (inter-trial delays; reward depletion rates; switch costs) and tested non-deep models (port/action-value⁷ V/Q-learning). Such adjustment³⁶ does not necessitate costly updates of high-dimensional state values.
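
To illustrate what a non-deep action-value learner of this kind might look like, the following is a minimal sketch of a stateless Q-learning agent choosing between two ports. It is not the model used in this work: the two-port layout, the depletion and replenishment rule, the epsilon-greedy policy, and all names and parameter values (`q_update`, `simulate`, `alpha`, `epsilon`, `depletion`, `switch_cost`) are assumptions chosen for illustration. The point it shows is that each trial changes a single scalar value, so no high-dimensional state representation has to be updated.

```python
import random


def q_update(q, action, reward, alpha=0.1):
    """Delta-rule update of one action value; returns a new list."""
    q = list(q)
    q[action] += alpha * (reward - q[action])
    return q


def simulate(n_trials=200, alpha=0.1, epsilon=0.1,
             depletion=0.8, switch_cost=0.05, seed=0):
    """Run a hypothetical two-port task with depleting rewards.

    All task rules here are illustrative assumptions, not the
    task described in the text.
    """
    rng = random.Random(seed)
    reward_prob = [1.0, 1.0]   # assumed initial reward probability per port
    q = [0.0, 0.0]             # one scalar value per port (action)
    prev = None
    choices = []
    for _ in range(n_trials):
        # Epsilon-greedy choice between the two ports.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[0] >= q[1] else 1
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # Assumed switch cost: subtracted when the agent changes port.
        if prev is not None and action != prev:
            reward -= switch_cost
        # Only the chosen port's value is updated.
        q = q_update(q, action, reward, alpha)
        # Assumed dynamics: the visited port depletes, the other recovers.
        other = 1 - action
        reward_prob[action] *= depletion
        reward_prob[other] = min(1.0, reward_prob[other] / depletion)
        prev = action
        choices.append(action)
    return q, choices
```

Calling `simulate(depletion=0.5)` or `simulate(switch_cost=0.2)` changes the assumed task parameters without touching the learner, which is one way such manipulations could be explored in a sketch like this.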