Example where multi-step outperforms one-step

Apr-25-2026, 04:38:49 GMT–Neural Information Processing Systems

As explained in the main text, this section presents an example that only slightly modifies the one in Figure 4, but in which a multi-step approach clearly outperforms a single step. The data-generating and learning processes are exactly the same: 100 trajectories of length 100, discount 0.9, and α = 0.1 for reverse KL regularization. The only difference is the behavior policy. Rather than a mixture of the optimal policy and the uniform policy, we use a mixture of a maximally suboptimal policy and the uniform policy. If we call the suboptimal policy π (which always goes down and left in our gridworld), then the behavior for the modified example is β = 0.2π + 0.8u, where u is the uniform policy. Results are shown in Figure 7.

Figure 7: A gridworld example with modified behavior where multi-step is much better than one-step.
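The modified behavior policy and the two learning procedures can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not from the source: a 5×5 grid with row 0 at the top, the goal in the top-right corner, reward 1 on entering the goal, an absorbing goal, and a suboptimal policy π that splits its probability evenly between "down" and "left". The improvement step uses the standard closed-form maximizer of E_π[Q] − α KL(π ‖ β), namely π(a|s) ∝ β(a|s) exp(Q(s,a)/α). To isolate the update rules, the sketch evaluates each policy exactly on the known MDP instead of estimating Q from the 100 sampled trajectories, so it does not reproduce the estimation error that drives the one-step versus multi-step tradeoff. The function names (`build_mdp`, `mixture_behavior`, `evaluate`, `reverse_kl_improve`, `run`) are hypothetical.

```python
import numpy as np

# Assumed layout (not from the source): SIZE x SIZE grid, row 0 at the top,
# goal in the top-right corner, reward 1 on entering the goal, goal absorbing.
SIZE = 5
N_STATES = SIZE * SIZE
UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3
N_ACTIONS = 4
GOAL = SIZE - 1  # state index of (row 0, col SIZE - 1)
MOVES = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}


def build_mdp():
    """Deterministic transitions P[s, a, s'] and rewards R[s, a]."""
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    R = np.zeros((N_STATES, N_ACTIONS))
    for s in range(N_STATES):
        r, c = divmod(s, SIZE)
        for a, (dr, dc) in MOVES.items():
            if s == GOAL:  # absorbing goal, no further reward
                P[s, a, s] = 1.0
                continue
            # Moves into a wall leave the agent in place.
            nr = min(max(r + dr, 0), SIZE - 1)
            nc = min(max(c + dc, 0), SIZE - 1)
            s_next = nr * SIZE + nc
            P[s, a, s_next] = 1.0
            R[s, a] = 1.0 if s_next == GOAL else 0.0
    return P, R


def mixture_behavior(mix=0.2):
    """beta = mix * pi_sub + (1 - mix) * u, with pi_sub split over down/left (assumed)."""
    pi_sub = np.zeros((N_STATES, N_ACTIONS))
    pi_sub[:, [DOWN, LEFT]] = 0.5
    uniform = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)
    return mix * pi_sub + (1.0 - mix) * uniform


def evaluate(pi, P, R, gamma=0.9):
    """Exact evaluation: solve V = r_pi + gamma * P_pi V, then Q = R + gamma * P V."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(N_STATES) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    return Q, V


def reverse_kl_improve(beta, q, alpha=0.1):
    """Maximize E_pi[Q] - alpha * KL(pi || beta): pi proportional to beta * exp(Q / alpha)."""
    logits = np.log(beta) + q / alpha
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)


def run(n_steps, alpha=0.1, gamma=0.9):
    """n_steps=1 is one-step; n_steps>1 re-evaluates the latest policy each round
    while always regularizing toward beta. Returns the final policy and its values."""
    P, R = build_mdp()
    beta = mixture_behavior()
    pi = beta
    for _ in range(n_steps):
        q, _ = evaluate(pi, P, R, gamma)
        pi = reverse_kl_improve(beta, q, alpha)
    _, v = evaluate(pi, P, R, gamma)
    return pi, v
```

For example, `run(1)[1]` and `run(5)[1]` give the state values of the one-step and a five-step policy, which can be compared at any start state (such as the bottom-left corner, index `N_STATES - SIZE`). With this behavior, β puts probability 0.3 on each of "down" and "left" and 0.2 on each of "up" and "right", so Q^β is evaluated under a policy that drifts away from the assumed goal. The multi-step variant can then correct this by re-evaluating its own improved policy.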
