Context-dependent upper-confidence bounds for directed exploration

Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White

Neural Information Processing Systems 

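Second, we … $\delta_t = r_{t+1} + \gamma_{t+1}\mathbf{x}_{t+1}^\top \mathbf{w} - \mathbf{x}_t^\top \mathbf{w}$, the TD error for $\mathbf{w}$ (see (2)). This $\delta_t$ is … to be larger … $\delta_t$.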
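The TD error for $\mathbf{w}$ above can be computed directly from its definition. The sketch below is a minimal illustration, not the authors' code. It assumes a linear value estimate $v(s) = \mathbf{x}(s)^\top \mathbf{w}$ with plain Python lists for the features and weights. The function name `td_error` and the numeric values are hypothetical. The discount is passed per step as `gamma_next` to mirror the $\gamma_{t+1}$ indexing. Reading this as a transition-dependent discount that can be set to 0 on a terminal transition is an assumption made here for illustration.

```python
from typing import Sequence


def td_error(r_next: float, gamma_next: float,
             x_next: Sequence[float], x: Sequence[float],
             w: Sequence[float]) -> float:
    """Hypothetical helper: delta_t = r_{t+1} + gamma_{t+1} * x_{t+1}^T w - x_t^T w,
    assuming a linear value estimate v(s) = x(s)^T w."""
    if not (len(x_next) == len(x) == len(w)):
        raise ValueError("feature vectors and weights must have the same length")
    v_next = sum(xi * wi for xi, wi in zip(x_next, w))  # x_{t+1}^T w
    v = sum(xi * wi for xi, wi in zip(x, w))            # x_t^T w
    return r_next + gamma_next * v_next - v


# Illustrative two-dimensional example (made-up values).
w = [0.5, -1.0]
x_t = [1.0, 0.0]    # features of the current state
x_t1 = [0.0, 1.0]   # features of the next state
delta = td_error(r_next=1.0, gamma_next=0.9, x_next=x_t1, x=x_t, w=w)
print(delta)
```

In this example $\mathbf{x}_{t+1}^\top \mathbf{w} = -1$ and $\mathbf{x}_t^\top \mathbf{w} = 0.5$, so $\delta_t = 1 + 0.9 \cdot (-1) - 0.5 = -0.4$. With `gamma_next=0.0` the bootstrap term drops out and $\delta_t$ reduces to $r_{t+1} - \mathbf{x}_t^\top \mathbf{w}$.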
