
Neural Information Processing Systems 

Assumption 2.1 and Assumption 2.2 hold, and the function class is policy complete. Suppose we have learned policies $\pi_{h+1}, \ldots, \pi_H$; we use $\widetilde{\pi}_h$ to denote the optimal policy of $Q$. Thus Definition 3.7 gives $\pi_H(s) = \widetilde{\pi}_H(s)$. Notice that ReLU, squared ReLU, leaky ReLU, and polynomial activation functions all satisfy the above assumption. We make the following assumption on the dimension of the feature vectors, which governs how well the features can extract information about the neural network from noisy samples. Define the outer product $\otimes$ as follows.
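For reference, the activation functions named above and the vector outer product admit standard explicit forms; the definitions below are the usual ones assumed from context, not reproduced from the paper's own statement.

```latex
% Standard definitions (assumed; the paper's exact statements are not shown in this fragment).
\mathrm{ReLU}(z) = \max(z, 0), \qquad
\mathrm{ReLU}^2(z) = \max(z, 0)^2, \qquad
\mathrm{LeakyReLU}_{\alpha}(z) = \max(z, \alpha z), \quad 0 < \alpha < 1.

% Outer product of vectors x \in \mathbb{R}^m and y \in \mathbb{R}^n:
(x \otimes y)_{ij} = x_i \, y_j, \qquad
x \otimes y = x y^{\top} \in \mathbb{R}^{m \times n}.
```

All four activations are piecewise polynomials of bounded degree, which is typically the property such smoothness/growth assumptions require.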
