The Value-Equivalence Principle for Model-Based Reinforcement Learning: Supplementary Material

Neural Information Processing Systems 

Moreover, we include an additional result which illustrates a situation in which approximate VE models can outperform the MLE model.

For each $(i,j)$ pair, the above expression is suggestive of a dot product between two $nm$-dimensional vectors: a combination of $a_i$ and $c_j$, and a "flattened" version of $B$. Define the former combination of vectors as $d_{ij} = [a_{i1}c_{j1}, a_{i1}c_{j2}, \ldots, a_{in}c_{jm}]^\top \in \mathbb{R}^{nm \times 1}$, and stack them as rows: $D = [d_{11}, d_{12}, \ldots, d_{nm}]^\top \in \mathbb{R}^{k\ell \times nm}$. To flatten $B$, simply define $b = [B_{11}, B_{12}, \ldots, B_{nm}]^\top$. Finally, notice that the construction of $d_{ij}$ can be thought of as vertically stacking $n$ copies of $c_j$, each scaled by a different entry of $a_i$. This means that scaled copies of both $a_i$ and $c_j$ can be found by selecting specific groups of indices in $d_{ij}$. It follows that if $a_1, \ldots, a_n$ are linearly independent, then so are $d_{1j}, \ldots, d_{nj}$ for any $j$.
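The flattening argument can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the expression being flattened is the bilinear form $a_i^\top B c_j$, builds $d_{ij}$ as the Kronecker product of $a_i$ and $c_j$, and uses a row-major flattening of $B$ for $b$. The matrices `A` and `B` and the vector `c` are hypothetical example data, and `c` is assumed to be nonzero, which the independence claim needs.

```python
import numpy as np

n, m = 3, 4

# Hypothetical example data (not from the paper).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])      # rows a_1..a_n; det = 7, so they are linearly independent
c = np.array([1.0, -2.0, 0.5, 3.0])  # a single nonzero c_j
B = np.arange(1.0, n * m + 1).reshape(n, m)

# b = [B_11, B_12, ..., B_nm]: row-major flattening of B.
b = B.reshape(-1)

# d_ij = [a_i1 c_j1, a_i1 c_j2, ..., a_in c_jm]:
# n copies of c_j, each scaled by one entry of a_i (a Kronecker product).
D_j = np.stack([np.kron(A[i], c) for i in range(n)])  # shape (n, n * m)

# Assumed identity: the bilinear form a_i^T B c_j equals the dot product d_ij^T b.
max_gap = max(abs(A[i] @ B @ c - D_j[i] @ b) for i in range(n))

# Linear independence of a_1..a_n carries over to d_1j..d_nj.
rank_A = np.linalg.matrix_rank(A)
rank_D_j = np.linalg.matrix_rank(D_j)
```

Here `max_gap` measures how far the two sides of the assumed identity differ over all rows of `A`, and comparing `rank_D_j` with `rank_A` checks that stacking the $d_{ij}$ for a fixed $j$ preserves the rank of the $a_i$.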
