Appendices
–Neural Information Processing Systems
In Equation 4, maximization is carried out over the input $y$ to the inverse map and over the input $z$, which is captured in $\hat{p}$ in the above optimization problem; i.e., maximization over $z$ in Equation 4 is equivalent to choosing $\hat{p}$ subject to the choice of a singleton (Dirac-delta) $\hat{p}$. Since Equation 4 describes a constrained optimization problem, we solve it in practice via dual gradient descent. Gradient descent is used to optimize the Lagrangian of Equation 4, shown in Equation 5, with the constraint $p(z) \geq \epsilon_2$ modified to be $\log p(z) \geq \epsilon_2$, since $\log p(z)$ is easier to use numerically for stochastic optimization.

At each iteration, it samples a function from this distribution and queries the point $x^\star_t$ that greedily minimizes this function.

Information Ratio. Russo and Van Roy [30] related the expected regret of TS to its expected information gain, i.e., the expected reduction in the entropy of the posterior distribution of $X^\star$.
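The dual gradient descent procedure mentioned above can be illustrated on a toy problem. The sketch below is not the setup of Equations 4 and 5: it assumes a one-dimensional variable $z$, a hypothetical objective $f(z) = z$, a standard normal $p(z)$, and a log-density constraint $\log p(z) \geq \epsilon$. The function names, step sizes, and iteration counts are illustrative assumptions. The inner loop takes gradient ascent steps on the Lagrangian in $z$, and the outer loop takes a projected gradient descent step on the multiplier.

```python
import math

# Log-normalizer of a standard normal density (assumed toy prior p(z)).
LOG_NORM = 0.5 * math.log(2.0 * math.pi)


def log_p(z):
    """Log-density of the assumed standard normal prior."""
    return -0.5 * z * z - LOG_NORM


def dual_gradient_descent(eps, outer_steps=200, inner_steps=50,
                          lr_z=0.5, lr_lam=0.1):
    """Maximize the toy objective f(z) = z subject to log_p(z) >= eps.

    Lagrangian: L(z, lam) = z + lam * (log_p(z) - eps), with lam >= 0.
    """
    z, lam = 0.0, 1.0
    for _ in range(outer_steps):
        # Inner loop: gradient ascent on the Lagrangian with lam held fixed.
        for _ in range(inner_steps):
            grad_z = 1.0 - lam * z  # d/dz [z + lam * log_p(z)]
            z += lr_z * grad_z
        # Outer step: gradient descent on the dual variable. The gradient of
        # the dual function w.r.t. lam is the constraint slack at the current z.
        slack = log_p(z) - eps
        lam = max(0.0, lam - lr_lam * slack)  # project onto lam >= 0
    return z, lam


# Choose eps so that the feasible set is |z| <= 2.
eps = log_p(2.0)
z_star, lam_star = dual_gradient_descent(eps)
print(f"z = {z_star:.4f}, lambda = {lam_star:.4f}")
```

For this toy problem the constrained maximizer is $z = 2$ with multiplier $\lambda = 1/2$ (from stationarity, $1 - \lambda z = 0$), so the iterates should approach those values. In a stochastic setting the exact gradients here would be replaced by minibatch estimates.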
Feb-8-2026, 02:06:39 GMT