Energy
A Discussion on Hyper parameter Tuning
Contextual bandit is a class of online learning problems that can be viewed as a simple reinforcement learning problem without transition. For a completely understanding of contextual bandit problems, we refer the readers to the Chapter 4 of [Bubeck et al., 2012]. Here we include the main idea for completeness. In contextual bandit problems, the agent needs to find out the best action given some observed context (a.k.a the optimal policy in reinforcement learning). Formally, we define S as the context set and K as the number of action.
Gauging Variational Inference
Sung-Soo Ahn, Michael Chertkov, Jinwoo Shin
Both provide lower bounds for the partition function by utilizing the so-called gauge transformation which modifies factors of GM while keeping the partition function invariant. Moreover, we prove that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case.