Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Neural Information Processing Systems

Appendix A Formal Definition of Inhomogeneous Poisson Process

To prove Theorem 2, we need several auxiliary lemmas.

Lemma 2. Given estimated parameters $\hat{\theta}$ and $\hat{F}$, for any bidding policy $\pi$, we have $R(\pi; \hat{\theta}, \hat{F})$ … $R(\pi; \theta, \hat{F})$ … $\mathbb{E}$ …

Recall that, by the definition of $R(\pi; \hat{\theta}, \hat{F})$ in Eq. (6), for any $\hat{\theta}$ and $\hat{F}$, $R(\pi; \hat{\theta}, \hat{F}) = \mathbb{E}$ …

Lemma 3. For any fixed bidding strategy $\pi$, we have … $R(\pi; \theta, \hat{F})$ …

Given the MDP re-formulation in Subsection 3.1, we have, for any …, $b$ and $h = 2, 3, \dots, H$, …

Given the above two auxiliary lemmas, we are ready to prove Theorem 2.

Proof of Theorem 2. For notational simplicity, let OPT( … CR … Then we bound the above terms separately in the following.

In this section, we enumerate several useful technical lemmas used in this paper. Finally, we describe the well-known simulation lemma. Then we complete the proof.
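
The exact statement of the simulation lemma used in the paper is not reproduced in this text. As a hedged illustration, the sketch below numerically checks the standard finite-horizon form on a small random tabular MDP: for a fixed policy, the value gap between an estimated model and the true model equals the expected sum, along trajectories of the true model, of the per-step reward and transition errors weighted by the estimated model's value function. Everything here is an assumption made for illustration: the generic tabular MDP (not the paper's bidding MDP), the deterministic policy, and the helper names `policy_values`, `simulation_gap`, and `random_mdp`.

```python
import random


def policy_values(P, r, pi):
    """Backward induction for a fixed deterministic policy.

    P[h][s][a][t] is a transition probability, r[h][s][a] a reward,
    pi[h][s] the action taken. Returns V with V[H][s] = 0.
    """
    H, S = len(P), len(P[0])
    V = [[0.0] * S for _ in range(H + 1)]
    for h in range(H - 1, -1, -1):
        for s in range(S):
            a = pi[h][s]
            V[h][s] = r[h][s][a] + sum(P[h][s][a][t] * V[h + 1][t] for t in range(S))
    return V


def simulation_gap(P, r, P_hat, r_hat, pi, mu0):
    """Expected per-step model error, accumulated under the TRUE model."""
    H, S = len(P), len(P[0])
    V_hat = policy_values(P_hat, r_hat, pi)
    d = list(mu0)  # state distribution at step h under the true model
    total = 0.0
    for h in range(H):
        d_next = [0.0] * S
        for s in range(S):
            a = pi[h][s]
            err = r_hat[h][s][a] - r[h][s][a]
            err += sum((P_hat[h][s][a][t] - P[h][s][a][t]) * V_hat[h + 1][t]
                       for t in range(S))
            total += d[s] * err
            for t in range(S):
                d_next[t] += d[s] * P[h][s][a][t]
        d = d_next
    return total


def random_mdp(rng, H, S, A):
    """Hypothetical helper: random transitions (rows sum to 1) and rewards in [0, 1)."""
    P, r = [], []
    for _ in range(H):
        P_h, r_h = [], []
        for _ in range(S):
            rows = []
            for _ in range(A):
                w = [rng.random() + 1e-3 for _ in range(S)]
                z = sum(w)
                rows.append([x / z for x in w])
            P_h.append(rows)
            r_h.append([rng.random() for _ in range(A)])
        P.append(P_h)
        r.append(r_h)
    return P, r


rng = random.Random(0)
H, S, A = 4, 3, 2
P, r = random_mdp(rng, H, S, A)          # "true" model
P_hat, r_hat = random_mdp(rng, H, S, A)  # "estimated" model
pi = [[rng.randrange(A) for _ in range(S)] for _ in range(H)]
mu0 = [1.0 / S] * S

V = policy_values(P, r, pi)
V_hat = policy_values(P_hat, r_hat, pi)
lhs = sum(mu0[s] * (V_hat[0][s] - V[0][s]) for s in range(S))
rhs = simulation_gap(P, r, P_hat, r_hat, pi, mu0)
print(abs(lhs - rhs) < 1e-9)
```

The two sides agree up to floating-point error because the identity is exact. Bounds of the kind used in regret proofs typically follow by bounding each per-step error term separately.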
