Goto

Collaborating Authors

 induction





Appendix to Weakly Coupled Deep Q-Networks A Proofs

Neural Information Processing Systems

We prove part the first part of the proposition (weak duality) by induction. It is well-known that, by the value iteration algorithm's convergence, Q Consider a state s S and a feasible action a A (s). We use an induction proof. B (w), which follows by the convergence of value iteration.A.2 Proof of Theorem 1 Proof. Now we state the following lemma.



A Corrections to the main paper 2 2 B Problem setup 3

Neural Information Processing Systems

In the course of preparing the supplementary materials we identified the following two mistakes. For the convenience of the reader we provide the full, corrected table below. C is an appropriatly chosen constant. Frei et al. (2022) Xu & Gu (2023) Theorem 3.1 Theorem 3.6 Theorem 3.8n C log null 1 δ null log null m δ null 1 δ 1 log null m δ null m C 1 log null n δ null log null n δ null log null n δ null log null n δ null γ 1 C 1 n 1 n 1 n 1 nd 1 k γ C 1 nd null log( The same mistake also means that the sentence starting on line 188 "Comparing In order to provide a convenient reference for the reader, we summarize our notation as follows. As such we typically resort to using a generically large enough constant C . For the reader's convenience we recap the data model studied in this work. We assume test data are drawn mutually i.i.d. In regard to the initialization of the network weights, for convenience we assume each neuron's To this end, we introduce the following notation, where p { 1, 1}. P(( B < κT) (T > 0) | w, v > 0) 1 P( T = 0 | w, v > 0) P( B κT | w, v > 0), therefore it suffices to upper bound the two probabilities on the right-hand-side. Using a variant of Hoeffding's bound for sampling without replacement (see Proposition Based on Lemma B.2, the following lemma bounds the probability that " on the counting functions: in particular we write P (i, l) + P (i, l) = P ( i, i) = 1 /2 and hence we conclude p + q = 1 / 2. As a result Observe by the data model, described in Section B.2, that We will often make use of the following similar but more pessimistic bounds on the activations.