Appendices

A Some Useful Lemmas
In this paper, there are several equivalent forms of the generalization error that we study, e.g., Eq. (2).

This lemma is a consequence of Lemma 2.1, further utilizing some symmetry properties. Recall Eq. (1) in Lemma 2.1. Note that Eq. (2) in the main text follows from the second equation above and is used to derive individual bounds. Notice that we do not change the definitions of any of the random variables. This, as we have already seen in Eq. (5) in the main text, is used to derive hypothesis-conditioned CMI bounds in Section 4. To obtain Eq. (14), we let W be as defined above; this is used to derive supersample-conditioned CMI bounds in Section 4.

Like all the previous information-theoretic bounds, the following lemma is widely used in our paper. We also invoke some other lemmas, given below. We note that the reason we introduce four types of SCH stability in Definition 2.1 is that solely using one of them is not sufficient.

The basic setup is as follows. By Lemma A.3, we bound the expectation term. Recalling Eq. (12) in Lemma A.1 and applying Jensen's inequality to the absolute-value function, the first bound follows.

The proof is nearly the same as that of Theorem 3.1, except that now the randomness of the algorithm is given for each DV auxiliary function. Similar to the proof of Theorem 3.1, we first prove the first bound: by Lemma A.2 and Lemma A.3, we bound the corresponding expectations; recalling Eq. (14) in Lemma A.1 and applying Jensen's inequality to the absolute-value function, the first bound follows. To prove the second bound, we return to Eq. (20) and take the expectation over the remaining randomness. For the second part of Theorem 4.1, notice that an analogous choice remains valid. The proof is similar to [18, Theorem 2.1].
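The Jensen step invoked repeatedly above can be made explicit. As a generic illustration (the random variable $Z$ here is a placeholder, not the paper's exact quantity), the convexity of the absolute-value function gives:

```latex
% Jensen's inequality for the convex map x -> |x|:
% for any integrable random variable Z,
\bigl| \mathbb{E}[Z] \bigr| \;\le\; \mathbb{E}\bigl[\,|Z|\,\bigr].
% Applied with Z equal to a (signed) generalization-gap term,
% this converts a bound on E|Z| into a bound on |E[Z]|,
% which is how the first bound in each proof is extracted.
```

This is the standard device for turning an expected-absolute-deviation bound into a bound on the expected generalization gap itself.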
Before defining our algorithm, we first lighten notation: at each iteration $t$ we write the shorthand $b_a(X) = b(\hat{p}^{(t-1)}(X), a)$ (at different iterations $t$, $b_a$ denotes different functions), and we write $b(X)$ for the vector $(b_1(X), \ldots, b_K(X))$. For the intuition behind the algorithm, consider the $t$-th iteration, where the current prediction function is $\hat{p}^{(t-1)}$.

The statement of the theorem is identical; the proof is also essentially the same, except for the use of some new technical tools.

Conversely, if $\hat{p}$ is LB decision calibrated, then $\bigl\| \mathbb{E}[\,p^*(X) - \hat{p}(X) \mid U\,] \bigr\|_1 = 0$ almost surely (because if the expectation of a non-negative random variable is zero, the random variable must be zero almost surely), which implies that $\hat{p}$ is distribution calibrated.

For $B_{K,a}$ we use the VC dimension approach.
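The parenthetical "zero expectation of a non-negative variable implies zero almost surely" can be spelled out via Markov's inequality. A minimal sketch, with $Z$ standing for the non-negative norm in question:

```latex
% Let Z >= 0 with E[Z] = 0. For any eps > 0, Markov's inequality gives
\Pr[\,Z \ge \varepsilon\,] \;\le\; \frac{\mathbb{E}[Z]}{\varepsilon} \;=\; 0.
% Taking the union over eps = 1/n for n = 1, 2, ... yields
\Pr[\,Z > 0\,] \;=\; 0,
% i.e., Z = 0 almost surely. Here one takes
% Z = || E[ p^*(X) - \hat{p}(X) | U ] ||_1.
```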
A Baseline algorithms
The following theorem is a more general version of Theorem 5.1. Assume that Assumptions 1 to 3 hold. Note that the only difference between Theorem B.1 and Theorem 5.1 lies in the staleness of the responses; that is, in the "oldest" response used to update the iterate.

By Jensen's inequality and $L$-smoothness, we obtain a bound involving $\nabla f$. In order for the paper to be self-contained, we restate the proof here. The following lemma is slightly modified from Lemma 8 in [18]. By Lemma B.1, we obtain the corresponding bound. Combining Appendix B.3.1 and Appendix B.3.2, we obtain the desired estimate.

B.4 Deriving the convergence bound

In this subsection, we obtain Theorem B.1 based on the descent lemma.
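The descent lemma referred to here is the standard consequence of $L$-smoothness; as a generic sketch (for an arbitrary $L$-smooth $f$, not the paper's specific objective):

```latex
% If f is L-smooth, i.e. ||∇f(x) - ∇f(y)|| <= L ||x - y|| for all x, y,
% then for all x, y:
f(y) \;\le\; f(x) \,+\, \langle \nabla f(x),\, y - x \rangle \,+\, \frac{L}{2}\,\|y - x\|^2 .
% In particular, for a gradient step y = x - \eta \nabla f(x) this yields
f(y) \;\le\; f(x) \,-\, \eta \Bigl(1 - \frac{\eta L}{2}\Bigr) \|\nabla f(x)\|^2 ,
% which is the per-iteration decrease used to telescope into a
% convergence bound of the kind stated in Theorem B.1.
```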