
Appendices A Some Useful Lemmas

Neural Information Processing Systems

In this paper, there are some equivalent forms of the generalization error we will study, e.g., Eq. (2) This lemma is a consequence of Lemma 2.1, further exploiting some symmetry properties. Recall Eq. (1) in Lemma 2.1, E Note that Eq. (2) in the main text is from the second equation above, which is used to derive individual Notice that we do not change the definitions of any of the random variables, e.g., This, as we have already seen in Eq. (5) in the main text, is used to derive hypotheses-conditioned CMI bounds in Section 4. It's easy to see that when To obtain Eq. (14), we let W This is used to derive supersample-conditioned CMI bounds in Section 4. It's easy to see that both Like all the previous information-theoretic bounds, the following lemma is widely used in our paper. We also invoke some other lemmas as given below. It's easy to verify that We note that the reason we introduce four types of SCH stability in Definition 2.1 is that solely using The basic setup is as follows. By Lemma A.3, we have E Recalling Eq. (12) in Lemma A.1 and applying Jensen's inequality to the absolute function, the first The proof is nearly the same as the proof of Theorem 3.1, except that now the randomness of the algorithm is given for each DV auxiliary function, so the randomness of Similar to the proof of Theorem 3.1, we let We now prove the first bound. Lemma A.2, we have E By Lemma A.3, we have E Recalling Eq. (14) in Lemma A.1 and applying Jensen's inequality for the absolute function, the first bound is To prove the second bound, we return to Eq. (20) and take expectation over For the second part of Theorem 4.1, notice that it's valid to let The proof is similar to [18, Theorem 2.1].
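The DV auxiliary functions mentioned above refer to the Donsker–Varadhan variational representation KL(P‖Q) = sup_f E_P[f] − log E_Q[e^f], which underlies such information-theoretic generalization bounds. As an illustrative sketch (my own, not from the paper), for P = N(μ, 1) and Q = N(0, 1) the optimal auxiliary function is the log-density ratio f(x) = μx − μ²/2, at which the DV lower bound attains the true KL divergence μ²/2:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.0, 200_000

# P = N(mu, 1), Q = N(0, 1); closed-form KL(P || Q) = mu^2 / 2
kl_true = mu**2 / 2

# Optimal DV auxiliary function: f(x) = log dP/dQ(x) = mu*x - mu^2/2
f = lambda x: mu * x - mu**2 / 2

xp = rng.normal(mu, 1.0, n)   # samples from P
xq = rng.normal(0.0, 1.0, n)  # samples from Q

# Donsker-Varadhan lower bound E_P[f] - log E_Q[e^f]; tight at the optimal f
dv = f(xp).mean() - np.log(np.exp(f(xq)).mean())
print(dv)  # close to kl_true = 0.5
```

Any other choice of f yields a strictly smaller value, which is why these bounds take a supremum over auxiliary functions.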


Spatio-Temporal Trajectory Foundation Model - Recent Advances and Future Directions

Yang, Sean Bin, Sun, Ying, Cheng, Yunyao, Lin, Yan, Torp, Kristian, Hu, Jilin

arXiv.org Artificial Intelligence

Foundation models (FMs) have emerged as a powerful paradigm, enabling a diverse range of data analytics and knowledge discovery tasks across scientific fields. Inspired by the success of FMs, particularly large language models, researchers have recently begun to explore spatio-temporal foundation models (STFMs) to improve adaptability and generalization across a wide spectrum of spatio-temporal (ST) tasks. Despite rapid progress, a systematic investigation of trajectory foundation models (TFMs), a crucial subclass of STFMs, is largely lacking. This tutorial addresses this gap by offering a comprehensive overview of recent advances in TFMs, including a taxonomy of existing methodologies and a critical analysis of their strengths and limitations. In addition, the tutorial highlights open challenges and outlines promising research directions to advance spatio-temporal general intelligence through the development of robust, responsible, and transferable TFMs.




A Additional statements and proofs

Neural Information Processing Systems

Lemma 2. If X is a σ-subgaussian random variable with zero mean, then E e To prove the second part of the lemma, we use the Donsker–Varadhan inequality again, but for a different function. To prove the last part of the lemma, we just use Markov's inequality. Furthermore, each of these summands has zero mean. The proof closely follows that of Proposition 1. Bounding the expected generalization gap of f. A natural question arises whether a different type of noise would give better bounds.
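The first part of the lemma is the standard subgaussian moment-generating-function bound E[e^{λX}] ≤ e^{λ²σ²/2}. As a quick deterministic sanity check (my own sketch, not from the paper), X uniform on [−1, 1] is 1-subgaussian by Hoeffding's lemma, and its MGF has the closed form sinh(λ)/λ, so the bound can be verified on a grid of λ values:

```python
import math

sigma = 1.0  # X ~ Uniform[-1, 1] is 1-subgaussian (Hoeffding: sigma = (b - a)/2)

def mgf_uniform(lam):
    # E[e^{lam * X}] for X ~ Uniform[-1, 1], with the lam -> 0 limit handled
    return math.sinh(lam) / lam if lam != 0 else 1.0

for lam in [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    bound = math.exp(lam**2 * sigma**2 / 2)
    assert mgf_uniform(lam) <= bound  # subgaussian MGF bound holds
print("MGF bound verified on the grid")
```

Combining this MGF bound with Markov's inequality (the Chernoff argument) gives the familiar tail bound P(X ≥ t) ≤ e^{−t²/(2σ²)}, which is the route the last part of the lemma alludes to.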



A Generalised Jensen Inequality

Neural Information Processing Systems

In Section 4, we require a version of Jensen's inequality generalised to (possibly) infinite-dimensional spaces. We will actually apply generalised Jensen's inequality with conditional expectations, so we need the Suppose E is a sub-σ-algebra of F. Now take the supremum of the right-hand side over Q. Then (5) tells us that E[f(V) | E] ≥ f(E[V | E]), as required. Given Pettis integrability, all the conditions of Theorem A.2 are satisfied. The assumptions used in this paper, with notation translated to our context, are: 1.
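As a finite-dimensional sanity check (my own sketch; the theorem above concerns the infinite-dimensional case), conditional expectation with respect to the sub-σ-algebra generated by a finite partition is just a block-wise average, and the conditional Jensen inequality E[f(V) | E] ≥ f(E[V | E]) for convex f can be verified pointwise on each block:

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.normal(size=12)  # a random variable on a 12-point uniform probability space

# Partition generating the sub-sigma-algebra E: three blocks of four atoms each
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

f = np.square  # a convex function

for idx in blocks:
    cond_mean = V[idx].mean()           # E[V | E] on this block
    cond_mean_f = f(V[idx]).mean()      # E[f(V) | E] on this block
    assert cond_mean_f >= f(cond_mean)  # conditional Jensen: E[f(V)|E] >= f(E[V|E])
print("conditional Jensen verified on each block")
```

The infinite-dimensional version replaces the block-wise average with a Pettis-integral conditional expectation, which is exactly why the integrability hypotheses of Theorem A.2 are needed.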



Supplementary Material for "Online Convex Optimization Over Erdős–Rényi Random Networks"

Neural Information Processing Systems

Let Assumptions 1 and 2 hold. By substituting (A.5) into (A.3) and using Assumption 2, we derive Hence by (A.7), we obtain (A.2). By Jensen's inequality for conditional expectations, we obtain that E Suppose Assumptions 1 and 2 hold. Combining (2) with (A.14) and (A.8), there holds x Note by Jensen's inequality that Then from (A.13) it follows that E Then the theorem is proved. This combined with (A.19) proves the theorem.
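The network model in the title can be sketched as follows. This is a generic construction (Metropolis weights), offered as an assumption-laden illustration rather than the paper's exact weight rule: sample a symmetric Erdős–Rényi adjacency matrix with independent edge probability p, then build a symmetric, doubly stochastic mixing matrix of the kind decentralized optimization analyses typically require:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 0.5

# Symmetric Erdős-Rényi adjacency: each undirected edge present independently w.p. p
upper = rng.random((n, n)) < p
A = np.triu(upper, k=1)
A = (A | A.T).astype(float)

deg = A.sum(axis=1)  # node degrees

# Metropolis weights: a standard doubly stochastic mixing matrix on such graphs
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if A[i, j]:
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()  # self-weight absorbs the remaining mass

# Doubly stochastic and symmetric, as required by typical consensus analyses
assert np.allclose(W.sum(axis=1), 1.0) and np.allclose(W, W.T)
```

Because each realized graph is random, the mixing matrix W is itself random from round to round, which is the source of the extra expectation terms handled by the inequalities above.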