214cfbe603b7f9f9bc005d5f53f7a1d3-Paper.pdf

Neural Information Processing Systems

In this paper, we investigate the question: Given a small number of datapoints, for example N = 30, how tight can PAC-Bayes and test set bounds be made? For such small datasets, test set bounds adversely affect generalisation performance by withholding data from the training procedure. In this setting, PAC-Bayes bounds are especially attractive, due to their ability to use all the data to simultaneously learn a posterior and bound its generalisation risk. We focus on the case of i.i.d.
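To make the trade-off concrete, the sketch below compares two standard test set bounds (Hoeffding and the Chernoff/kl form) with the PAC-Bayes-kl bound in Maurer's form. The excerpt does not say which bounds the authors evaluate, so the choice of bounds, the function names, and the example values (10 held-out points, a posterior-to-prior KL of 1 nat) are illustrative assumptions.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q: float, c: float) -> float:
    """Largest p in [q, 1] with kl(q || p) <= c, found by bisection."""
    lo, hi = q, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi


def hoeffding_test_bound(emp_risk: float, n_test: int, delta: float = 0.05) -> float:
    """Hoeffding test set bound; holds with probability >= 1 - delta."""
    return min(1.0, emp_risk + math.sqrt(math.log(1.0 / delta) / (2.0 * n_test)))


def chernoff_test_bound(emp_risk: float, n_test: int, delta: float = 0.05) -> float:
    """kl-form (Chernoff) test set bound: kl(emp || risk) <= ln(1/delta) / n."""
    return kl_inverse_upper(emp_risk, math.log(1.0 / delta) / n_test)


def pac_bayes_kl_bound(emp_risk: float, kl_qp: float, n: int, delta: float = 0.05) -> float:
    """PAC-Bayes-kl bound (Maurer's form) on the posterior's expected risk.

    kl_qp is KL(posterior || prior); the prior must not depend on the n points.
    """
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, rhs)


if __name__ == "__main__":
    # Hypothetical comparison: hold out 10 of N = 30 points, or use all 30.
    print("Hoeffding, 10 test points:", hoeffding_test_bound(0.0, 10))
    print("Chernoff,  10 test points:", chernoff_test_bound(0.0, 10))
    print("PAC-Bayes-kl, all 30, KL = 1 nat:", pac_bayes_kl_bound(0.0, 1.0, 30))
```

The test set bounds tighten only as more points are withheld from training, whereas the PAC-Bayes bound uses every point but pays for the KL term between posterior and prior.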


Markov locality and relating it to p locality

Neural Information Processing Systems

To gain intuition for how p-locality functions, we introduce another notion of locality, called Markov locality, which uses the language of Markov blankets. We will prove that, under relatively relaxed conditions, p-locality and Markov locality are equivalent. This will allow us to relate the notion of locality to various graph structures commonly used to represent probability distributions, and will be a key step in proving Properties 2.1 and 2.2. We start by defining the Markov boundary, M(X,S), of a random variable X contained in a set of random variables S, as a minimal set such that p(X|S) = p(X|M(X,S)). The Markov boundary is thus a minimal set of variables such that, once these variables are conditioned on, conditioning on any additional random variables in S does not change the probability of X [39]. Similarly, we define a Markov blanket, M(X,S), for X in S as any set of variables such that conditioning on M(X,S) makes X conditionally independent of all other variables [39]. In this way, every Markov boundary is a Markov blanket, but not all blankets are boundaries. Markov locality: Given a probability distribution p(Z) and a function $f: \mathbb{R}^{N_X+N_\Theta} \to \mathbb{R}^{N_\Theta}$, the update function f(Z) is Markov-local with respect to the distribution p over Z if and only if k: Z Ω s.t. ... A Markov boundary can be thought of as the set of variables that 'locally' communicate with the parameter $\Theta_k$, thus providing a natural measure of locality. Importantly, for Markov locality to be of use, we would like the Markov boundaries of random variables in the model of interest to be unique.
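As a small illustration of the blanket idea, the sketch below computes the usual graphical Markov blanket of a node in a directed acyclic graph: its parents, its children, and its children's other parents. This assumes the distribution factorises over the DAG; the set is then a Markov blanket, and it coincides with the Markov boundary when the distribution is also faithful to the graph. The excerpt does not specify a graph representation, so the parent-list dictionary and the toy graph are hypothetical.

```python
from typing import Dict, Iterable, Set


def markov_blanket(parents: Dict[str, Iterable[str]], node: str) -> Set[str]:
    """Parents, children, and co-parents of `node` in a DAG.

    `parents` maps every node to the nodes with an edge pointing into it.
    """
    blanket = set(parents.get(node, ()))  # start with the node's parents
    for child, child_parents in parents.items():
        child_parents = set(child_parents)
        if node in child_parents:
            blanket.add(child)        # a child of `node`
            blanket |= child_parents  # the child's other parents (co-parents)
    blanket.discard(node)             # the node is never in its own blanket
    return blanket


# Hypothetical toy graph: A -> C <- B, C -> D <- E
toy_dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C", "E"], "E": []}

if __name__ == "__main__":
    for name in sorted(toy_dag):
        print(name, sorted(markov_blanket(toy_dag, name)))
```

In the toy graph, the blanket of C contains E even though E is neither a parent nor a child of C: conditioning on the shared child D couples C and E.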





Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data (Supplemental Materials)

Neural Information Processing Systems

A.1 Proof of Proposition 1

Proposition 1: The historical contrastive instance discrimination (HCID) can be modelled as a maximum likelihood problem optimized via Expectation Maximization.

Maximum likelihood (ML) is a concept to describe the theoretic insights of clustering algorithms. ... $\sum_{n=1}^{N} Z(k_n) = 1$), and the last step of the derivation employs Jensen's inequality [6, 7, 4]. ... $Z(k_n) \log p(x_q, k_n; \theta_E)$ (5)

The expectation step focuses on estimating the posterior probability $p(k_n; x_q, \theta_E)$. We first generate keys by a historical encoder: $k_n^{t-m} = E^{t-m}(x_t)$, with $x_t \in X_{tgt}$. Then, we calculate $p(k_n; x_q, \theta_E) = p(k_n^{t-m}; x_q, \theta_E) = \mathbb{1}(x_q, k_n^{t-m})$, where $\mathbb{1}(x_q, k_n^{t-m}) = 1$ if both belong to the positive pair; otherwise, $\mathbb{1}(x_q, k_n^{t-m}) = 0$. Please note that the notation "$t-m$" shows that the key $k$ is encoded by a historical encoder.
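For reference, in the standard EM derivation a Jensen's-inequality step of the kind mentioned above takes the following form. This is a sketch that assumes the key index $n$ plays the role of the latent variable and that $Z$ is any distribution over the $N$ keys; the excerpt omits the intermediate equations, so the authors' exact derivation may differ.

```latex
\begin{align*}
\log p(x_q;\theta_E)
  &= \log \sum_{n=1}^{N} p(x_q, k_n;\theta_E) \\
  &= \log \sum_{n=1}^{N} Z(k_n)\,\frac{p(x_q, k_n;\theta_E)}{Z(k_n)} \\
  % Jensen's inequality for the concave log, valid because \sum_n Z(k_n) = 1
  &\ge \sum_{n=1}^{N} Z(k_n)\,\log \frac{p(x_q, k_n;\theta_E)}{Z(k_n)} \\
  &= \sum_{n=1}^{N} Z(k_n)\,\log p(x_q, k_n;\theta_E)
     \;-\; \sum_{n=1}^{N} Z(k_n)\,\log Z(k_n).
\end{align*}
```

The second sum does not depend on $\theta_E$, so maximising the bound over $\theta_E$ only involves the terms $Z(k_n) \log p(x_q, k_n; \theta_E)$. Terms with $Z(k_n) = 0$ are taken to contribute zero, which covers the indicator-valued posterior.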