Appendix

Neural Information Processing Systems 

In this section we motivate the design choices and inductive biases that we encode into our neural encoder network $e$, the network used to model the relative accuracies of the weak supervision sources $\lambda$. Recall that we model the probability of a particular sample $x \in \mathcal{X}$ having the class label $y \in \mathcal{Y} = \{1, \dots, C\}$ as

$$P_\theta(y \mid \lambda) = \mathrm{softmax}(s)_y \, P(y), \tag{4}$$
$$s = \theta(\lambda, x)^T \lambda \in \mathbb{R}^C.$$

Connection to prior PGM models. We now motivate this choice by deriving a less expressive variant of it from the standard Markov Random Field (MRF) used in the related work. Suppose we treat the attention scores $\theta(\lambda, x) \in \mathbb{R}^m$, which assign sample-dependent accuracies to each labeling function, as sample-independent parameters $\theta_1$, thereby dropping the features from the equation, as is done in the related work [30, 32, 19, 11]. We can then rewrite Eq. 4 as

$$P_\theta(y \mid \lambda) = \frac{\exp\left(\theta_1^T \mathbb{1}\{\lambda = y\}\right) P(y)}{\sum_{y' \in \mathcal{Y}} \exp\left(\theta_1^T \mathbb{1}\{\lambda = y'\}\right)},$$

where $\mathbb{1}\{\lambda = y\} \in \{0, 1\}^m$ indicates which labeling functions voted for class $y$.

We can recognize $P_\theta$ as a distribution from the exponential family, and more specifically as a pairwise MRF, or factor graph, with canonical parameters $\theta = (\theta_1, \theta_2)$ and corresponding sufficient statistics, or factors, $\phi(\lambda, y) = (\phi_1(\lambda, y), \phi_2(\lambda))$, as well as the log partition function $Z_\theta$. The accuracy factors and parameters $\phi_1, \theta_1$ are the core component of this model and sometimes take the form $\phi_1(\lambda, y) = \lambda y$ in binary models, as in [30, 19, 11]. The label-independent factors $\phi_2(\lambda)$ have, as can be seen from the derivation above, no direct influence on the latent label posterior. They are nevertheless often used to model labeling propensities $\mathbb{1}\{\lambda \neq 0\}$ and correlation dependencies $\mathbb{1}\{\lambda_i = \lambda_j\}$, which can be important for PGM parameter learning but are susceptible to misspecification [39, 11, 8].
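The sample-independent variant can be checked numerically. The following is a minimal sketch, not the authors' implementation: the votes, `theta_1`, `theta_2`, and the helper names `one_hot_votes` and `softmax` are hypothetical, and it assumes that a vote of 0 denotes abstention and that the label-independent term enters the scores as an additive offset $\theta_2^T \phi_2(\lambda)$ shared by all classes. The class prior $P(y)$ is left out, since it is unaffected by that offset. Under these assumptions the offset cancels in the softmax, which illustrates why $\phi_2$ has no direct influence on the label posterior.

```python
import numpy as np


def one_hot_votes(votes, num_classes):
    """Return an (m, C) indicator matrix with entry [j, y-1] = 1{lambda_j = y}.

    Classes are 1..C. A vote of 0 is treated as an abstention (assumption)
    and produces an all-zero row.
    """
    votes = np.asarray(votes)
    classes = np.arange(1, num_classes + 1)
    return (votes[:, None] == classes[None, :]).astype(float)


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


# Hypothetical votes of m = 4 labeling functions over C = 3 classes.
votes = np.array([1, 1, 2, 0])
# Hypothetical sample-independent accuracy parameters theta_1 (one per source).
theta_1 = np.array([1.5, 0.5, 1.0, 2.0])

indicators = one_hot_votes(votes, num_classes=3)
# s_y = theta_1^T 1{lambda = y}: summed accuracy weight of the votes for class y.
s = theta_1 @ indicators
probs = softmax(s)

# Label-independent factors phi_2(lambda): here the propensities 1{lambda_j != 0}.
theta_2 = np.array([0.3, -0.2, 0.7, 0.1])  # hypothetical parameters
phi_2 = (votes != 0).astype(float)
offset = theta_2 @ phi_2  # identical for every class y

# The offset shifts all class scores equally, so the softmax is unchanged.
probs_with_offset = softmax(s + offset)
assert np.allclose(probs, probs_with_offset)
```

Because the softmax is invariant to adding the same constant to every score, any choice of `theta_2` yields the same class probabilities in this sketch.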
