
Neural Information Processing Systems 

These additive-noise models focus primarily on accurately estimating the conditional mean E[y|x], while paying less attention to whether the noise distribution accurately captures the uncertainty of y given x. For this reason, they may not work well if the distribution of y given x clearly deviates from the additive-noise assumption. For example, if p(y|x) is multi-modal, which commonly happens when categorical covariates are missing from x, then E[y|x] may not be close to any possible true value of y given that specific x.
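The multi-modal failure case can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration rather than a setup taken from the paper: the hidden binary covariate `z`, the mode offset of 4.0, the noise scale, and the use of an ordinary least-squares fit as a stand-in for a conditional-mean estimator. Because `z` is omitted from the regressors, p(y|x) has two modes near x - 4 and x + 4, and the fitted mean falls between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Observed covariate x and a hidden binary covariate z (hypothetical setup).
x = rng.uniform(-1.0, 1.0, size=n)
z = rng.integers(0, 2, size=n)

# y depends on z, but z is NOT available to the model,
# so p(y|x) is bimodal with modes near x - 4 and x + 4.
y = x + 4.0 * (2 * z - 1) + 0.1 * rng.standard_normal(n)

# Least-squares fit of y on x (with intercept) as a conditional-mean estimate.
A = np.column_stack([x, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# Distance from each observed y to the estimated conditional mean at its x.
gap = np.abs(y - y_hat)
print("slope, intercept:", coef)
print("smallest |y - E[y|x]| over all samples:", gap.min())
```

Under these assumptions the fitted line approximates E[y|x] = x well, yet no observed y lies near it: every sample sits roughly 4 units away, which is the sense in which the conditional mean is not close to any plausible value of y.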