Bias and variance of the Bayesian-mean decoder

Neural Information Processing Systems

Perception, in theoretical neuroscience, has been modeled as the encoding of external stimuli into internal signals, which are then decoded. The Bayesian mean is an important decoder, as it is optimal for purposes of both estimation and discrimination. We present widely applicable approximations to the bias and to the variance of the Bayesian mean, obtained under the minimal and biologically relevant assumption that the encoding results from a series of independent, though not necessarily identically distributed, signals. Simulations substantiate the accuracy of our approximations in the small-noise regime. The bias of the Bayesian mean comprises two components: one driven by the prior, and one driven by the precision of the encoding.
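As a toy illustration of the prior-driven bias component, consider the conjugate Gaussian case, where the Bayesian mean has a closed form. The sketch below (ours, not from the paper; all parameter values and variable names are assumptions) compares the empirical bias of the posterior mean against the analytic prior-driven term, which vanishes as encoding precision grows.

```python
import numpy as np

# Illustrative sketch (not the paper's general derivation): prior
# x ~ N(mu0, s0^2), n independent encoding signals r_i ~ N(x, s^2).
# The Bayesian (posterior) mean is a precision-weighted average, and
# its bias toward the prior mean shrinks as n / s^2 grows.
rng = np.random.default_rng(0)
mu0, s0 = 0.0, 1.0      # prior mean and s.d. (assumed values)
x_true, s = 1.5, 0.5    # stimulus and per-signal noise s.d.

for n in (1, 4, 16, 64):
    r = x_true + s * rng.standard_normal((100_000, n))   # encoded signals
    w = (n / s**2) / (n / s**2 + 1 / s0**2)              # likelihood weight
    post_mean = w * r.mean(axis=1) + (1 - w) * mu0       # Bayesian mean
    bias = post_mean.mean() - x_true
    theory = (1 - w) * (mu0 - x_true)                    # prior-driven bias
    print(f"n={n:3d}  empirical bias={bias:+.4f}  theory={theory:+.4f}")
```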



Sparse and Continuous Attention Mechanisms

André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Neural Information Processing Systems

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g., sparsemax and α-entmax).
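For readers unfamiliar with the finite-domain case, the sketch below (ours, not from the paper) implements sparsemax, the standard sparse alternative to softmax: the Euclidean projection of a score vector onto the probability simplex, which can assign exactly zero probability to low-scoring entries.

```python
import numpy as np

# Sparsemax (Martins & Astudillo, 2016): project scores z onto the
# probability simplex. Unlike softmax, entries outside the support
# receive exactly zero probability.
def sparsemax(z):
    z_sorted = np.sort(z)[::-1]                      # scores, descending
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = z_sorted + (1 - cumsum) / k > 0        # support condition
    k_star = k[support][-1]                          # support size
    tau = (cumsum[k_star - 1] - 1) / k_star          # threshold
    return np.maximum(z - tau, 0.0)

z = np.array([1.5, 0.2, -1.0, 0.8])
p = sparsemax(z)
print(p, p.sum())   # sparse probabilities, e.g. [0.85 0. 0. 0.15], sum 1.0
```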


f0b76267fbe12b936bd65e203dc675c1-AuthorFeedback.pdf

Neural Information Processing Systems

Note that the VQA results in Table 2 with continuous attention use fewer basis functions than discrete regions. Good idea; we will add this to the camera-ready version. "Is this a necessary or a sufficient condition?" Sufficient; we will clarify and follow the suggestions (move the beta-escort definition to the main text and fix typos). We will add a citation. We chose ridge regression as it enables a closed-form solution expressed linearly in terms of the basis functions (Eq.). We haven't tried linear interpolation. However, for a high-level vision system, combining our method with BUTD is an interesting idea. "Text are naturally discrete tokens."
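The ridge-regression point can be made concrete: fitting coefficients over a set of basis functions has a closed form that is linear in the design matrix. The sketch below is our own illustration under assumed choices (Gaussian RBF basis, synthetic targets); it is not the paper's implementation.

```python
import numpy as np

# Sketch of the ridge-regression step the feedback refers to: fit
# coefficients B so that V(t) = phi(t) @ B approximates samples v_i
# at locations t_i. Basis choice and all names are illustrative.
def gaussian_basis(t, centers, width=0.1):
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)                                    # locations
v = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(50)   # noisy values
centers = np.linspace(0, 1, 10)                              # basis centers
Phi = gaussian_basis(t, centers)                             # (50, 10)

lam = 1e-3  # ridge penalty
# Closed form: B = (Phi^T Phi + lam I)^{-1} Phi^T v -- linear in Phi.
B = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ v)
print(gaussian_basis(np.array([0.25]), centers) @ B)   # V(0.25) ~ 1.0
```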



Supplement: Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Neural Information Processing Systems

For the first equality, we use Eq. (…). In practice, the result is most useful for small d, such as d = 0. Let us first state a generalization of our Theorem 2.

Theorem 4. Suppose x ∼ LRGC(W, σ²) (…). The proof applies to each missing dimension j ∈ M. Let us further define s (…).

For a detailed treatment of sub-Gaussian random variables, see [10]. A random variable x is sub-Gaussian if (E|x|^p)^{1/p} ≤ K√p for all p ≥ 1, for some K > 0; the sub-Gaussian norm of x is defined as ‖x‖_{ψ₂} = sup_{p≥1} p^{−1/2} (E|x|^p)^{1/p}. Our Lemma 2 is Lemma 17 in [1], which is also a simplified version of Theorem 1 in [4]. To compute (2) and (3), we use the law of total expectation as in Section 1.1, by first treating z ∈ ℝ as fixed. The computations for all cases are similar; we take the first case as an example.
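As a quick numerical sanity check on the moment condition reconstructed above (our own illustration, not part of the supplement), one can verify that a standard normal variable satisfies the bound with a modest constant K:

```python
import numpy as np

# For x ~ N(0, 1), (E|x|^p)^(1/p) should grow no faster than K * sqrt(p),
# so the ratio below stays bounded as p increases.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

for p in (1, 2, 4, 8, 16):
    moment = np.mean(np.abs(x) ** p) ** (1.0 / p)   # (E|x|^p)^{1/p}
    print(f"p={p:2d}  moment={moment:6.3f}  ratio to sqrt(p)={moment / np.sqrt(p):5.3f}")
```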


Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Neural Information Processing Systems

Modern large scale datasets are often plagued with missing entries. For tabular data with missing values, a flurry of imputation algorithms solve for a complete matrix which minimizes some penalized reconstruction error. However, almost none of them can estimate the uncertainty of their imputations. This paper proposes a probabilistic and scalable framework for missing value imputation with quantified uncertainty. Our model, the Low Rank Gaussian Copula, augments a standard probabilistic model, Probabilistic Principal Component Analysis, with marginal transformations for each column that allow the model to better match the distribution of the data. It naturally handles Boolean, ordinal, and real-valued observations and quantifies the uncertainty in each imputation.
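To make the model's structure concrete, here is a heavily simplified sketch (our own, under assumptions the paper does not necessarily make: empirical-CDF marginals, a latent correlation fit from complete data) of copula-style imputation with a conditional-variance uncertainty estimate.

```python
import numpy as np
from scipy.stats import norm

# Simplified copula-style imputation: map each column to normal scores
# via its empirical CDF, impute in the latent Gaussian space with a
# conditional normal, and report the conditional variance as uncertainty.
def to_normal_scores(col):
    ranks = col.argsort().argsort() + 1        # ranks 1..n
    return norm.ppf(ranks / (len(col) + 1))    # latent Gaussian scores

rng = np.random.default_rng(0)
n, k = 500, 2
Z = rng.standard_normal((n, k)) @ rng.standard_normal((k, 3))  # low rank
X = np.exp(Z)                                  # non-Gaussian margins
G = np.column_stack([to_normal_scores(X[:, j]) for j in range(3)])
S = np.cov(G, rowvar=False)                    # latent correlation

# Impute column 0 of a row where columns 1 and 2 are observed.
obs, mis = [1, 2], 0
w = np.linalg.solve(S[np.ix_(obs, obs)], S[obs, mis])
cond_mean = G[0, obs] @ w                      # imputation (latent scale)
cond_var = S[mis, mis] - S[obs, mis] @ w       # quantified uncertainty
print(cond_mean, cond_var)
```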


… sample variance, few papers explicitly explore the issue of calibration: does MI sample variance predict imputation accuracy?

Neural Information Processing Systems

We thank the reviewers for providing useful feedback. Our paper does address calibration: imputation accuracy correlates with our uncertainty metric. We compare our method with one of the fastest MI methods, MIPCA (Josse et al., 2011), on synthetic data.

Figure 1: Imputation error (NRMSE for continuous and MAE for ordinal data) on the subset of m% entries for which each method's associated uncertainty metric indicates the highest reliability.

For MIPCA, we use 20 imputations; low sample variance corresponds to high reliability. Our method can also be used for MI if desired.
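The evaluation protocol behind Figure 1 can be expressed in a few lines. The sketch below is our own illustration (function names and the synthetic setup are assumptions): restrict the error metric to the m% of imputed entries whose uncertainty metric signals the highest reliability; for a well-calibrated method, the error on this subset should drop as m shrinks.

```python
import numpy as np

# Calibration check: error on the m% most-reliable imputations.
def nrmse_on_most_reliable(x_true, x_imputed, uncertainty, m):
    k = max(1, int(len(x_true) * m / 100))     # size of the m% subset
    idx = np.argsort(uncertainty)[:k]          # lowest uncertainty first
    err = x_imputed[idx] - x_true[idx]
    return np.sqrt(np.mean(err**2)) / np.std(x_true)

rng = np.random.default_rng(0)
x_true = rng.standard_normal(10_000)
noise_scale = rng.uniform(0.1, 1.0, 10_000)    # heteroscedastic noise
x_imp = x_true + noise_scale * rng.standard_normal(10_000)
for m in (5, 25, 100):    # error should rise with m if calibrated
    print(m, nrmse_on_most_reliable(x_true, x_imp, noise_scale, m))
```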