Goto

Collaborating Authors

 Bayesian Inference


Artificial Intelligence in Clinical Health Care Applications: Viewpoint

arXiv.org Artificial Intelligence

The idea of Artificial Intelligence (AI) has a long history. It turned out, however, that reaching intelligence at human levels is more complicated than originally anticipated. Currently we are experiencing a renewed interest in AI, fueled by an enormous increase in computing power and an even larger increase in data, in combination with improved AI technologies like deep learning. Healthcare is considered the next domain to be revolutionized by Artificial Intelligence. While AI approaches are excellently suited to develop certain algorithms, for biomedical applications there are specific challenges. We propose recommendations to improve AI projects in the biomedical space and especially clinical healthcare.


Gradient-Based Neural DAG Learning

arXiv.org Machine Learning

We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension allows to model complex interactions while being more global in its search compared to other greedy approaches. In addition to comparing our method to existing continuous optimization methods, we provide missing empirical comparisons to nonlinear greedy search methods. On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks while being competitive with existing greedy search methods on important metrics for causal inference.


Combining Physics-Based Domain Knowledge and Machine Learning using Variational Gaussian Processes with Explicit Linear Prior

arXiv.org Machine Learning

Centuries of development in natural sciences and mathematical modeling provide valuable domain expert knowledge that has yet to be explored for the development of machine learning models. When modeling complex physical systems, both domain knowledge and data contribute important information about the system. In this paper, we present a data-driven model that takes advantage of partial domain knowledge in order to improve generalization and interpretability. The presented model, which we call EVGP (Explicit Variational Gaussian Process), uses an explicit linear prior to incorporate partial domain knowledge while using data to fill in the gaps in knowledge. Variational inference was used to obtain a sparse approximation that scales well to large datasets. The advantages include: 1) using partial domain knowledge to improve inductive bias (assumptions of the model), 2) scalability to large datasets, 3) improved interpretability. We show how the EVGP model can be used to learn system dynamics using basic Newtonian mechanics as prior knowledge. We demonstrate that using simple priors from partially defined physics models considerably improves performance when compared to fully data-driven models.


Do place cells dream of conditional probabilities? Learning Neural Nystr\"om representations

arXiv.org Machine Learning

We posit that hippocampal place cells encode information about future locations under a transition distribution observed as an agent explores a given (physical or conceptual) space. The encoding of information about the current location, usually associated with place cells, then emerges as a necessary step to achieve this broader goal. We formally derive a biologically-inspired neural network from Nystr\"om kernel approximations and empirically demonstrate that the network successfully approximates transition distributions. The proposed network yields representations that, just like place cells, soft-tile the input space with highly sparse and localized receptive fields. Additionally, we show that the proposed computational motif can be extended to handle supervised problems, creating class-specific place cells while exhibiting low sample complexity.


Bayesian Optimization of Composite Functions

arXiv.org Machine Learning

We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. While these problems can be solved with standard Bayesian optimization, we propose a novel approach that exploits the composite structure of the objective function to substantially improve sampling efficiency. Our approach models $h$ using a multi-output Gaussian process and chooses where to sample using the expected improvement evaluated on the implied non-Gaussian posterior on $f$, which we call expected improvement for composite functions (\ei). Although \ei\ cannot be computed in closed form, we provide a novel stochastic gradient estimator that allows its efficient maximization. We also show that our approach is asymptotically consistent, i.e., that it recovers a globally optimal solution as sampling effort grows to infinity, generalizing previous convergence results for classical expected improvement. Numerical experiments show that our approach dramatically outperforms standard Bayesian optimization benchmarks, reducing simple regret by several orders of magnitude.


The Stanford Acuity Test: A Probabilistic Approach for Precise Visual Acuity Testing

arXiv.org Artificial Intelligence

Chart-based visual acuity measurements are used by billions of people to diagnose and guide treatment of vision impairment. However, the ubiquitous eye exam has no mechanism for reasoning about uncertainty and as such, suffers from a well-documented reproducibility problem. In this paper we uncover a new parametric probabilistic model of visual acuity response based on measurements of patients with eye disease. We present a state of the art eye exam which (1) reduces acuity exam error by 75\% without increasing exam length, (2) knows how confident it should be, (3) can trace predictions over time and incorporate prior beliefs and (4) provides insight for educational Item Response Theory. For patients with more serious eye disease, the novel ability to finely measure acuity from home could be a crucial part in early diagnosis. We provide a web implementation of our algorithm for anyone in the world to use.


Reliable training and estimation of variance networks

arXiv.org Machine Learning

We propose and investigate new complementary methodologies for estimating predictive variance networks in regression neural networks. We derive a locally aware mini-batching scheme that result in sparse robust gradients, and show how to make unbiased weight updates to a variance network. Further, we formulate a heuristic for robustly fitting both the mean and variance networks post hoc. Finally, we take inspiration from posterior Gaussian processes and propose a network architecture with similar extrapolation properties to Gaussian processes. The proposed methodologies are complementary, and improve upon baseline methods individually. Experimentally, we investigate the impact on predictive uncertainty on multiple datasets and tasks ranging from regression, active learning and generative modeling. Experiments consistently show significant improvements in predictive uncertainty estimation over state-of-the-art methods across tasks and datasets.


Estimating Real Log Canonical Thresholds

arXiv.org Machine Learning

Evaluation of the marginal likelihood plays an important role in model selection problems. The widely applicable Bayesian information criterion (WBIC) and singular Bayesian information criterion (sBIC) give approximations to the log marginal likelihood, which can be applied to both regular and singular models. When the real log canonical thresholds are known, the performance of sBIC is considered to be better than that of WBIC, but only few real log canonical thresholds are known. In this paper, we propose a new estimator of the real log canonical thresholds based on the variance of thermodynamic integration with an inverse temperature. In addition, we propose an application to make sBIC widely applicable. Finally, we investigate the performance of the estimator and model selection by simulation studies and application to real data.


Bayes Theorem: A Primer - Lavanya.ai

#artificialintelligence

Imagine you're sleeping, and you hear strange noises in your front lawn. You're very sleepy, so you hypothesize that the strange noises are being generated by a hungry dinosaur. You think to yourself, 'this is exactly what I would hear if there was a dinosaur outside in my front lawn'. But then as you think more about it, you realize that the likelihood of there actually being a dinosaur in your front lawn is extremely low; whereas the likelihood of hearing strange noises from the front lawn is likely pretty high. So you exhale as you realize that the actual probability of there being a dinosaur in your front lawn, aka your original hypothesis, given the evidence is extremely low.


Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel

arXiv.org Machine Learning

Neural Networks (NNs) have been extensively used for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction is not enough, but also the uncertainty (i.e. risk, or confidence) of that prediction must be estimated. Standard NNs, which are most often used in such tasks, do not provide any such information. Existing approaches try to solve this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually do not perform as well as standard NNs. In this paper, a new framework called RIO is developed that makes it possible to estimate uncertainty in any pretrained standard NN. RIO models prediction residuals using Gaussian Process with a composite input/output kernel. The residual prediction and I/O kernel are theoretically motivated and the framework is evaluated in twelve real-world datasets. It is found to provide reliable estimates of the uncertainty, reduce the error of the point predictions, and scale well to large datasets. Given that RIO can be applied to any standard NN without modifications to model architecture or training pipeline, it provides an important ingredient in building real-world applications of NNs.