AITopics

Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabelled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.

expt, expt 0, optimization, (14 more...)

2310.19961

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

List, Florian, Montel, Noemi Anau, Weniger, Christoph

Bayesian Simulation-based Inference for Cosmological Initial Conditions

Reconstructing astrophysical and cosmological fields from observations is challenging. It requires accounting for non-linear transformations, mixing of spatial structure, and noise. In contrast, forward simulators that map fields to observations are readily available for many applications. We present a versatile Bayesian field reconstruction algorithm rooted in simulation-based inference and enhanced by autoregressive modeling. The proposed technique is applicable to generic (non-differentiable) forward simulators and allows sampling from the posterior for the underlying field. We show first promising results on a proof-of-concept application: the recovery of cosmological initial conditions from late-time density fields.

arxiv, monthly notice, royal astronomical society, (11 more...)

2310.1991

Country:

Europe > Austria > Vienna (0.14)
Asia > Turkmenistan > Ahal Region > Anau (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Yuksekgonul, Mert, Zhang, Linjun, Zou, James, Guestrin, Carlos

Beyond Confidence: Reliable Models Should Also Consider Atypicality

While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions. We first demonstrate that atypicality is strongly related to miscalibration and accuracy. In particular, we empirically show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy. Using these insights, we show incorporating atypicality improves uncertainty quantification and model performance for discriminative neural networks and large language models. In a case study, we show that using atypicality improves the performance of a skin lesion classifier across different skin tone groups without having access to the group attributes. Overall, we propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance. Our results demonstrate that simple post-hoc atypicality estimators can provide significant value.

atypicality, calibration, prediction, (13 more...)

2305.18262

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Dermatology (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Chandra, Kartik, Chen, Tony, Li, Tzu-Mao, Ragan-Kelley, Jonathan, Tenenbaum, Josh

Inferring the Future by Imagining the Past

A single panel of a comic book can say a lot: it can depict not only where the characters currently are, but also their motions, their motivations, their emotions, and what they might do next. More generally, humans routinely infer complex sequences of past and future events from a *static snapshot* of a *dynamic scene*, even in situations they have never seen before. In this paper, we model how humans make such rapid and flexible inferences. Building on a long line of work in cognitive science, we offer a Monte Carlo algorithm whose inferences correlate well with human intuitions in a wide variety of domains, while only using a small, cognitively-plausible number of samples. Our key technical insight is a surprising connection between our inference problem and Monte Carlo path tracing, which allows us to apply decades of ideas from the computer graphics community to this seemingly-unrelated theory of mind task.

agent, algorithm, inference, (14 more...)

2305.17195

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)

Kulkarni, Siddhesh, Pal, Subhadip, Gaskins, Jeremy T.

A Bayesian Methodology for Estimation for Sparse Canonical Correlation

It can be challenging to perform an integrative statistical analysis of multi-view high-dimensional data acquired from different experiments on each subject who participated in a joint study. Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between such data sets. In that context, Structured Sparse CCA (ScSCCA) is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities by assuming the corresponding CCA directional vectors to be sparse. Although it is a rapidly growing area of statistical methodology development, there is a need for developing related methodologies in the Bayesian paradigm. In this manuscript, we propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation by encouraging sparsity in two different levels of the modeling framework. Firstly, we utilize a multiplicative Half-Cauchy process prior to encourage sparsity at the level of the latent variable loading matrices. Additionally, we promote further sparsity in the covariance matrix by using graphical horseshoe prior or diagonal structure. We conduct multiple simulations to compare the performance of the proposed method with that of other frequently used CCA procedures, and we apply the developed procedures to analyze multi-omics data arising from a breast cancer study.

artificial intelligence, bioinformatics, machine learning, (16 more...)

2310.19621

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Ohio (0.04)
North America > United States > New Jersey > Union County > Summit (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
(4 more...)

Zhao, Zheng, Mair, Sebastian, Schön, Thomas B., Sjölund, Jens

On Feynman--Kac training of partial Bayesian neural networks

Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent-variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. We show on various synthetic and real-world datasets that our proposed training scheme outperforms the state of the art in terms of predictive performance.

artificial intelligence, machine learning, posterior distribution, (19 more...)

2310.19608

Country:

Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Kumari, Chandrani, Siddharthan, Rahul

MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation

We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM (``Madras Mixture Model''), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2310.19454

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Middlesex County > Newton (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Nickl, Peter, Xu, Lu, Tailor, Dharmesh, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz

The Memory Perturbation Equation: Understanding Model's Sensitivity to Data

Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.

artificial intelligence, machine learning, sensitivity, (17 more...)

2310.19273

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Deleu, Tristan, Nishikawa-Toomey, Mizu, Subramanian, Jithendaraa, Malkin, Nikolay, Charlin, Laurent, Bengio, Yoshua

Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network

Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2305.19366

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Straub, Dominik, Schultheis, Matthias, Koeppl, Heinz, Rothkopf, Constantin A.

Probabilistic inverse optimal control for non-linear partially observable systems disentangles perceptual uncertainty and behavioral costs

Inverse optimal control can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, is limited to fully observable or linear systems, or requires the action signals to be known. Here, we introduce a probabilistic approach to inverse optimal control for partially observable stochastic non-linear systems with unobserved action signals, which unifies previous approaches to inverse optimal control with maximum causal entropy formulations. Using an explicit model of the noise characteristics of the sensory and motor systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood function for the model parameters, which can be computed within a single forward pass. We present quantitative evaluations on stochastic and partially observable versions of two classic control tasks and two human behavioral tasks. Importantly, we show that our method can disentangle perceptual factors and behavioral costs despite the fact that epistemic and pragmatic actions are intertwined in sequential decision-making under uncertainty, such as in active sensing and active learning. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2303.16698

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)