
Optimal Design




Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

Li, Ting

Neural Information Processing Systems

A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately.
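A classical instance of this idea, sketched here for intuition (this is the textbook Neyman allocation, not the paper's sequential method), splits the sample budget between treatment and control in proportion to the outcome standard deviations, which minimizes the variance of the difference-in-means treatment-effect estimator:

```python
def neyman_allocation(sigma_t, sigma_c, n_total):
    """Split a sample budget between treatment and control in proportion
    to outcome standard deviations, minimizing the variance of the
    difference-in-means estimate of the treatment effect."""
    p_t = sigma_t / (sigma_t + sigma_c)
    n_t = round(n_total * p_t)
    return n_t, n_total - n_t

# If treated outcomes are twice as noisy, assign 2/3 of the budget to treatment.
n_t, n_c = neyman_allocation(sigma_t=2.0, sigma_c=1.0, n_total=300)
```

When the two arms are equally noisy this reduces to the familiar 50/50 split.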


Neural Optimal Design of Experiment for Inverse Problems

Darges, John E., Afkham, Babak Maboudi, Chung, Matthias

arXiv.org Machine Learning

We introduce Neural Optimal Design of Experiments, a learning-based framework for optimal experimental design in inverse problems that avoids classical bilevel optimization and indirect sparsity regularization. NODE jointly trains a neural reconstruction model and a fixed-budget set of continuous design variables representing sensor locations, sampling times, or measurement angles, within a single optimization loop. By optimizing measurement locations directly rather than weighting a dense grid of candidates, the proposed approach enforces sparsity by design, eliminates the need for ℓ1 tuning, and substantially reduces computational complexity. We validate NODE on an analytically tractable exponential growth benchmark, on MNIST image sampling, and illustrate its effectiveness on a real-world sparse-view X-ray CT example. In all cases, NODE outperforms baseline approaches, demonstrating improved reconstruction accuracy and task-specific performance.
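The "optimize measurement locations directly" idea can be illustrated on a toy problem; the following is only a hedged sketch under simplified assumptions, not NODE itself. For a linear model y = a + b·t with unit noise, ascending the log-determinant of the Fisher information with respect to a fixed budget of two continuous sampling times drives the times to the endpoints of the interval, the known D-optimal design:

```python
import numpy as np

def fisher_logdet(t):
    # Fisher information for the linear model y = a + b*t with unit noise:
    # M = X^T X for the design matrix X with rows [1, t_i].
    X = np.stack([np.ones_like(t), t], axis=1)
    return np.linalg.slogdet(X.T @ X)[1]

def optimize_design(t0, lr=0.05, steps=500, eps=1e-5):
    # Gradient ascent on the continuous sampling times themselves
    # (finite-difference gradients), constrained to [0, 1] by clipping.
    t = t0.copy()
    for _ in range(steps):
        g = np.zeros_like(t)
        for i in range(len(t)):
            tp, tm = t.copy(), t.copy()
            tp[i] += eps
            tm[i] -= eps
            g[i] = (fisher_logdet(tp) - fisher_logdet(tm)) / (2 * eps)
        t = np.clip(t + lr * g, 0.0, 1.0)
    return t

# Starting from interior points, the design moves to the endpoints {0, 1}.
t = optimize_design(np.array([0.4, 0.6]))
```

Optimizing a fixed budget of locations directly, as above, is what makes the design sparse by construction; there is no dense candidate grid to reweight and no ℓ1 penalty to tune.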


Fusing Foveal Fixations Using Linear Retinal Transformations and Bayesian Experimental Design

Williams, Christopher K. I.

arXiv.org Artificial Intelligence

Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this paper we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploiting the known geometry. This linear transformation allows us to carry out exact inference for the latent variables in factor analysis (FA) and mixtures of FA models of the scene. Further, this allows us to formulate and solve the choice of "where to look next" as a Bayesian experimental design problem using the Expected Information Gain criterion. Experiments on the Frey faces and MNIST datasets demonstrate the effectiveness of our models.
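In the linear-Gaussian setting, the Expected Information Gain of a candidate linear measurement has a closed form, which gives a minimal sketch of the "where to look next" computation (the toy prior and fixation matrices below are illustrative assumptions, not the paper's retinal transformations):

```python
import numpy as np

def expected_info_gain(A, prior_cov, noise_var=1.0):
    # For y = A z + eps with eps ~ N(0, noise_var * I) and prior z ~ N(0, prior_cov),
    # EIG = 0.5 * log det(I + A prior_cov A^T / noise_var).
    k = A.shape[0]
    M = np.eye(k) + A @ prior_cov @ A.T / noise_var
    return 0.5 * np.linalg.slogdet(M)[1]

# A 4-pixel latent scene: coordinates 0,1 are still uncertain, 2,3 are not.
prior_cov = np.diag([4.0, 4.0, 0.1, 0.1])
look_left = np.eye(4)[:2]    # fixation observing pixels 0 and 1
look_right = np.eye(4)[2:]   # fixation observing pixels 2 and 3
```

Fixating the uncertain region yields the larger information gain, so the EIG criterion directs the next fixation toward the part of the latent image the model knows least about.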





Supplementary Material: Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces

Neural Information Processing Systems

In A.1, we show a consequence of Def. 1 that is used in the proofs, applying Theorem ??. We relate our condition in Def. 1 to Pukelsheim's notion of estimability; this definition is sometimes used as a restatement of the estimability property. Definition 4 (Projected data). Lemma 2: the assumption in Definition 4 implies the assumption in Definition 1. This section also includes proofs for the concentration results presented in the main text, with Z as in Def. 2. The term above is the so-called self-normalized noise, which can be handled by the techniques of de la Peña et al. (2009), popularized by Abbasi-Yadkori et al. (2011). From then on the proof is generic.
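As a numerical illustration of the self-normalized term (a sketch under assumed standard notation, S_t = Σ_s η_s x_s and V_t = λI + Σ_s x_s x_sᵀ, which the excerpt does not spell out), the statistic ‖S_t‖² in the V_t⁻¹ norm stays well below the high-probability bound of Abbasi-Yadkori et al. (2011):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, delta = 3, 200, 1.0, 0.05
X = rng.uniform(size=(T, d))     # covariates x_s
eta = rng.normal(size=T)         # 1-sub-Gaussian noise eta_s

V = lam * np.eye(d) + X.T @ X    # regularized design matrix V_t
S = X.T @ eta                    # self-normalized sum S_t
stat = S @ np.linalg.solve(V, S) # ||S_t||^2 in the V_t^{-1} norm

# With probability >= 1 - delta:
# stat <= 2 * log( det(V)^{1/2} / (lam^{d/2} * delta) )
bound = 2 * (0.5 * np.linalg.slogdet(V)[1]
             - 0.5 * d * np.log(lam)
             + np.log(1 / delta))
```

The bound grows only logarithmically with det(V_t), which is what makes the self-normalized technique useful for the concentration results above.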


Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

Schlaginhaufen, Andreas, Ouhamma, Reda, Kamgarpour, Maryam

arXiv.org Machine Learning

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative preference queries to identify the underlying reward while ensuring theoretical guarantees. We propose a meta-algorithm based on randomized exploration, which avoids the computational challenges associated with optimistic approaches and remains tractable. We establish both regret and last-iterate guarantees under mild reinforcement learning oracle assumptions. To improve query complexity, we introduce and analyze an improved algorithm that collects batches of trajectory pairs and applies optimal experimental design to select informative comparison queries. The batch structure also enables parallelization of preference queries, which is relevant in practical deployment as feedback can be gathered concurrently. Empirical evaluation confirms that the proposed method is competitive with reward-based reinforcement learning while requiring a small number of preference queries.
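One simple way to realize "experimental design to select informative comparison queries" is a greedy design step that picks the trajectory pair whose reward-feature difference is most uncertain under the current design matrix; this is a hedged sketch of that generic idea, not the paper's batch algorithm:

```python
import numpy as np

def select_query(phi, V):
    """Greedy design step: among trajectory feature vectors phi[i], pick the
    pair (i, j) maximizing (phi_i - phi_j)^T V^{-1} (phi_i - phi_j), i.e. the
    preference comparison that is most uncertain under design matrix V."""
    Vinv = np.linalg.inv(V)
    n = len(phi)
    best_score, best_pair = -np.inf, None
    for i in range(n):
        for j in range(i + 1, n):
            diff = phi[i] - phi[j]
            score = diff @ Vinv @ diff
            if score > best_score:
                best_score, best_pair = score, (i, j)
    return best_pair

# Three trajectories; the first and third differ most, so they are compared first.
phi = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]])
pair = select_query(phi, np.eye(2))
```

After each answered query, V would be updated with the outer product of the chosen feature difference, so subsequent queries probe directions of the reward that remain poorly identified; a batch version of this selection is what enables the parallel preference collection described above.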