


Appendix details

Neural Information Processing Systems

A.1 Linear mappings between $z$ and $x$. Usually, we have data $x \in \mathbb{R}^{N \times D_1}$ and latent representation $z \in \mathbb{R}^{N \times D_2}$, with $N$ the number of neurons, $D_1$ the dimensionality of the data, $D_2$ the dimensionality of the latent space and, usually, $D_1 \gg D_2$. In cases where a method $m$ produces only a latent representation $z_m$, we fit a reconstruction $\hat{x}_m = z_m W$ with a least-squares projection $W = (z_m^T z_m)^{-1} z_m^T x$. In cases where a method $m$ produces only a reconstruction $\hat{x}_m$, we produce a simple latent representation $z_m$ by extracting the first $D_2$ columns of the left singular vectors $U$ from the singular value decomposition $x = U S V^T$. Both of these projections are fitted on the training data, then fixed and also used on the validation and test data. We used three datasets, where the first two (dataset A [2], n=8417 cells; dataset B [54], n=4600 cells) are two-photon recordings of mouse retinal bipolar cell (BC) responses to the chirp stimuli (local and full-field, see [2] for details).
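The two linear mappings described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names, the random toy data, and the choice $D_2 = 5$ are hypothetical, samples are assumed to sit in rows (so the reconstruction is computed as `z @ W`), and held-out data are assumed to share the same $D_1$ columns as the training data. To reuse the SVD-based latent map on held-out data, the sketch stores the projection $V_{:D_2} S_{:D_2}^{-1}$, which reproduces $U_{:D_2}$ exactly on the training set; this is one possible reading of "fitted on the training data, then fixed".

```python
import numpy as np

def fit_reconstruction_map(z_train, x_train):
    """Least-squares W such that x_train is approximated by z_train @ W."""
    # lstsq solves the same problem as W = (z^T z)^{-1} z^T x, but more stably.
    W, *_ = np.linalg.lstsq(z_train, x_train, rcond=None)
    return W  # shape (D2, D1)

def fit_latent_map(x_train, d2):
    """Fixed projection P such that x_train @ P equals the first d2 columns of U."""
    U, S, Vt = np.linalg.svd(x_train, full_matrices=False)
    # x = U S V^T  implies  x V S^{-1} = U, so keep the first d2 columns of V / S.
    return Vt[:d2].T / S[:d2]  # shape (D1, d2)

# Hypothetical toy data: 200 training rows, 50 validation rows, D1 = 30.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 30))
x_val = rng.normal(size=(50, 30))

# Case 1: only a reconstruction is available, so derive a latent via the SVD.
P = fit_latent_map(x_train, d2=5)
z_train = x_train @ P
z_val = x_val @ P          # same fixed projection on held-out data

# Case 2: only a latent is available, so fit a linear reconstruction.
W = fit_reconstruction_map(z_train, x_train)
x_val_hat = z_val @ W      # same fixed W on held-out data
```

Both `P` and `W` are computed once from the training split and then applied unchanged, so no information from the validation or test data enters the fit.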





Appendix A Model details

Neural Information Processing Systems

The red lines in the bottom plot indicate linear fits, and the red axis labels show the rank correlation coefficients ρ and p values. The matrix is orthogonal, thus avoiding a singular design. As scGen returns corrected input data, we performed PCA on the output data, which were used for further evaluation (cf. Appendix Section A.1). Here, we used the same number of principal components (PCs) as used for ... Embedded cells are colored by dataset. In Figure 9, we present the results of the simulation experiments discussed in the main text.




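For a probability space $(\Omega_{\mathrm{Box}}, \mathcal{E}, P_{\mathrm{Box}})$, with $\Omega_{\mathrm{Box}} \subseteq \mathbb{R}^d$, the Gaussian-box process is generated as $\mu_i \in \Omega_{\mathrm{Box}}$, $\sigma_i \in \mathbb{R}^d_+$, $r_i \in \mathbb{R}^d_+$, $C_i \sim \mathcal{N}(\mu_i, \sigma_i)$, $X_i^{+} = C_i + r_i$, $X_i^{-} = C_i - r_i$, $\mathrm{Box}(X_i) = \prod^{d} \ldots$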
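A minimal sketch of sampling from the Gaussian-box process above, using only the Python standard library. Several details are assumptions and not taken from the source: $\sigma_i$ is treated as a per-coordinate standard deviation, the box is taken to be the product of the intervals between the lower and upper corner in each coordinate (the product in the snippet is truncated), and the function names and parameter values are hypothetical.

```python
import random

def sample_gaussian_box(mu, sigma, r, rng):
    """Draw one box: center C ~ N(mu, sigma) per coordinate, corners C - r and C + r."""
    center = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    lower = [c - h for c, h in zip(center, r)]
    upper = [c + h for c, h in zip(center, r)]
    return lower, upper

def box_volume(lower, upper):
    """Volume of an axis-aligned box as the product of its side lengths."""
    vol = 1.0
    for lo, hi in zip(lower, upper):
        vol *= max(hi - lo, 0.0)
    return vol

# Hypothetical 2-D example: half-widths 0.2 and 0.1 give side lengths 0.4 and 0.2.
rng = random.Random(0)
lower, upper = sample_gaussian_box(mu=[0.5, 0.5], sigma=[0.1, 0.1], r=[0.2, 0.1], rng=rng)
volume = box_volume(lower, upper)
```

Under these assumptions only the center is random, so every sampled box has the same side lengths $2 r_i$ and the same volume; the randomness moves the box around without resizing it.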

Neural Information Processing Systems

All coordinates will be modeled by independent Gumbel distributions, and thus it is enough to calculate the expected side-length of a box, as the expected volume will simply be the product of the expected side-lengths. To properly restrict the Gumbel distributions to $[0,1]$, we can either form censored or truncated distributions. The truncated distribution, on the other hand, multiplies the densities with the indicator function for $[0,1]$ and renormalizes them to integrate to 1. The higher the temperature of the boxes, the more the true integral will tend to provide larger conditional probabilities. Monte Carlo experiments support this conclusion.
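The difference between the two restrictions can be illustrated with a small sampling sketch. It assumes a max-Gumbel distribution with CDF $F(x) = \exp(-\exp(-(x - \mu)/\beta))$ and hypothetical parameters $\mu = 0.9$, $\beta = 0.2$; the censored variant is taken to mean clamping out-of-range draws to the nearest endpoint, which is the usual definition but is not spelled out in the snippet. The truncated variant samples by inverse CDF restricted to $[F(0), F(1)]$, which is equivalent to multiplying the density by the indicator of $[0,1]$ and renormalizing.

```python
import math
import random

def gumbel_cdf(x, mu, beta):
    """CDF of the max-Gumbel distribution with location mu and scale beta."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_inv_cdf(u, mu, beta):
    """Inverse CDF (quantile function) of the max-Gumbel distribution."""
    return mu - beta * math.log(-math.log(u))

def sample_censored(mu, beta, rng, lo=0.0, hi=1.0):
    """Draw from the unrestricted Gumbel, then clamp to [lo, hi] (point masses at the ends)."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)  # stay away from 0 and 1 to keep the logs finite
    return min(max(gumbel_inv_cdf(u, mu, beta), lo), hi)

def sample_truncated(mu, beta, rng, lo=0.0, hi=1.0):
    """Draw from the Gumbel density restricted to [lo, hi] and renormalized."""
    u = rng.uniform(gumbel_cdf(lo, mu, beta), gumbel_cdf(hi, mu, beta))
    # The final clamp only guards against floating-point round-off at the ends.
    return min(max(gumbel_inv_cdf(u, mu, beta), lo), hi)

rng = random.Random(0)
censored = [sample_censored(0.9, 0.2, rng) for _ in range(2000)]
truncated = [sample_truncated(0.9, 0.2, rng) for _ in range(2000)]

# Share of samples sitting exactly on the upper endpoint.
censored_at_one = sum(s == 1.0 for s in censored) / len(censored)
truncated_at_one = sum(s == 1.0 for s in truncated) / len(truncated)
```

With these parameters, $1 - F(1) \approx 0.45$ of the unrestricted mass lies above 1, so the censored samples pile up at the endpoint while the truncated samples spread that mass over the interior of $[0,1]$.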


Model Selection for Production System via Automated Online Experiments

Neural Information Processing Systems

A challenge that machine learning practitioners in the industry face is the task of selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimation of the effectiveness of the whole system, but can only compare two or a few models due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. We derive the probability distribution of the metric of interest that contains the model uncertainty from our Bayesian surrogate model trained using historical logs. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set that balance exploration-exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
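The idea of sequentially choosing which model to deploy while balancing exploration and exploitation can be illustrated with a generic Thompson-sampling loop. This is not the method proposed in the abstract: it replaces the Bayesian surrogate trained on historical logs with independent Gaussian priors per model, and simulates each online experiment as a noisy draw around a hypothetical true metric value. All names and numbers are illustrative assumptions.

```python
import random

def thompson_select(true_means, noise_sd, n_rounds, rng, prior_mean=0.0, prior_var=1.0):
    """Sequentially pick models by Thompson sampling with Gaussian posteriors."""
    k = len(true_means)
    post_mean = [prior_mean] * k
    post_var = [prior_var] * k
    for _ in range(n_rounds):
        # Sample one plausible metric value per model from its current posterior.
        draws = [rng.gauss(post_mean[i], post_var[i] ** 0.5) for i in range(k)]
        i = max(range(k), key=draws.__getitem__)
        # Simulated online experiment for the chosen model (assumption: Gaussian noise).
        y = rng.gauss(true_means[i], noise_sd)
        # Conjugate normal update with known noise variance.
        precision = 1.0 / post_var[i] + 1.0 / noise_sd ** 2
        post_mean[i] = (post_mean[i] / post_var[i] + y / noise_sd ** 2) / precision
        post_var[i] = 1.0 / precision
    best = max(range(k), key=post_mean.__getitem__)
    return best, post_mean

# Hypothetical candidate pool of three models; model 1 has the best true metric.
rng = random.Random(0)
best, post_mean = thompson_select([0.1, 0.8, 0.3], noise_sd=0.1, n_rounds=200, rng=rng)
```

Models with uncertain posteriors are still sampled occasionally, which is the exploration part; models with high posterior means are deployed most often, which is the exploitation part.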


Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$

arXiv.org Machine Learning

Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the exponential increase in candidate models with each additional model parameter renders the evaluation of these criteria for all models -- a strategy known as exhaustive, or brute-force, searches -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally-optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with orders of magnitude computational speedups. For instance, in one previously-published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.
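To make the branch-and-bound idea concrete, the sketch below searches linear-regression subsets under BIC. The bound it uses is an assumption chosen for illustration and is not claimed to be the bound introduced in the paper: for a node with forced-in variables `included` and undecided variables `free`, every model in the subtree has a residual sum of squares no smaller than that of the largest model `included + free` and at least `len(included)` parameters, so $n \log(\mathrm{RSS}_{\text{largest}}/n) + |\text{included}| \log n$ is a valid lower bound on its BIC. The BIC form (Gaussian likelihood up to constants, no intercept), the toy data, and all function names are likewise hypothetical.

```python
import itertools
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns (no intercept)."""
    if not cols:
        return float(y @ y)
    A = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def bic(X, y, cols):
    n = len(y)
    return n * np.log(rss(X, y, cols) / n) + len(cols) * np.log(n)

def best_subset_bnb(X, y):
    """Depth-first branch-and-bound over variable subsets, minimizing BIC."""
    n, p = X.shape
    best = {"score": np.inf, "cols": None, "nodes": 0}

    def visit(included, free):
        best["nodes"] += 1
        # Lower bound: best possible fit in the subtree, fewest possible parameters.
        bound = n * np.log(rss(X, y, included + free) / n) + len(included) * np.log(n)
        if bound >= best["score"]:
            return  # nothing in this subtree can beat the incumbent
        if not free:
            # Leaf: the bound is the exact BIC of the model `included`.
            best["score"], best["cols"] = bound, tuple(included)
            return
        v, rest = free[0], free[1:]
        visit(included + [v], rest)  # branch 1: include variable v
        visit(included, rest)        # branch 2: exclude variable v

    visit([], list(range(p)))
    return best

def best_subset_exhaustive(X, y):
    """Brute-force reference: score all 2^p subsets."""
    p = X.shape[1]
    subsets = (c for k in range(p + 1) for c in itertools.combinations(range(p), k))
    return min(subsets, key=lambda c: bic(X, y, c))

# Hypothetical data: 8 candidate predictors, only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=100)

result = best_subset_bnb(X, y)
reference = best_subset_exhaustive(X, y)
```

The full search tree for 8 predictors has $2^9 - 1 = 511$ nodes; `result["nodes"]` reports how many the bounded search actually visited, and `reference` provides a brute-force check that pruning did not discard the optimum.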