Stochastic Amortization

Neural Information Processing Systems

We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective.
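A minimal numerical sketch of this idea under toy assumptions: the "expensive" target below is a hypothetical stand-in for whatever quantity is being amortized, and its noisy labels are cheap, unbiased one-sample estimates. Because the label noise is zero-mean, least-squares regression on the noisy labels still targets the clean conditional mean, which is why training on noisy labels can be inexpensive yet effective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expensive" per-example quantity we want to amortize
# (a hypothetical stand-in, e.g. for a Monte Carlo attribution score).
def expensive_target(x):
    return np.sin(x) + 0.5 * x

# Inputs and noisy labels: each label is a cheap, unbiased but
# noisy estimate of the expensive target.
x = rng.uniform(-3, 3, size=2000)
noisy_y = expensive_target(x) + rng.normal(0.0, 1.0, size=x.shape)

# Amortized model: a small polynomial regressor fit by least
# squares on the noisy labels. Zero-mean noise means the fit
# still approximates the clean target.
X = np.vander(x, N=6, increasing=True)   # polynomial features
coef, *_ = np.linalg.lstsq(X, noisy_y, rcond=None)

# Evaluate against the clean target on held-out points.
x_test = np.linspace(-3, 3, 200)
pred = np.vander(x_test, N=6, increasing=True) @ coef
mse = np.mean((pred - expensive_target(x_test)) ** 2)
print(f"MSE vs clean target: {mse:.4f}")
```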


Generative Bayesian Hyperparameter Tuning

Lopes, Hedibert, Polson, Nick, Sokolov, Vadim

arXiv.org Machine Learning

Hyper-parameter selection is a central practical problem in modern machine learning, governing regularization strength, model capacity, and robustness choices. Cross-validation is often computationally prohibitive at scale, while fully Bayesian hyper-parameter learning can be difficult due to the cost of posterior sampling. We develop a generative perspective on hyper-parameter tuning that combines two ideas: (i) optimization-based approximations to Bayesian posteriors via randomized, weighted objectives (weighted Bayesian bootstrap), and (ii) amortization of repeated optimization across many hyper-parameter settings by learning a transport map from hyper-parameters (including random weights) to the corresponding optimizer. This yields a "generator look-up table" for estimators, enabling rapid evaluation over grids or continuous ranges of hyper-parameters and supporting both predictive tuning objectives and approximate Bayesian uncertainty quantification. We connect this viewpoint to weighted $M$-estimation, envelope/auxiliary-variable representations that reduce non-quadratic losses to weighted least squares, and recent generative samplers for weighted $M$-estimators.
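A minimal sketch of the weighted-Bayesian-bootstrap ingredient on a toy ridge-regression problem (all names and the toy data below are illustrative, not the paper's setup): each posterior-like draw re-solves a weighted objective with fresh random weights, and repeating this over a grid of regularization strengths gives a simple look-up table of draws per hyper-parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data (hypothetical).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.5, size=n)

def weighted_ridge(X, y, lam, w):
    """Minimize sum_i w_i (y_i - x_i'b)^2 + lam * ||b||^2.
    Closed form: (X' W X + lam I)^{-1} X' W y."""
    XtW = X.T * w  # multiply each column of X' by its weight
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

# Weighted Bayesian bootstrap: each posterior-like draw re-solves
# the optimization with fresh random (exponential) weights.
def wbb_draws(lam, n_draws=200):
    return np.stack([
        weighted_ridge(X, y, lam, rng.exponential(1.0, size=n))
        for _ in range(n_draws)
    ])

# Look-up over a hyper-parameter grid: draws for each lambda can
# be reused for tuning or uncertainty quantification.
for lam in (0.1, 1.0, 10.0):
    draws = wbb_draws(lam)
    print(lam, np.round(draws.mean(0), 2), np.round(draws.std(0), 2))
```

The amortization step in the abstract would replace the per-draw solve with a learned map from (lambda, weights) to the optimizer; the closed-form solve here just makes the randomized-objective idea concrete.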



Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

Mittal, Sarthak, Mahajan, Divyat, Lajoie, Guillaume, Pezeshki, Mohammad

arXiv.org Artificial Intelligence

Modern learning systems increasingly rely on amortized learning - the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. This principle spans a range of approaches, including meta-learning, in-context learning, prompt tuning, learned optimizers and more. While motivated by similar goals, these approaches differ in how they encode and leverage task-specific information, often provided as in-context examples. In this work, we propose a unified framework which describes how such methods differ primarily in the aspects of learning they amortize - such as initializations, learned updates, or predictive mappings - and how they incorporate task data at inference. We introduce a taxonomy that categorizes amortized models into parametric, implicit, and explicit regimes, based on whether task adaptation is externalized, internalized, or jointly modeled. Building on this view, we identify a key limitation in current approaches: most methods struggle to scale to large datasets because their capacity to process task data at inference (e.g., context length) is often limited. To address this, we propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches, drawing inspiration from stochastic optimization. Our formulation bridges optimization-based meta-learning with forward-pass amortization in models like LLMs, offering a scalable and extensible foundation for general-purpose task adaptation.
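The interface of iterative amortized inference can be sketched on a toy task (all details below are illustrative assumptions): a solution state is refined one mini-batch at a time, so inference cost does not depend on holding the whole dataset in context at once. The paper's refinement would be a learned module; a plain gradient step stands in for it here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy task: infer the mean of a large dataset by streaming it.
data = rng.normal(loc=3.0, scale=1.0, size=10_000)

def update_step(state, batch, step_size=0.1):
    """Illustrative hand-coded refinement (a learned update module
    in the paper's framing): a gradient step on the batch
    squared-error loss for the current estimate."""
    grad = state - batch.mean()
    return state - step_size * grad

state = 0.0                          # initial solution
for i in range(0, len(data), 256):   # stream mini-batches
    state = update_step(state, data[i:i + 256])
print(f"estimated mean: {state:.3f}")
```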




On the scalability of MSC: Variational inference based on KL(q||p) is scalable in the sense that it works by subsampling datasets, both for exchangeable data, p(x …

Neural Information Processing Systems

We thank the reviewers for the constructive feedback, which will significantly improve the paper. We elaborate on this first and address specific comments and questions from the reviewers below. (RWS, etc.) applications assume the data is generated i.i.d. and achieve scalability through the use of subsampling. The current discussion in Section 3.5 for MSC, on the other hand, focuses on the more challenging case; we will clarify this in the revision. We compare the base versions of the respective algorithms. We will add these references to the related work section.
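The subsampling point above rests on a standard fact: for exchangeable data the log-likelihood is a sum over data points, so a mini-batch term rescaled by N/B is an unbiased estimate of the full sum. A minimal numerical check (toy Gaussian model, all specifics assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Exchangeable data: log-likelihood decomposes as a sum over points.
N, B = 10_000, 100
x = rng.normal(loc=2.0, size=N)

def full_loglik(mu):
    # Unnormalized Gaussian log-likelihood, summed over all points.
    return np.sum(-0.5 * (x - mu) ** 2)

def minibatch_loglik(mu, batch):
    # Mini-batch term rescaled by N/B: unbiased for the full sum.
    return (N / len(batch)) * np.sum(-0.5 * (batch - mu) ** 2)

ests = [minibatch_loglik(1.5, x[rng.choice(N, B, replace=False)])
        for _ in range(2000)]
print(full_loglik(1.5), np.mean(ests))
```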