amortization
- North America > United States > California (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- Europe > France (0.04)
Generative Bayesian Hyperparameter Tuning
Lopes, Hedibert, Polson, Nick, Sokolov, Vadim
Hyper-parameter selection is a central practical problem in modern machine learning, governing regularization strength, model capacity, and robustness choices. Cross-validation is often computationally prohibitive at scale, while fully Bayesian hyper-parameter learning can be difficult due to the cost of posterior sampling. We develop a generative perspective on hyper-parameter tuning that combines two ideas: (i) optimization-based approximations to Bayesian posteriors via randomized, weighted objectives (weighted Bayesian bootstrap), and (ii) amortization of repeated optimization across many hyper-parameter settings by learning a transport map from hyper-parameters (including random weights) to the corresponding optimizer. This yields a "generator look-up table" for estimators, enabling rapid evaluation over grids or continuous ranges of hyper-parameters and supporting both predictive tuning objectives and approximate Bayesian uncertainty quantification. We connect this viewpoint to weighted $M$-estimation, envelope/auxiliary-variable representations that reduce non-quadratic losses to weighted least squares, and recent generative samplers for weighted $M$-estimators.
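A minimal sketch of the weighted-bootstrap half of this recipe, assuming ridge regression so that each randomized, weighted objective has a closed-form weighted least-squares solution; sweeping a grid of regularization values mimics the look-up-table use case. The names (wbb_ridge, lam, n_draws) are illustrative, not from the paper.

```python
# Weighted Bayesian bootstrap sketch: each draw re-solves a randomized,
# exponentially weighted ridge problem, giving posterior-style draws of the
# estimator for a given hyper-parameter lam.
import numpy as np

rng = np.random.default_rng(0)

def wbb_ridge(X, y, lam, rng, n_draws=100):
    """Draws of the ridge estimator via random exponential observation weights."""
    n, p = X.shape
    draws = np.empty((n_draws, p))
    for b in range(n_draws):
        w = rng.exponential(1.0, size=n)        # random weights on the loss terms
        Xw = X * w[:, None]                     # Xw.T @ X == X.T @ diag(w) @ X
        draws[b] = np.linalg.solve(Xw.T @ X + lam * np.eye(p), Xw.T @ y)
    return draws

# Toy data: evaluate the estimator's spread over a grid of hyper-parameters.
X = rng.normal(size=(200, 5))
theta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ theta_true + rng.normal(scale=0.5, size=200)
for lam in [0.1, 1.0, 10.0]:
    d = wbb_ridge(X, y, lam, rng)
    print(lam, d.mean(axis=0).round(2), d.std(axis=0).round(2))
```

The paper's amortization step would then fit a map from (lam, weights) to the resulting minimizer, replacing the inner solve at evaluation time.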
- North America > United States > Texas (0.04)
- North America > United States > Alabama (0.04)
- Asia > Middle East > Jordan (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers
Mittal, Sarthak, Mahajan, Divyat, Lajoie, Guillaume, Pezeshki, Mohammad
Modern learning systems increasingly rely on amortized learning - the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. This principle spans a range of approaches, including meta-learning, in-context learning, prompt tuning, learned optimizers and more. While motivated by similar goals, these approaches differ in how they encode and leverage task-specific information, often provided as in-context examples. In this work, we propose a unified framework which describes how such methods differ primarily in the aspects of learning they amortize - such as initializations, learned updates, or predictive mappings - and how they incorporate task data at inference. We introduce a taxonomy that categorizes amortized models into parametric, implicit, and explicit regimes, based on whether task adaptation is externalized, internalized, or jointly modeled. Building on this view, we identify a key limitation in current approaches: most methods struggle to scale to large datasets because their capacity to process task data at inference (e.g., context length) is often limited. To address this, we propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches, drawing inspiration from stochastic optimization. Our formulation bridges optimization-based meta-learning with forward-pass amortization in models like LLMs, offering a scalable and extensible foundation for general-purpose task adaptation.
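A minimal sketch of the iterative refinement interface described above, assuming a least-squares task for concreteness: a solution state is updated step-by-step over mini-batches by an update operator, hand-coded here where the paper would learn it. All names (refine_step, theta) are illustrative.

```python
# Iterative amortized inference sketch: the state never sees the full dataset
# at once; it is refined one mini-batch at a time, as in stochastic optimization.
import numpy as np

rng = np.random.default_rng(1)

def refine_step(theta, X_batch, y_batch, lr=0.1):
    """Stand-in for a learned update: here, one gradient step on squared error."""
    grad = 2 * X_batch.T @ (X_batch @ theta - y_batch) / len(y_batch)
    return theta - lr * grad

# Task data arrives as a stream of mini-batches.
X = rng.normal(size=(1000, 3))
theta_star = np.array([0.5, -1.0, 2.0])
y = X @ theta_star + rng.normal(scale=0.1, size=1000)

theta = np.zeros(3)                  # an amortized initialization would go here
for i in range(0, 1000, 50):         # refine over successive mini-batches
    theta = refine_step(theta, X[i:i+50], y[i:i+50])
print(theta.round(2))
```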
- North America > United States (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Europe (0.45)
- North America > United States (0.28)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > United States (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
- Information Technology > Software (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
On the scalability of MSC
We thank the reviewers for the constructive feedback, which will significantly improve the paper. We elaborate on scalability first and address specific comments and questions from the reviewers below. Variational inference based on KL(q||p) is scalable in the sense that it works by subsampling datasets, both for exchangeable data, p(x ...), and in related (RWS, etc.) applications; these assume the data is generated i.i.d. and achieve scalability through the use of subsampling. The current discussion in Section 3.5 for MSC, on the other hand, focuses on the more challenging case; we will clarify this in the revision. We compare the base versions of the respective algorithms. We will add these references to the related work section.
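As a minimal numerical illustration of the subsampling point (our sketch, not from the rebuttal): for i.i.d. data the log-likelihood is a sum over observations, so a mini-batch term rescaled by N/B is an unbiased estimate of the full-data term, which is what makes KL(q||p)-based variational inference amenable to subsampling.

```python
# Unbiased mini-batch estimate of a full-data log-likelihood under i.i.d. data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.0, size=10_000)   # exchangeable, i.i.d. data
N, B = len(x), 100

def loglik(theta, xs):
    # Gaussian log-likelihood with unit variance, up to an additive constant.
    return -0.5 * np.sum((xs - theta) ** 2)

theta = 0.7
idx = rng.choice(N, size=B, replace=False)
full = loglik(theta, x)
estimate = (N / B) * loglik(theta, x[idx])        # rescaled mini-batch term
print(full, estimate)
```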