A Appendix

For out-of-distribution (OOD) inference, it is desirable that the model assign higher epistemic uncertainty to OOD regions than to their in-distribution (ID) counterparts. We present examples of generated OOD samples in Figure 1(a); the corresponding results are shown in Figure 1(b)-(d). In Table 1, we report the results of our uncertainty estimation framework applied to the Cityscapes dataset.

A.2 Policy Gradient based Reward Maximization for Segmentation Backbone

This approach enables us to efficiently obtain the optimal solution for reward maximization.
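The policy-gradient update mentioned above can be sketched as a minimal REINFORCE loop. Everything below is an illustrative assumption, not the paper's implementation: a tiny linear "backbone" stands in for the segmentation network, and the reward is a toy indicator function.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def reinforce_step(W, x, reward_fn, rng, lr=0.5):
    """One REINFORCE update on a linear policy pi(a|x) = softmax(x @ W).

    Returns the mean sampled reward before the update. W, x, reward_fn,
    and lr are placeholder names for this sketch.
    """
    probs = softmax(x @ W)                       # (N, C) action probabilities
    n, c = probs.shape
    actions = np.array([rng.choice(c, p=p) for p in probs])
    rewards = reward_fn(actions)                 # (N,) scalar rewards
    # grad of log pi(a|x) w.r.t. the logits is onehot(a) - probs;
    # REINFORCE scales it by the reward and averages over the batch.
    grad_logits = (np.eye(c)[actions] - probs) * rewards[:, None]
    W += lr * x.T @ grad_logits / n              # gradient ascent on E[R]
    return rewards.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c = 64, 4, 3
    x = np.hstack([rng.normal(size=(n, d)), np.ones((n, 1))])  # bias feature
    W = np.zeros((d + 1, c))
    reward = lambda a: (a == 0).astype(float)    # toy reward: prefer class 0
    for _ in range(300):
        r = reinforce_step(W, x, reward, rng)
    print(f"mean reward after training: {r:.2f}")
```

Under this toy reward the policy concentrates on class 0, so the mean sampled reward climbs from roughly chance level toward 1 over the course of training.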
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause
Adapting large-scale foundation flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as molecular design, protein docking, and creative image generation. Existing principled fine-tuning methods aim to maximize the expected reward of generated samples, while retaining knowledge from the pre-trained model via KL-divergence regularization. In this work, we tackle the significantly more general problem of optimizing general utilities beyond average rewards, including risk-averse and novelty-seeking reward maximization, diversity measures for exploration, and experiment design objectives, among others. Likewise, we consider more general ways to preserve prior information beyond the KL divergence, such as optimal transport distances and Rényi divergences. To this end, we introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks, each solvable via scalable established methods. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.
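The reduction the abstract describes — optimizing a general utility by solving a sequence of simpler regularized subproblems — has a finite-dimensional analogue: mirror ascent on the probability simplex, where each step's KL-regularized linear subproblem has a closed form. The sketch below illustrates only that general idea, not FDC itself; the function names, step size, and toy objective are all assumptions.

```python
import numpy as np

def mirror_ascent_step(p, grad, eta=0.5):
    """One KL-geometry mirror ascent step on the simplex.

    Solves argmax_q <grad, q> - (1/eta) * KL(q || p) in closed form:
    q_i ∝ p_i * exp(eta * grad_i). Each such step plays the role of one
    regularized reward-maximization subproblem in this toy analogue.
    """
    q = p * np.exp(eta * (grad - grad.max()))    # shift exponent for stability
    return q / q.sum()

def optimize_utility(util_grad, p0, eta=0.5, steps=200):
    """Optimize a general utility F(p) by iterating linearized subproblems."""
    p = p0.copy()
    for _ in range(steps):
        p = mirror_ascent_step(p, util_grad(p), eta)
    return p

if __name__ == "__main__":
    r = np.array([1.0, 0.5, 0.0, -0.5])
    p0 = np.full(4, 0.25)
    # Toy utility F(p) = <r, p> + H(p): entropy-regularized reward, whose
    # maximizer is the Gibbs distribution softmax(r).
    grad = lambda p: r - (np.log(p) + 1.0)
    p = optimize_utility(grad, p0)
    print(np.round(p, 4))
```

With this particular utility the iteration contracts geometrically to softmax(r), which makes the scheme easy to sanity-check; nonlinear utilities (diversity, risk measures) plug in through `util_grad` in the same way.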