A Appendix

For out-of-distribution (OOD) inference, it is desirable that the model assign higher epistemic uncertainty to OOD regions than to their in-distribution (ID) counterparts. We present examples of generated OOD samples in Figure 1(a); the corresponding results are shown in Figure 1(b)-(d). In Table 1, we report the results of our uncertainty estimation framework applied to the Cityscapes dataset.

A.2 Policy Gradient based Reward Maximization for Segmentation Backbone

This approach enables us to efficiently obtain the optimal solution for reward maximization.
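The policy-gradient update mentioned above can be sketched as a minimal REINFORCE loop. Everything below is an illustrative assumption, not the paper's implementation: a tiny linear "backbone" stands in for the segmentation network, and the reward is a toy indicator function.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def reinforce_step(W, x, reward_fn, rng, lr=0.5):
    """One REINFORCE update on a linear policy pi(a|x) = softmax(x @ W).

    Returns the mean sampled reward before the update. W, x, reward_fn,
    and lr are placeholder names for this sketch.
    """
    probs = softmax(x @ W)                       # (N, C) action probabilities
    n, c = probs.shape
    actions = np.array([rng.choice(c, p=p) for p in probs])
    rewards = reward_fn(actions)                 # (N,) scalar rewards
    # grad of log pi(a|x) w.r.t. the logits is onehot(a) - probs;
    # REINFORCE scales it by the reward and averages over the batch.
    grad_logits = (np.eye(c)[actions] - probs) * rewards[:, None]
    W += lr * x.T @ grad_logits / n              # gradient ascent on E[R]
    return rewards.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c = 64, 4, 3
    x = np.hstack([rng.normal(size=(n, d)), np.ones((n, 1))])  # bias feature
    W = np.zeros((d + 1, c))
    reward = lambda a: (a == 0).astype(float)    # toy reward: prefer class 0
    for _ in range(300):
        r = reinforce_step(W, x, reward, rng)
    print(f"mean reward after training: {r:.2f}")
```

Under this toy reward the policy concentrates on class 0, so the mean sampled reward climbs from roughly chance level toward 1 over the course of training.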
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause
Adapting large-scale foundation flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as molecular design, protein docking, and creative image generation. Existing principled fine-tuning methods aim to maximize the expected reward of generated samples, while retaining knowledge from the pre-trained model via KL-divergence regularization. In this work, we tackle the significantly more general problem of optimizing general utilities beyond average rewards, including risk-averse and novelty-seeking reward maximization, diversity measures for exploration, and experiment design objectives, among others. Likewise, we consider more general ways to preserve prior information beyond the KL divergence, such as optimal transport distances and Rényi divergences. To this end, we introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks, each solvable via scalable established methods. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.
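The reduction the abstract describes — optimizing a general utility by solving a sequence of simpler regularized subproblems — has a finite-dimensional analogue: mirror ascent on the probability simplex, where each step's KL-regularized linear subproblem has a closed form. The sketch below illustrates only that general idea, not FDC itself; the function names, step size, and toy objective are all assumptions.

```python
import numpy as np

def mirror_ascent_step(p, grad, eta=0.5):
    """One KL-geometry mirror ascent step on the simplex.

    Solves argmax_q <grad, q> - (1/eta) * KL(q || p) in closed form:
    q_i ∝ p_i * exp(eta * grad_i). Each such step plays the role of one
    regularized reward-maximization subproblem in this toy analogue.
    """
    q = p * np.exp(eta * (grad - grad.max()))    # shift exponent for stability
    return q / q.sum()

def optimize_utility(util_grad, p0, eta=0.5, steps=200):
    """Optimize a general utility F(p) by iterating linearized subproblems."""
    p = p0.copy()
    for _ in range(steps):
        p = mirror_ascent_step(p, util_grad(p), eta)
    return p

if __name__ == "__main__":
    r = np.array([1.0, 0.5, 0.0, -0.5])
    p0 = np.full(4, 0.25)
    # Toy utility F(p) = <r, p> + H(p): entropy-regularized reward, whose
    # maximizer is the Gibbs distribution softmax(r).
    grad = lambda p: r - (np.log(p) + 1.0)
    p = optimize_utility(grad, p0)
    print(np.round(p, 4))
```

With this particular utility the iteration contracts geometrically to softmax(r), which makes the scheme easy to sanity-check; nonlinear utilities (diversity, risk measures) plug in through `util_grad` in the same way.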