Functional Natural Policy Gradients
Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus
Personalized decision policies are increasingly central in areas such as healthcare [Bertsimas et al., 2017], education [Mandel et al., 2014], and public policy [Kube et al., 2019], where tailoring actions to individual characteristics can improve outcomes. In many of these settings, however, actively experimenting with new policies to generate "online data" is expensive, risky, or infeasible, which motivates methods that can evaluate and optimize policies using pre-existing "offline data." A variety of work studies semiparametric efficient estimation of the value of a fixed policy from offline data [Chernozhukov et al., 2018, Dudík et al., 2011, Jiang and Li, 2016, Kallus and Uehara, 2020, 2022, Kallus et al., 2022, Scharfstein et al., 1999]. And a variety of work considers selecting the policy that optimizes such estimates over policies in a given class [Athey and Wager, 2021, Chernozhukov et al., 2019, Foster and Syrgkanis, 2023, Kallus, 2021, Zhang et al., 2013, Zhou et al., 2023], which generally yields regret rates that scale with policy class complexity, e.g., O_P(N^{-1/2}) for VC classes. Luedtke and Chambaz [2020] accelerate regret to o_P(N^{-1/2}) by leveraging an equicontinuity argument.
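To make the off-policy evaluation problem discussed above concrete, the following is a minimal sketch of an inverse-propensity-weighted (IPW) estimate of a target policy's value from offline data. The data-generating process, the policy, and all variable names are illustrative assumptions, not the estimators studied in this paper; the IPW estimator shown is the classical one, V(pi) = E[ 1{A = pi(X)} Y / P(A | X) ], with a known logging propensity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline data (all of this is illustrative):
# contexts X, binary actions A logged by a known uniform behavior
# policy, and outcomes Y with E[Y | X, A] = X * A.
n = 5000
X = rng.normal(size=n)
behavior_prob = 0.5                       # P(A = 1 | X) under logging
A = rng.binomial(1, behavior_prob, size=n)
Y = X * A + rng.normal(scale=0.1, size=n)

def target_policy(x):
    """Deterministic target policy: take action 1 when context is positive."""
    return (x > 0).astype(int)

# IPW value estimate: reweight logged outcomes by the inverse
# probability that the logging policy took the target policy's action.
pi_X = target_policy(X)
propensity = np.where(pi_X == 1, behavior_prob, 1 - behavior_prob)
weights = (A == pi_X) / propensity
v_ipw = float(np.mean(weights * Y))
```

Under this toy model the true value is E[max(X, 0)] ≈ 0.399, and the IPW estimate concentrates around it at the O_P(N^{-1/2}) rate mentioned above; the semiparametric-efficient estimators cited in the text augment this weighting with an outcome-regression term.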
Apr-6-2026