Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management
Basu, Sanjay, Patel, Sadiq Y., Sheth, Parth, Muralidharan, Bhairavi, Elamaran, Namrata, Kinra, Aakriti, Batniji, Rajaie
arXiv.org Artificial Intelligence
Care coordination and population health management (PHM) are core functions of health systems and community partners, impacting large numbers of Americans enrolled in Medicaid and other safety-net programs. These efforts aim to proactively identify needs, prioritize outreach, and escalate appropriately, all within finite staffing and budget constraints. While outreach modalities (text, phone, video, in-person) carry low clinical risk, their time and opportunity costs vary significantly, making efficiency a primary design goal. In practice, the central operational question is when to deploy expensive in-person outreach versus efficient virtual modalities to maximize value and equity under capacity constraints.

These decisions must be made in strictly offline settings, where policies are learned from logged data without exploration at deployment [1]. Classical approaches include constrained Markov decision processes [2], risk-sensitive objectives, and conservative offline RL (e.g., CQL/IQL) [3, 4]. Conformal prediction can provide calibrated error control [5, 6]; ensembles provide practical uncertainty quantification [7]; and decision-time computation is common in control [8]. In health services research and health economic evaluation, cost-effectiveness and cost-benefit analyses (CEA/CBA) guide program-level choices [9-12], but they are not designed for per-patient, per-decision recommendations that adapt to granular state features and logged behavior constraints.
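To make the claim about calibrated error control concrete, the following is a minimal sketch of split conformal prediction on synthetic data. It is illustrative only and not the paper's actual pipeline: the outcome model, calibration set, and noise levels are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: a model's point predictions of a per-patient
# outcome score, plus the true values (synthetic; illustrative assumptions).
n_cal = 500
y_cal = rng.normal(0.0, 1.0, n_cal)             # true outcomes
pred_cal = y_cal + rng.normal(0.0, 0.3, n_cal)  # noisy model predictions

# Split conformal prediction: use calibration residuals to choose a radius q
# so that intervals [pred - q, pred + q] cover new outcomes with probability
# >= 1 - alpha, assuming only exchangeability (no distributional model).
alpha = 0.1
scores = np.abs(y_cal - pred_cal)               # nonconformity scores
k = int(np.ceil((n_cal + 1) * (1 - alpha)))     # conservative quantile rank
q = np.sort(scores)[k - 1]

# Check empirical coverage on fresh points drawn the same way.
n_new = 2000
y_new = rng.normal(0.0, 1.0, n_new)
pred_new = y_new + rng.normal(0.0, 0.3, n_new)
covered = np.abs(y_new - pred_new) <= q
print(f"target coverage {1 - alpha:.2f}, empirical {covered.mean():.3f}")
```

The key property is that the coverage guarantee holds regardless of how well the underlying model is specified, which is what makes conformal methods attractive for error control on top of offline-learned policies.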
Sep-23-2025