Sample Complexity Bounds for Iterative Stochastic Policy Optimization
Neural Information Processing Systems
This paper is concerned with robustness analysis of decision making under uncertainty. We consider a class of iterative stochastic policy optimization problems and analyze the resulting expected performance for each newly updated policy at each iteration. In particular, we employ concentration-of-measure inequalities to compute future expected cost and probability of constraint violation using empirical runs. A novel inequality bound is derived that accounts for the possibly unbounded change-of-measure likelihood ratio resulting from iterative policy adaptation. The bound serves as a high-confidence certificate for providing future performance or safety guarantees. The approach is illustrated with a simple robot control scenario, and initial steps towards applications to challenging aerial vehicle navigation problems are presented.
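To make the idea of a high-confidence certificate from empirical runs concrete, the sketch below computes an upper bound on expected cost using the standard Hoeffding inequality. This is not the paper's novel bound: it assumes i.i.d. runs under a single fixed policy and costs bounded in a known interval, so it does not handle the possibly unbounded likelihood ratio that arises when runs from earlier policies are reused. The function name, the cost range, and the simulated data are all hypothetical.

```python
import math
import random


def hoeffding_upper_bound(costs, cost_max, delta):
    """Upper bound on the expected cost, valid with probability >= 1 - delta.

    Assumption (not from the paper): costs are i.i.d. samples in [0, cost_max],
    so the one-sided Hoeffding inequality applies.
    """
    n = len(costs)
    empirical_mean = sum(costs) / n
    # Solve exp(-2 * n * t^2 / cost_max^2) = delta for the margin t.
    margin = cost_max * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return empirical_mean + margin


# Hypothetical data: 200 simulated runs with costs in [0, 1].
rng = random.Random(0)
costs = [rng.random() for _ in range(200)]
bound = hoeffding_upper_bound(costs, cost_max=1.0, delta=0.05)
print(f"empirical mean = {sum(costs) / len(costs):.3f}, upper bound = {bound:.3f}")
```

Passing 0/1 indicators of constraint violation as `costs` (with `cost_max=1.0`) gives the analogous bound on the probability of violation under the same assumptions.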
Oct-2-2025, 11:23:41 GMT
- Country:
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America > United States
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Representation & Reasoning
- Optimization (0.89)
- Uncertainty (0.66)
- Robots (1.00)