Automate 2023 recap and the receding horizon problem

Robohub

"Thirty million developers" are the answer to driving billion-dollar robot startups, exclaimed Eliot Horowitz of Viam last week at Automate. The hushed crowd of about 200 hardware entrepreneurs listened intently to MongoDB's founder and former CTO (a $20Bn success story). Now Horowitz aims to apply the same approach he used to democratize cloud data applications to mechatronics. As I nudged him with questions about how his new platform will speed complex robot deployments to market, he shared his vision of the Viam developer army (currently 1,000 strong) creating applications that can be seamlessly downloaded on the fly to any system and workflow. Unlike ROS, which primarily targets the current community of roboticists, Viam is luring the engineers that birthed ChatGPT to revolutionize uncrewed systems with new mechanical tasks addressing everyday needs.


Linear programming-based solution methods for constrained partially observable Markov decision processes

Helmeczi, Robert K., Kavaklioglu, Can, Cevik, Mucahit

arXiv.org Artificial Intelligence

Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena. However, they are notoriously difficult to solve to optimality, and there exist only a few approximation methods for obtaining high-quality solutions. In this study, grid-based approximations are used in combination with linear programming (LP) models to generate approximate policies for CPOMDPs. A detailed numerical study is conducted with six CPOMDP problem instances considering both their finite and infinite horizon formulations. The quality of approximation algorithms for solving unconstrained POMDP problems is established through a comparative analysis with exact solution methods. Then, the performance of the LP-based CPOMDP solution approaches for varying budget levels is evaluated. Finally, the flexibility of LP-based approaches is demonstrated by applying deterministic policy constraints, and a detailed investigation into their impact on rewards and CPU run time is provided. For most of the finite horizon problems, deterministic policy constraints are found to have little impact on expected reward, but they introduce a significant increase to CPU run time. For infinite horizon problems, the reverse is observed: deterministic policies tend to yield lower expected total rewards than their stochastic counterparts, but the impact of deterministic constraints on CPU run time is negligible in this case. Overall, these results demonstrate that LP models can effectively generate approximate policies for both finite and infinite horizon problems while providing the flexibility to incorporate various additional constraints into the underlying model.
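A fully observable analogue gives a feel for the LP-based approach: the occupancy-measure LP for a discounted constrained MDP, where a budget on expected discounted cost is just one extra linear inequality. A minimal sketch with made-up numbers (the transition matrix, rewards, costs, and budget are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 2-state, 2-action discounted MDP (numbers are assumptions)
nS, nA, gamma = 2, 2, 0.9
P = np.zeros((nS, nA, nS))               # P[s, a, s'] = transition probability
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.7, 0.3]; P[1, 1] = [0.1, 0.9]
r = np.array([[1.0, 0.0], [0.0, 2.0]])   # reward r(s, a)
cost = np.array([[0.0, 1.0], [1.0, 0.0]])  # cost c(s, a)
mu = np.array([0.5, 0.5])                # initial state distribution
budget = 3.0                             # bound on expected discounted cost

# Variables x(s, a): discounted state-action occupancy measures.
# Flow constraints: sum_a x(s', a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s')
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (s == sp) - gamma * P[s, a, sp]

res = linprog(c=-r.flatten(),            # maximize reward = minimize -reward
              A_ub=cost.flatten()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=mu, bounds=(0, None))
x = res.x.reshape(nS, nA)
policy = x / x.sum(axis=1, keepdims=True)  # stochastic policy from occupancies
print(res.status, -res.fun)
```

The budget row is the only difference from the unconstrained LP, which is the flexibility the abstract refers to: further linear constraints (e.g. forcing near-deterministic policies) slot in the same way.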


Global Optimality Guarantees For Policy Gradient Methods

Bhandari, Jalaj, Russo, Daniel

arXiv.org Machine Learning

Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by classical techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to local minima. This work identifies structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that the policy gradient objective function has no suboptimal local minima despite being non-convex. When these assumptions are relaxed, our work gives conditions under which any local minimum is near-optimal, where the error bound depends on a notion of the expressive capacity of the policy class.
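The smallest instance makes the setting concrete: exact gradient ascent on a softmax-parameterized policy for a one-state MDP (a bandit). A minimal sketch with illustrative reward numbers; the objective is non-convex in the parameters, yet ascent reaches the optimal action:

```python
import numpy as np

rewards = np.array([1.0, 3.0, 2.0])      # expected reward per action (illustrative)
theta = np.zeros(3)                      # softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for _ in range(500):
    pi = softmax(theta)
    J = pi @ rewards                     # expected reward under the policy
    # Exact gradient of J: dJ/dtheta_a = pi_a * (r_a - J)
    theta += lr * pi * (rewards - J)

pi = softmax(theta)
print(pi.argmax(), pi @ rewards)         # policy concentrates on the best action
```

Note that J is not concave in theta (softmax composes with a linear map), which is exactly why the paper's structural conditions ruling out suboptimal stationary points are needed in general.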


EM Algorithm and Stochastic Control in Economics

Kou, Steven, Peng, Xianhua, Xu, Xingbo

arXiv.org Machine Learning

Generalising the idea of the classical EM algorithm that is widely used for computing maximum likelihood estimates, we propose an EM-Control (EM-C) algorithm for solving multi-period finite time horizon stochastic control problems. The new algorithm sequentially updates the control policies in each time period using Monte Carlo simulation in a forward-backward manner; in other words, the algorithm goes forward in simulation and backward in optimization in each iteration. Similar to the EM algorithm, the EM-C algorithm has the monotonicity of performance improvement in each iteration, leading to good convergence properties. We demonstrate the effectiveness of the algorithm by solving stochastic control problems in the monopoly pricing of perishable assets and in the study of real business cycles.
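The backward-in-optimization structure can be sketched on a toy version of the perishable-asset pricing problem: one unit, T periods, and each sweep updates the price for one period at a time going backward, with future prices fixed. This is not the paper's algorithm verbatim: exact expectations replace Monte Carlo simulation to keep the sketch deterministic, and the sale-probability model q(p) = exp(-p) is an assumption:

```python
import numpy as np

T = 4
grid = np.linspace(0.1, 3.0, 60)        # candidate prices
q = lambda p: np.exp(-p)                # sale probability at price p (assumed)
prices = np.full(T, grid[0])            # initial policy: one price per period

def value_from(t, prices):
    """Expected revenue from period t on, given one unsold unit."""
    v = 0.0
    for s in range(T - 1, t - 1, -1):
        v = q(prices[s]) * prices[s] + (1 - q(prices[s])) * v
    return v

for sweep in range(5):
    for t in reversed(range(T)):        # backward in optimization
        cont = value_from(t + 1, prices)
        # Best price at period t given all future prices fixed
        vals = q(grid) * grid + (1 - q(grid)) * cont
        prices[t] = grid[np.argmax(vals)]

print(prices, value_from(0, prices))
```

Because each per-period update maximizes the same overall objective with the other coordinates fixed, the value cannot decrease between sweeps, mirroring the monotone-improvement property the abstract highlights.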


Efficient Inference in Markov Control Problems

Furmston, Thomas, Barber, David

arXiv.org Artificial Intelligence

Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension provides a novel algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems. The state and action spaces can be either discrete or continuous. For a discount factor γ, the reward is defined as R_t(s_t, a_t) = γ^(t-1) R(s_t, a_t) for a stationary reward R(s_t, a_t), where γ ∈ [0, 1).
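The discount convention at the end can be sanity-checked numerically: for a constant per-step reward r, the total Σ_{t≥1} γ^(t-1) r is the geometric series r / (1 - γ), which is why γ < 1 is required for the infinite-horizon case:

```python
# Discounted-reward convention R_t = gamma**(t-1) * R(s_t, a_t):
# truncating the infinite sum at a large T approximates r / (1 - gamma).
gamma, r, T = 0.9, 2.0, 1000
total = sum(gamma ** (t - 1) * r for t in range(1, T + 1))
print(total, r / (1 - gamma))           # both close to 20.0
```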