Goto

Collaborating Authors

 admissible cost


Recursively-Constrained Partially Observable Markov Decision Processes

arXiv.org Artificial Intelligence

In many problems, it is desirable to optimize an objective function while imposing constraints on some other objectives. A Constrained Partially Observable Markov Decision Process (C-POMDP) allows modeling of such problems under transition uncertainty and partial observability. Typically, the constraints in C-POMDPs enforce a threshold on expected cumulative costs starting from an initial state distribution. In this work, we first show that optimal C-POMDP policies may violate Bellman's principle of optimality and thus may exhibit unintuitive behaviors, which can be undesirable for some (e.g., safety critical) applications. Additionally, online re-planning with C-POMDPs is often ineffective due to the inconsistency resulting from the violation of Bellman's principle of optimality. To address these drawbacks, we introduce a new formulation: the Recursively-Constrained POMDP (RC-POMDP), that imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies, and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm that synthesizes admissible near-optimal policies for RC-POMDPs. Evaluations on a set of benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.


Generalized active learning and design of statistical experiments for manifold-valued data

arXiv.org Machine Learning

In computer graphics and computer vision, usually either physically inspired analytic reflectance models, like Cook and Torrance (1981) or He et al. (1991), or parametric reflectance models chosen via qualitative criteria, like Phong (1975), or Lafortune et al. (1997), are used to model BRDFs. These BRDF models are only crude approximations of the reflectance of real materials. In multidimensional reflectometry, an alternative approach is usually taken. One directly measures values of the BRDF for different combinations of the incoming and outgoing angles and then fits the measured data to a selected analytic model using optimization techniques. There were numerous efforts to use modern machine learning techniques to construct datadriven BRDF models. Brady et al. (2014) proposed a method to generate new analytical BRDFs using a heuristic distance-based search procedure called Genetic Programming. In Brochu et al. (2008), an active learning algorithm using discrete perceptional data was developed and applied to learning parameters of BRDF models such as the Ashikhmin - Shirley model Ashikhmin and Shirley (2000), while Langovoy et al. (2016) treated active learning for the Cook - Torrance model Cook and Torrance (1981). Analysis of BRDF data with statistical and machine learning methods was discussed in Langovoy (2015b), Langovoy (2015a), Sole et al. (2018), Doctor and Byers (2018).


Point-Based Value Iteration for Constrained POMDPs

AAAI Conferences

Constrained partially observable Markov decision processes (CPOMDPs) extend the standard POMDPs by allowing the specification of constraints on some aspects of the policy in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs since they naturally model problems involving limited resource or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and present exact and approximate dynamic programming methods for computing randomized optimal policies. While the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method utilizes the point-based value update with a linear program (LP). We show that the randomized policies are significantly better than the deterministic ones. We also demonstrate that the approximate point-based method is scalable to solve large problems.