Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single-agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems, which represent an agent's belief about the physical world, about the beliefs of other agents, and about their beliefs about others' beliefs. This modification makes the difficulties of obtaining solutions due to the complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity -- sometimes also called the curse of history -- we utilize a complementary method based on sampling likely observations while building the lookahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse's impact substantially. We provide experimental results and chart future work.
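To make the particle-filtering idea concrete, here is a minimal sketch of a single-agent bootstrap particle filter belief update (propagate, weight, resample). It is not the interactive PF described above, which additionally recurses into the other agents' nested beliefs. The two-state "tiger"-style model, the 0.85 listening accuracy, and all function names are illustrative assumptions, not taken from the paper.

```python
import random

def pf_update(particles, action, observation, transition, obs_prob, rng):
    """One bootstrap particle-filter belief update (single agent, flat belief)."""
    # 1. Propagate: sample a successor state for every particle.
    propagated = [transition(s, action, rng) for s in particles]
    # 2. Weight: likelihood of the received observation in each successor.
    weights = [obs_prob(observation, s2, action) for s2 in propagated]
    if sum(weights) == 0:
        raise ValueError("observation has zero likelihood under all particles")
    # 3. Resample with replacement, in proportion to the weights.
    return rng.choices(propagated, weights=weights, k=len(particles))

# Hypothetical two-state model: the tiger is behind the left or right door.
def transition(state, action, rng):
    # Assumption: listening does not move the tiger.
    return state

def obs_prob(observation, state, action):
    # Assumption: the growl is heard on the correct side with probability 0.85.
    correct = (observation == "hear-" + state)
    return 0.85 if correct else 0.15

rng = random.Random(0)
prior = ["left"] * 1000 + ["right"] * 1000  # uniform prior as particles
posterior = pf_update(prior, "listen", "hear-left", transition, obs_prob, rng)
frac_left = posterior.count("left") / len(posterior)
print(f"estimated P(tiger=left | hear-left) = {frac_left:.3f}")
```

With a uniform prior, the exact posterior for this toy model is 0.85, so the particle estimate should land close to that value, with sampling noise that shrinks as the number of particles grows.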
One issue might be that many people have moved to ALE & OpenAI's Gym interface for API/environment implementations, and Python for implementation language. Your C library makes Python sound like a very second-class citizen, which is discouraging, and C is increasingly disfavored for its complexity & low-level nature. Just to get started with this, one has to learn the 'Cassandra POMDP format', whatever that is, and then deal with C rather than Python. Are there that many people who want to solve MDPs in a tabular form whose preferred language is C and love defining their models in Cassandra POMDP format? You also don't have any impressive use-cases or demos of things which one can do easily in AIToolbox which can't be done elsewhere as easily, or as fast, or at all - what gives me any confidence that this is really mature and I won't simply invest days into learning it only to discover some severe limitation which makes it useless for me?
We propose the Symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) for compactly representing the problem itself and all the relevant intermediate computation results in the algorithm. We leverage Symbolic Perseus for computing the lower bound of the optimal value function using ADD operators, and provide a novel ADD-based procedure for computing the upper bound. Experiments on a number of standard factored POMDP problems show that we can achieve an order of magnitude improvement in performance over previously proposed algorithms. Partially observable Markov decision processes (POMDPs) are widely used for modeling stochastic sequential decision problems with noisy observations.
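As a rough illustration of the two bounds that HSVI-style algorithms maintain, the sketch below evaluates a lower bound as the maximum over a set of alpha-vectors and an upper bound with the sawtooth interpolation commonly used with point-based upper bounds. It works on flat Python lists over an enumerated state space, not on ADDs, so it does not reproduce the symbolic procedure proposed here; the numbers and function names are made up for the example.

```python
def lower_bound(alpha_vectors, belief):
    """Lower bound: best alpha-vector value at this belief."""
    return max(sum(a * b for a, b in zip(alpha, belief))
               for alpha in alpha_vectors)

def upper_bound(corner_values, points, belief):
    """Sawtooth upper bound.

    corner_values[s] is the bound at the belief concentrated on state s;
    points is a list of (belief_i, value_i) pairs for interior beliefs.
    """
    # Linear interpolation using only the corner beliefs.
    base = sum(c * b for c, b in zip(corner_values, belief))
    best = base
    for b_i, v_i in points:
        base_i = sum(c * p for c, p in zip(corner_values, b_i))
        # Largest multiple of b_i that fits inside the query belief.
        ratio = min(b / p for b, p in zip(belief, b_i) if p > 0)
        best = min(best, base + ratio * (v_i - base_i))
    return best

# Hypothetical two-state example.
alphas = [[0.0, 4.0], [4.0, 0.0]]
corners = [10.0, 10.0]
points = [([0.5, 0.5], 6.0)]

for belief in ([0.5, 0.5], [0.75, 0.25], [1.0, 0.0]):
    lo = lower_bound(alphas, belief)
    hi = upper_bound(corners, points, belief)
    print(belief, "lower =", lo, "upper =", hi, "gap =", hi - lo)
```

The gap between the two bounds at a belief is what HSVI uses to decide where to search next and when to stop; a symbolic variant would carry out the same kind of computation with ADD operations instead of explicit vectors.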