Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and epsilon-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time.
Planning is often not a one-shot task because either the world or the agent's knowledge of the world changes. In this paper, we introduce a new principle that can be used to solve a series of similar search tasks faster with heuristic search methods than running individual searches in isolation, by updating the heuristics over time to make them more informed and thus future searches more focused. This principle is simple and easy to integrate into heuristic search methods, and it is easy to prove the correctness of the resulting heuristic search methods.
We present an intuitive explanation for the limited effectiveness of front-to-end bidirectional heuristic search, supported with extensive evidence from many commonly-studied domains. While previous work has proved the limitations of specific algorithms, we show that any front-to-end bidirectional heuristic search algorithm will likely be dominated by unidirectional heuristic search or bidirectional brute-force search. We also demonstrate a pathological case where bidirectional heuristic search is the dominant algorithm, so a stronger claim cannot be made. Finally, we show that on the four-peg Towers Of Hanoi with arbitrary start and goal states, bidirectional brute-force search outperforms unidirectional heuristic search using pattern-database heuristics.
Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been well-studied, it is still highly challenging to find solutions of high quality in large-scale networks of the day. While Monte-Carlo-simulation-based methods produce near-optimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a Monte-Carlo-simulation-based method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous Monte-Carlo-simulation-based methods, it runs as fast as other state-of-the-art methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.
Heuristic search is a central component of many important applications in AI including automated planning. While we can find optimal solutions to heuristic search problems, doing so may take hours or days. For practical applications, this is unacceptably slow, and we must rely on algorithms which find solutions of high, but not optimal, quality or ones which bound the time used directly. In my dissertation, I present and analyze algorithms for the following settings: quality bounded heuristic search and time bounded heuristic search. The central theme of my doctoral work will be that taking advantage of additional information can improve the performance of heuristic search algorithms.