Goto

Collaborating Authors

 Hatem, Matthew


Solving Large Problems with Heuristic Search: General-Purpose Parallel External-Memory Search

Journal of Artificial Intelligence Research

Classic best-first heuristic search algorithms, like A*, record every unique state they encounter in RAM, making them infeasible for solving large problems. In this paper, we demonstrate how best-first search can be scaled to solve much larger problems by exploiting disk storage and parallel processing and, in some cases, slightly relaxing the strict best-first node expansion order. Some previous disk-based search algorithms abandon best-first search order in an attempt to increase efficiency. We present two case studies showing that A*, when augmented with Delayed Duplicate Detection, can actually be more efficient than these non-best-first search orders. First, we present a straightforward external variant of A*, called PEDAL, that slightly relaxes best-first order in order to be I/O efficient in both theory and practice, even on problems featuring real-valued node costs. Because it is easy to parallelize, PEDAL can be faster than in-memory IDA* even on domains with few duplicate states, such as the sliding-tile puzzle. Second, we present a variant of PEDAL, called PE2A*, that uses partial expansion to handle problems that have large branching factors. When tested on the problem of Multiple Sequence Alignment, PE2A* is the first algorithm capable of solving the entire Reference Set 1 of the standard BAliBASE benchmark using a biologically accurate cost function. This work shows that classic best-first algorithms like A* can be applied to large real-world problems. We also provide a detailed implementation guide with source code both for generic parallel disk-based best-first search and for Multiple Sequence Alignment with a biologically accurate cost function. Given its effectiveness as a general-purpose problem-solving method, we hope that this makes parallel and disk-based search accessible to a wider audience.


Recursive Best-First Search with Bounded Overhead

AAAI Conferences

There are two major paradigms for linear-space heuristic search: iterative deepening (IDA*) and recursive best-first search (RBFS). While the node regeneration overhead of IDA* is easily characterized in terms of the heuristic branching factor, the overhead of RBFS depends on how widely the promising nodes are separated in the search tree, and is harder to anticipate. In this paper, we present two simple techniques for improving the performance of RBFS while maintaining its advantages over IDA*. While these techniques work well in practice, they do not provide any theoretical bounds on the amount of regeneration overhead. To this end, we introduce RBFScr, the first method for provably bounding the regeneration overhead of RBFS. We show empirically that this improves its performance in several domains, both for optimal and suboptimal search, and also yields a better linear-space anytime heuristic search. RBFScr is the first linear space best-first search robust enough to solve a variety of domains with varying operator costs.


Simpler Bounded Suboptimal Search

AAAI Conferences

It is commonly appreciated that solving search problems optimally can take too long. Bounded suboptimal search algorithms trade increased solution cost for reduced solving time. Explicit Estimation Search (EES) is a recent state-of-the-art algorithm specifically designed for bounded suboptimal search. Although it tends to expand fewer nodes than alternative algorithms, such as weighted A* (WA*), its per-node expansion overhead is higher, causing it to sometimes take longer. In this paper, we present simplified variants of EES (SEES) and an earlier algorithm, A*epsilon (SA*epsilon), that use different implementations of the same motivating ideas to significantly reduce search overhead and implementation complexity. In an empirical evaluation, we find that SEES, like EES, outperforms classic bounded suboptimal search algorithms, such as WA*, on domains tested where distance-to-go estimates enable better search guidance. We also confirm that, while SEES and SA*epsilon expand roughly the same number of nodes as their progenitors, they solve problems significantly faster and are much easier to implement. This work widens the applicability of state-of the-art bounded suboptimal search by making it easier to deploy.


Heuristic Search for Large Problems with Real Costs

AAAI Conferences

Heuristic search is a fundamental technique for solving problems in artificial intelligence. However, many heuristic search algorithms, such as A* are limited by the amount of main memory available. External memory search overcomes the memory limitation of A* by taking advantage of cheap secondary storage, such as disk. Previous work in this area assumes that edge costs fall within a narrow range of integer values and relies on uniformed search order. The goal of this dissertation research is to develop novel techniques that enable heuristic search algorithms to solve problems with real values using a best-first search order while exploiting external memory and multiple processors. This work will be organized into four components. The first component will discuss external memory search and present a novel technique for incorporating real-valued edge costs. The second component will present a novel algorithm for solving problems with large branching factors with application to the challenging problem of Multiple Sequence Alignment (MSA). The third component will cover bounded suboptimal external search for practical MSA applications. The final component of this research will be the development of a novel distributed search framework; allowing parallel and external memory heuristic search algorithms to run cooperatively on a commodity computing cluster. Together these four components will enable heuristic search to scale to large problems in practical settings while exploiting modern hardware.


External Memory Best-First Search for Multiple Sequence Alignment

AAAI Conferences

Multiple sequence alignment (MSA) is a central problem in computational biology. It is well known that MSA can be formulated as a shortest path problem and solved using heuristic search, but the memory requirement of A* makes it impractical for all but the smallest problems. Partial Expansion A* (PEA*) reduces the space complexity of A* by generating only the most promising successor nodes. However, even PEA* exhausts available memory on many problems. Another alternative is Iterative Deepening Dynamic Programming, which uses an uninformed search order but stores only the nodes along the search frontier. However, it too cannot scale to the largest problems. In this paper, we propose storing nodes on cheap and plentiful secondary storage. We present a new general-purpose algorithm, Parallel External PEA* (\xppea), that combines PEA* with Delayed Duplicate Detection to take advantage of external memory and multiple processors to solve large MSA problems. In our experiments, \xppea\ is the first algorithm capable of solving the entire Reference Set 1 of the standard BAliBASE benchmark using a biologically accurate cost function. This work suggests that external best-first search can effectively use heuristic information to surpass methods that rely on uninformed search orders.


Heuristic Search for Large Problems With Real Costs

AAAI Conferences

The memory requirements of basic best-first heuristic search algorithms like A* make them infeasible for solving large problems. External disk storage is cheap and plentiful com- pared to the cost of internal RAM. Unfortunately, state-of- the-art external memory search algorithms either rely on brute-force search techniques, such as breadth-first search, or they rely on all node values falling in a narrow range of in- tegers, and thus perform poorly on real-world domains with real-valued costs. We present a new general-purpose algo- rithm, PEDAL, that uses external memory and parallelism to perform a best-first heuristic search capable of solving large problems with real costs. We show theoretically that PEDAL is I/O efficient and empirically that it is both better on a stan- dard unit-cost benchmark, surpassing internal IDA* on the 15-puzzle, and gives far superior performance on problems with real costs.