Independent Additive Heuristics Reduce Search Multiplicatively

AAAI Conferences

This paper analyzes the performance of IDA* using additive heuristics. We show that the reduction in the number of nodes expanded using multiple independent additive heuristics is the product of the reductions achieved by the individual heuristics. First, we formally state and prove this result on unit edge-cost undirected graphs with a uniform branching factor. Then, we empirically verify it on a model of the 4-peg Towers of Hanoi problem. We also run experiments on the multiple sequence alignment problem, showing that the result applies more generally to non-unit edge-cost directed graphs. Finally, we extend an existing model that predicts the performance of IDA* with a single pattern database to the case of independent additive disjoint pattern databases. This is the first analysis of the performance of independent additive heuristics.
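
To make the multiplicative claim concrete, here is a minimal Python sketch; the function names and the numbers are illustrative assumptions, not the paper's code.

    import math

    def additive_heuristic(state, pdbs, projections):
        """Sum of disjoint pattern-database lookups; admissible because each
        PDB counts only the moves of its own disjoint subset of variables."""
        return sum(pdb[project(state)] for pdb, project in zip(pdbs, projections))

    def predicted_fraction(individual_fractions):
        """The paper's result: if heuristic h_i alone reduces the nodes IDA*
        expands to a fraction r_i of the brute-force tree, independent
        additive heuristics together reduce it to roughly the product."""
        return math.prod(individual_fractions)

    # e.g. two heuristics that each prune 90% of the nodes (r = 0.1)
    # together leave roughly 0.1 * 0.1 = 1% of the brute-force tree:
    print(predicted_fraction([0.1, 0.1]))  # ~0.01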


PDDL+ Planning with Temporal Pattern Databases

AAAI Conferences

The introduction of PDDL+ has allowed more accurate representations of complex real-world problems of interest to the scientific community. However, PDDL+ problems are notoriously challenging for planners and require more advanced heuristics. We introduce the Temporal Pattern Database (TPDB), a new domain-independent heuristic technique designed for PDDL+ domains with mixed discrete/continuous behaviour, non-linear system dynamics, processes, and events. The pattern in the TPDB is obtained through an abstraction based on time and state discretisation. Our approach combines constraint relaxation and abstraction techniques, and uses solutions to the relaxed problem as a guide to solving the concrete problem with a discretisation fine enough to satisfy the continuous model's constraints.
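
As a rough illustration of this kind of abstraction, the Python sketch below assumes the TPDB maps a concrete timed state onto a coarse time/state grid and stores cost-to-go lower bounds for the abstract states; the paper's actual construction may differ in its details.

    def abstract_state(t, continuous_vars, coarse_dt, coarse_dx):
        """Project a concrete timed state onto the coarse abstraction grid."""
        return (int(t // coarse_dt),
                tuple(int(v // coarse_dx) for v in continuous_vars))

    # Hypothetical TPDB: built once by solving the relaxed, coarsely
    # discretised problem, then memoised as abstract state -> lower bound.
    tpdb = {}

    def tpdb_heuristic(t, continuous_vars, coarse_dt=1.0, coarse_dx=0.5):
        """Heuristic for the fine-grained concrete search: look up the
        abstract state; unseen abstract states fall back to the trivial 0."""
        key = abstract_state(t, continuous_vars, coarse_dt, coarse_dx)
        return tpdb.get(key, 0.0)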


An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

AAAI Conferences

Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path-finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance.
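
For readers unfamiliar with this cast, a hedged Python sketch (the names are ours): a search node is a vector of positions into the k sequences, and each move advances a non-empty subset of them by one character, emitting an alignment column whose cost is scored, e.g., by the sum-of-pairs metric.

    from itertools import product

    def successors(node, seqs):
        """Yield (next_node, column) for every non-empty subset of sequences
        advanced by one character; '-' marks a gap in the column."""
        for mask in product((0, 1), repeat=len(seqs)):
            if not any(mask):
                continue
            if any(m and node[i] >= len(seqs[i]) for i, m in enumerate(mask)):
                continue  # cannot advance past the end of a sequence
            nxt = tuple(p + m for p, m in zip(node, mask))
            col = tuple(seqs[i][node[i]] if m else "-" for i, m in enumerate(mask))
            yield nxt, col

    # The optimal alignment is a cheapest path from (0, ..., 0) to
    # (len(s_1), ..., len(s_k)) in this k-dimensional lattice:
    for nxt, col in successors((0, 0), ("AC", "AG")):
        print(nxt, col)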


An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

Journal of Artificial Intelligence Research

Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path-finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best-first search, on the other hand, has the advantage of bounding the search space that has to be explored using a heuristic. However, it is necessary to maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reducing the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts to combine the respective advantages: like A*, it uses a heuristic for pruning the search space, but it reduces both the maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, but using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, a running time below four times that of A* can be expected in practice. In our experiments we show that our algorithm outperforms Partial Expansion A*, one of the currently most successful algorithms for optimal multiple sequence alignment, both in time and memory. Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences but of larger subsets. This idea is not new; however, to make it practically relevant we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. Our algorithm, used in conjunction with higher-dimensional heuristics, is able to calculate for the first time the optimal alignment for almost all of the problems in Reference 1 of the benchmark database BAliBASE.
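
The skeleton below is a reconstruction from the abstract alone, not the article's code: each iteration runs a cost-bounded search whose Open queue is keyed by the DP (antidiagonal) order rather than by f = g + h. Because every edge in the alignment lattice strictly increases the coordinate sum, a node's g-value is final when it is popped, so nodes behind the sweep front could be released; for brevity this sketch retains all g-values, and its bound-growth schedule is illustrative only.

    import heapq

    def dp_order_search(start, goal, succ, h, bound):
        """Expand only nodes with g + h <= bound, in DP (antidiagonal) order.
        Returns the goal's cost, or None if the bound was too small."""
        g = {start: 0}
        open_heap = [(sum(start), start)]  # key: DP order, not f = g + h
        expanded = set()
        while open_heap:
            _, node = heapq.heappop(open_heap)
            if node in expanded:
                continue  # stale duplicate entry
            expanded.add(node)
            if node == goal:
                return g[node]
            for nxt, cost in succ(node):
                ng = g[node] + cost
                if ng + h(nxt) <= bound and ng < g.get(nxt, float("inf")):
                    g[nxt] = ng
                    heapq.heappush(open_heap, (sum(nxt), nxt))
        return None

    def threshold_series_search(start, goal, succ, h, grow=1.1):
        """Repeat with successively larger upper bounds until the goal is found."""
        bound = h(start)
        while True:
            cost = dp_order_search(start, goal, succ, h, bound)
            if cost is not None:
                return cost
            bound = grow * bound + 1  # illustrative schedule only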


Learning Sparse Neural Networks through $L_0$ Regularization

arXiv.org Machine Learning

We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of $L_0$ regularization. However, since the $L_0$ norm of weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected $L_0$ norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the \emph{hard concrete} distribution for the gates, which is obtained by "stretching" a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result, our method allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer.
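
A minimal NumPy sketch of the hard concrete gate; the hyperparameter values (gamma = -0.1, zeta = 1.1, beta = 2/3) are the ones commonly reported for this method, and the variable names are ours. It draws a binary concrete sample, stretches it to (gamma, zeta), clamps with a hard-sigmoid, and evaluates the closed-form probability that each gate is non-zero, which serves as the differentiable surrogate for the expected $L_0$ norm.

    import numpy as np

    GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_hard_concrete(log_alpha, rng):
        """Draw one gate per weight; exactly 0 (or 1) with non-zero
        probability thanks to the stretch-and-clamp."""
        u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
        s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)  # binary concrete
        s_bar = s * (ZETA - GAMMA) + GAMMA                             # stretch to (gamma, zeta)
        return np.clip(s_bar, 0.0, 1.0)                                # hard-sigmoid

    def expected_l0(log_alpha):
        """Sum over gates of P(gate != 0): the smooth surrogate that
        replaces the raw $L_0$ norm in the training objective."""
        return sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA)).sum()

    rng = np.random.default_rng(0)
    log_alpha = np.zeros(5)                   # one gate parameter per weight
    z = sample_hard_concrete(log_alpha, rng)  # multiply the weights by z in the forward pass
    print(z, expected_l0(log_alpha))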