Valenzano, Richard Anthony
Expected Runtime Comparisons Between Breadth-First Search and Constant-Depth Restarting Random Walks
Platnick, Daniel, Valenzano, Richard Anthony
When greedy search algorithms encounter a local minima or plateau, the search typically devolves into a breadth-first search (BrFS), or a local search technique is used in an attempt to find a way out. In this work, we formally analyze the performance of BrFS and constant-depth restarting random walks (RRW) -- two methods often used for finding exits to a plateau/local minima -- to better understand when each is best suited. In particular, we formally derive the expected runtime for BrFS in the case of a uniformly distributed set of goals at a given goal depth. We then prove RRW will be faster than BrFS on trees if there are enough goals at that goal depth. We refer to this threshold as the crossover point. Our bound shows that the crossover point grows linearly with the branching factor of the tree, the goal depth, and the error in the random walk depth, while the size of the tree grows exponentially in branching factor and goal depth. Finally, we discuss the practical implications and applicability of this bound.
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
Platnick, Daniel, Khanzadeh, Sourena, Sadeghian, Alireza, Valenzano, Richard Anthony
Microplastic particle ingestion or inhalation by humans is a problem of growing concern. Unfortunately, current research methods that use machine learning to understand their potential harms are obstructed by a lack of available data. Deep learning techniques in particular are challenged by such domains where only small or imbalanced data sets are available. Overcoming this challenge often involves oversampling underrepresented classes or augmenting the existing data to improve model performance. This paper proposes GANsemble: a two-module framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data. First, the data chooser module automates augmentation strategy selection by searching for the best data augmentation strategy. Next, the cGAN module uses this strategy to train a cGAN for generating enhanced synthetic data. We experiment with the GANsemble framework on a small and imbalanced microplastics data set. A Microplastic-cGAN (MPcGAN) algorithm is introduced, and baselines for synthetic microplastics (SYMP) data are established in terms of Frechet Inception Distance (FID) and Inception Scores (IS). We also provide a synthetic microplastics filter (SYMP-Filter) algorithm to increase the quality of generated SYMP. Additionally, we show the best amount of oversampling with augmentation to fix class imbalance in small microplastics data sets. To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.
On the Completeness of Best-First Search Variants That Use Random Exploration
Valenzano, Richard Anthony (University of Toronto) | Xie, Fan (University of Alberta)
While suboptimal best-first search algorithms like Greedy Best-First Search are frequently used when building automated planning systems, their greedy nature can make them susceptible to being easily misled by flawed heuristics. This weakness has motivated the development of best-first search variants like epsilon-greedy node selection, type-based exploration, and diverse best-first search, which all use random exploration to mitigate the impact of heuristic error. In this paper, we provide a theoretical justification for this increased robustness by formally analyzing how these algorithms behave on infinite graphs. In particular, we show that when using these approaches on any infinite graph, the probability of not finding a solution can be made arbitrarily small given enough time. This result is shown to hold for a class of algorithms that includes the three mentioned above, regardless of how misleading the heuristic is.
Worst-Case Solution Quality Analysis When Not Re-Expanding Nodes in Best-First Search
Valenzano, Richard Anthony (University of Alberta) | Sturtevant, Nathan R. (University of Denver) | Schaeffer, Jonathan (University of Alberta)
The use of inconsistent heuristics with A* can result in increased runtime due to the need to re-expand nodes. Poor performance can also be seen with Weighted A* if nodes are re-expanded. While the negative impact of re-expansions can often be minimized by setting these algorithms to never expand nodes more than once, the result can be a lower solution quality. In this paper, we formally show that the loss in solution quality can be bounded based on the amount of inconsistency along optimal solution paths. This bound holds regardless of whether the heuristic is admissible or inadmissible, though if the heuristic is admissible the bound can be used to show that not re-expanding nodes can have at most a quadratic impact on the quality of solutions found when using A*. We then show that the bound is tight by describing a process for the construction of graphs for which a best-first search that does not re-expand nodes will find solutions whose quality is arbitrarily close to that given by the bound. Finally, we will use the bound to extend a known result regarding the solution quality of WA* when weighting a consistent heuristic, so that it also applies to other types of heuristic weighting.
A Comparison of Knowledge-Based GBFS Enhancements and Knowledge-Free Exploration
Valenzano, Richard Anthony (University of Alberta) | Sturtevant, Nathan R. (University of Denver) | Schaeffer, Jonathan (University of Alberta) | Xie, Fan (University of Alberta)
GBFS-based satisficing planners often augment their search with knowledge-based enhancements such as preferred operators and multiple heuristics. These techniques seek to improve planner performance by making the search more informed. In our work, we will focus on how these enhancements impact coverage and we will use a simple technique called epsilon-greedy node selection to demonstrate that planner coverage can also be improved by introducing knowledge-free random exploration into the search. We then revisit the existing knowledge-based enhancements so as to determine if the knowledge these enhancements employ is offering necessary guidance, or if the impact of this knowledge is to add exploration which can be achieved more simply using randomness. This investigation provides further evidence of the importance of preferred operators and shows that the knowledge added when using an additional heuristic is crucial in certain domains, while not being as effective as random exploration in others. Finally, we demonstrate that random exploration can also improve the coverage of LAMA, a planner which already employs multiple enhancements. This suggests that knowledge-based enhancements need to be compared to appropriate knowledge-free random baselines so as to ensure the importance of the knowledge being used.
Simultaneously Searching with Multiple Settings: An Alternative to Parameter Tuning for Suboptimal Single-Agent Search Algorithms
Valenzano, Richard Anthony (University of Alberta) | Sturtevant, Nathan (University of Alberta) | Schaeffer, Jonathan (University of Alberta) | Buro, Karen (Grant MacEwan University) | Kishimoto, Akihiro (Tokyo Institute of Technology and Japan Science and Technology Agency)
Many search algorithms have parameters that need to be tuned to get the best performance. Typically, the parameters are tuned offline, resulting in a generic setting that is supposed to be effective on all problem instances. For suboptimal single-agent search, problem-instance-specific parameter settings can result in substantially reduced search effort. We consider the use of dovetailing as a way to take advantage of this fact. Dovetailing is a procedure that performs search with multiple parameter settings simultaneously. Dovetailing is shown to improve the search speed of weighted IDA* by several orders of magnitude and to generally enhance the performance of weighted RBFS. This procedure is trivially parallelizable and is shown to be an effective form of parallelization for WA* and BULB. In particular, using WA* with parallel dovetailing yields good speedups in the sliding-tile puzzle domain, and increases the number of problems solved when used in an automated planning system.