Goto

Collaborating Authors

 pdb




Learning Admissible Heuristics for A*: Theory and Practice

Futuhi, Ehsan, Sturtevant, Nathan R.

arXiv.org Artificial Intelligence

Heuristic functions are central to the performance of search algorithms such as A-star, where admissibility - the property of never overestimating the true shortest-path cost - guarantees solution optimality. Recent deep learning approaches often disregard admissibility and provide limited guarantees on generalization beyond the training data. This paper addresses both of these limitations. First, we pose heuristic learning as a constrained optimization problem and introduce Cross-Entropy Admissibility (CEA), a loss function that enforces admissibility during training. On the Rubik's Cube domain, this method yields near-admissible heuristics with significantly stronger guidance than compressed pattern database (PDB) heuristics. Theoretically, we study the sample complexity of learning heuristics. By leveraging PDB abstractions and the structural properties of graphs such as the Rubik's Cube, we tighten the bound on the number of training samples needed for A-star to generalize. Replacing a general hypothesis class with a ReLU neural network gives bounds that depend primarily on the network's width and depth, rather than on graph size. Using the same network, we also provide the first generalization guarantees for goal-dependent heuristics.


PERC: a suite of software tools for the curation of cryoEM data with application to simulation, modelling and machine learning

Costa-Gomes, Beatriz, Greer, Joel, Juraschko, Nikolai, Parkhurst, James, Mirecka, Jola, Famili, Marjan, Rangel-Smith, Camila, Strickson, Oliver, Lowe, Alan, Basham, Mark, Burnley, Tom

arXiv.org Artificial Intelligence

Ease of access to data, tools and models expedites scientific research. In structural biology there are now numerous open repositories of experimental and simulated datasets. Being able to easily access and utilise these is crucial for allowing researchers to make optimal use of their research effort. The tools presented here are useful for collating existing public cryoEM datasets and/or creating new synthetic cryoEM datasets to aid the development of novel data processing and interpretation algorithms. In recent years, structural biology has seen the development of a multitude of machine-learning based algorithms for aiding numerous steps in the processing and reconstruction of experimental datasets and the use of these approaches has become widespread. Developing such techniques in structural biology requires access to large datasets which can be cumbersome to curate and unwieldy to make use of. In this paper we present a suite of Python software packages which we collectively refer to as PERC (profet, EMPIARreader and CAKED). These are designed to reduce the burden which data curation places upon structural biology research. The protein structure fetcher (profet) package allows users to conveniently download and cleave sequences or structures from the Protein Data Bank or Alphafold databases. EMPIARreader allows lazy loading of Electron Microscopy Public Image Archive datasets in a machine-learning compatible structure. The Class Aggregator for Key Electron-microscopy Data (CAKED) package is designed to seamlessly facilitate the training of machine learning models on electron microscopy data, including electron-cryo-microscopy-specific data augmentation and labelling. These packages may be utilised independently or as building blocks in workflows. All are available in open source repositories and designed to be easily extensible to facilitate more advanced workflows if required.


PDB: Not All Drivers Are the Same -- A Personalized Dataset for Understanding Driving Behavior

Wei, Chuheng, Qin, Ziye, Li, Siyan, Zhang, Ziyan, Zhao, Xuanpeng, Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Barth, Matthew J., Wu, Guoyuan

arXiv.org Artificial Intelligence

Driving behavior is inherently personal, influenced by individual habits, decision-making styles, and physiological states. However, most existing datasets treat all drivers as homogeneous, overlooking driver-specific variability. To address this gap, we introduce the Personalized Driving Behavior (PDB) dataset, a multi-modal dataset designed to capture personalization in driving behavior under naturalistic driving conditions. Unlike conventional datasets, PDB minimizes external influences by maintaining consistent routes, vehicles, and lighting conditions across sessions. It includes sources from 128-line LiDAR, front-facing camera video, GNSS, 9-axis IMU, CAN bus data (throttle, brake, steering angle), and driver-specific signals such as facial video and heart rate. The dataset features 12 participants, approximately 270,000 LiDAR frames, 1.6 million images, and 6.6 TB of raw sensor data. The processed trajectory dataset consists of 1,669 segments, each spanning 10 seconds with a 0.2-second interval. By explicitly capturing drivers' behavior, PDB serves as a unique resource for human factor analysis, driver identification, and personalized mobility applications, contributing to the development of human-centric intelligent transportation systems.


Refinements on the Complementary PDB Construction Mechanism

Zou, Yufeng

arXiv.org Artificial Intelligence

Pattern database (PDB) is one of the most popular automated heuristic generation techniques. A PDB maps states in a planning task to abstract states by considering a subset of variables and stores their optimal costs to the abstract goal in a look up table. As the result of the progress made on symbolic search over recent years, symbolic-PDB-based planners achieved impressive results in the International Planning Competition (IPC) 2018. Among them, Complementary 1 (CPC1) tied as the second best planners and the best non-portfolio planners in the cost optimal track, only 2 tasks behind the winner. It uses a combination of different pattern generation algorithms to construct PDBs that are complementary to existing ones. As shown in the post contest experiments, there is room for improvement. In this paper, we would like to present our work on refining the PDB construction mechanism of CPC1. By testing on IPC 2018 benchmarks, the results show that a significant improvement is made on our modified planner over the original version.


DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of C{\alpha} Protein Traces

Jones, Michael S., Shmilovich, Kirill, Ferguson, Andrew L.

arXiv.org Artificial Intelligence

Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long-time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only C{\alpha} coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the C{\alpha} trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained C{\alpha} trace. We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make DiAMoNDBack model publicly available as a free and open source Python package.


A Context-Aware Approach for Textual Adversarial Attack through Probability Difference Guided Beam Search

Liu, Huijun, Yu, Jie, Li, Shasha, Ma, Jun, Ji, Bin

arXiv.org Artificial Intelligence

Textual adversarial attacks expose the vulnerabilities of text classifiers and can be used to improve their robustness. Existing context-aware methods solely consider the gold label probability and use the greedy search when searching an attack path, often limiting the attack efficiency. To tackle these issues, we propose PDBS, a context-aware textual adversarial attack model using Probability Difference guided Beam Search. The probability difference is an overall consideration of all class label probabilities, and PDBS uses it to guide the selection of attack paths. In addition, PDBS uses the beam search to find a successful attack path, thus avoiding suffering from limited search space. Extensive experiments and human evaluation demonstrate that PDBS outperforms previous best models in a series of evaluation metrics, especially bringing up to a +19.5% attack success rate. Ablation studies and qualitative analyses further confirm the efficiency of PDBS.


Sturtevant

AAAI Conferences

Pattern databases (PDBs) have been widely used as heuristics for many types of search spaces,but they have always been computed so as to fit in the main memory of the machine usingthe PDB. This paper studies the how external-memory PDBs can be used. It presentsresults of both using hard disk drives and solid-state drives directly to access the data, and of justloading a portion of the PDB into RAM. For the time being, all of these approaches are inferiorto building the largest PDB that fits into RAM.


Clausecker

AAAI Conferences

A pattern database (PDB) is a pre-computed lookup table storing shortest distances from abstract states to abstract goal states. PDBs are key components in heuristic search as their entries are used to prune paths that cannot lead to an optimal solution. With the sliding-tile puzzle as an exemplary application domain, we present methods to improve the precision and size of PDBs by improving additive pattern databases to zero-aware additive pattern databases (ZPDBs), reducing the compression rate from previously 1.6 bit to 1 bit per entry, generating optimal additive pattern partitionings, and building effective collections of pattern databases. With these enhancements, we achieve an overall 8.59-fold performance gain on the 24-puzzle compared to the previously best set of 6-tile PDBs.