Appendix A: Patch-based Negative Data Augmentation Reduces Texture Bias

Neural Information Processing Systems

Figure 5: ViTs trained only on our patch-based transformations exhibit a stronger texture bias. Each bar shows the texture accuracy (%) on the cue-conflict stimuli (Geirhos et al., 2018); a higher texture accuracy indicates a stronger bias towards texture. Texture accuracy is defined as the percentage of images classified with their "texture" label, counted only over images classified with either their "texture" or their "shape" label. The baseline model is the ViT-B/16 of (Dosovitskiy et al., 2021) trained on original images. The other models are trained on patch-based transformed images; e.g., "P-Shuffle" denotes a ViT-B/16 model trained on images whose patches have been shuffled.
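The texture-accuracy definition in the caption is a conditional ratio: predictions that match neither cue are left out of the denominator. The sketch below is a minimal illustration. It assumes each cue-conflict image carries exactly one shape label and one texture label, and that predictions and labels are plain class-name strings. The function and argument names are hypothetical.

```python
def texture_accuracy(predictions, shape_labels, texture_labels):
    """Percentage of texture decisions among images classified as shape or texture.

    Hypothetical helper: each position i describes one cue-conflict image with
    predicted class predictions[i], shape class shape_labels[i], and texture
    class texture_labels[i].
    """
    texture_hits = 0
    shape_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if pred == texture:
            texture_hits += 1
        elif pred == shape:
            shape_hits += 1
        # Predictions matching neither label are excluded from the ratio.
    decided = texture_hits + shape_hits
    if decided == 0:
        raise ValueError("no prediction matched either the shape or the texture label")
    return 100.0 * texture_hits / decided


# Toy example: one texture decision, two shape decisions, one image ignored.
preds = ["elephant", "cat", "car", "cat"]
shapes = ["cat", "cat", "car", "dog"]
textures = ["elephant", "clock", "bottle", "bird"]
print(texture_accuracy(preds, shapes, textures))
```

Because the fourth image is excluded, the toy result is one texture decision out of three decided images, not out of four.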



A Guide to Bayesian Optimization in Bioprocess Engineering

arXiv.org Machine Learning

Bayesian optimization has become widely popular across the experimental sciences because of its favorable attributes: it can handle noisy data, perform well with relatively small datasets, and provide adaptive suggestions for sequential experimentation. It has recently gained traction in bioprocess engineering, although its use there is still in its infancy. However, experimentation with biological systems is highly complex, and the resulting experimental uncertainty requires specific extensions to classical Bayesian optimization. Moreover, the current literature often targets readers with a strong statistical background, which limits its accessibility for practitioners. In light of these developments, this review has two aims: first, to provide an intuitive and practical introduction to Bayesian optimization; and second, to outline promising application areas and open algorithmic challenges, thereby highlighting opportunities for future research in machine learning.
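To make the sequential loop concrete, here is a minimal sketch under several assumptions:

- a single design variable on [0, 1], with candidate experiments drawn from a uniform grid;
- a zero-mean Gaussian-process surrogate with a squared-exponential kernel;
- a small noise variance on the kernel diagonal to represent measurement noise;
- expected improvement (for maximization) as the acquisition function.

The function names, the fixed kernel hyperparameters, and the toy objective are illustrative choices, not taken from the review. A practical bioprocess application would fit the hyperparameters to data and would likely need the extensions the review discusses.

```python
import math


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    # Squared-exponential covariance between two scalar inputs.
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(K):
    # Lower-triangular L with K = L L^T (K must be positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    # Solve L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_sub(L, b):
    # Solve L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-4):
    # `noise` is an assumed observation-noise variance added to the diagonal.
    n = len(X)
    K = [[rbf_kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))  # alpha = K^{-1} y
    return L, alpha


def predict(X, L, alpha, x):
    # Posterior mean and standard deviation of the GP at x.
    k_star = [rbf_kernel(xi, x) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf_kernel(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    # Closed-form EI for maximization under a Gaussian predictive distribution.
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective, n_iter=10, grid_size=101):
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    X = [0.0, 0.5, 1.0]  # initial design
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        L, alpha = fit_gp(X, y)
        best = max(y)
        candidates = [x for x in grid if x not in X]
        # Suggest the next experiment by maximizing EI over untried candidates.
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*predict(X, L, alpha, x), best))
        X.append(x_next)
        y.append(objective(x_next))
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]
```

For example, `bayes_opt(lambda x: -(x - 0.3) ** 2)` runs ten suggested "experiments" on a toy objective whose optimum is at 0.3. It returns the best observed design and its value. Each iteration refits the surrogate once and then scores every untried grid point. This mirrors the adaptive, one-experiment-at-a-time workflow described above.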


Performance of universal machine-learned potentials with explicit long-range interactions in biomolecular simulations

arXiv.org Artificial Intelligence

Universal machine-learned potentials promise transferable accuracy across compositional and vibrational degrees of freedom, yet their application to biomolecular simulations remains underexplored. This work systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset with and without explicit long-range dispersion and electrostatics. We assess the impact of model size, training data composition, and electrostatic treatment across in- and out-of-distribution benchmark datasets, as well as molecular simulations of bulk liquid water, aqueous NaCl solutions, and biomolecules, including alanine tripeptide, the mini-protein Trp-cage, and Crambin. While larger models improve accuracy on benchmark datasets, this trend does not consistently extend to properties obtained from simulations. Predicted properties also depend on the composition of the training dataset. Long-range electrostatics show no systematic impact across systems. However, for Trp-cage, their inclusion yields increased conformational variability. Our results suggest that imbalanced datasets and immature evaluation practices currently challenge the applicability of universal machine-learned potentials to biomolecular simulations.
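One common way to add "explicit long-range electrostatics" to a machine-learned potential is to let the network predict per-atom partial charges. A pairwise Coulomb term is then added to the learned short-range energy. The sketch below shows only that bare decomposition under stated assumptions: charges in units of e, distances in Å, energies in eV, and no cutoff, damping, or periodic images. The models evaluated in this work may use different schemes, such as damped or Ewald-summed electrostatics. The names `coulomb_energy` and `total_energy` are hypothetical.

```python
import math
from itertools import combinations

# Coulomb constant e^2 / (4 pi eps0) in eV * Angstrom (assumed unit system).
COULOMB_EV_ANGSTROM = 14.399645


def coulomb_energy(charges, positions):
    """Bare pairwise Coulomb energy; no cutoff, damping, or periodic images."""
    energy = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r = math.dist(positions[i], positions[j])
        energy += COULOMB_EV_ANGSTROM * charges[i] * charges[j] / r
    return energy


def total_energy(short_range_energy, charges, positions):
    # Hypothetical decomposition: learned short-range term plus explicit electrostatics.
    return short_range_energy + coulomb_energy(charges, positions)
```

For example, two opposite unit charges 1 Å apart contribute about -14.4 eV. Because this term decays only as 1/r, it is the part a purely local, cutoff-based model cannot represent.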