
 deepchem


DeepChem Equivariant: SE(3)-Equivariant Support in an Open-Source Molecular Machine Learning Library

Siguenza, Jose, Ramsundar, Bharath

arXiv.org Artificial Intelligence

Neural networks that incorporate geometric relationships respecting SE(3) group transformations (e.g. rotations and translations) are increasingly important in molecular applications, such as molecular property prediction, protein structure modeling, and materials design. These models, known as SE(3)-equivariant neural networks, ensure outputs transform predictably with input coordinate changes by explicitly encoding spatial atomic positions. Although libraries such as E3NN [4] and SE(3)-TRANSFORMER [3] offer powerful implementations, they often require substantial deep learning or mathematical prior knowledge and lack complete training pipelines. We extend DEEPCHEM [13] with support for ready-to-use equivariant models, enabling scientists with minimal deep learning background to build, train, and evaluate models such as SE(3)-Transformer and Tensor Field Networks. Our implementation includes equivariant models, complete training pipelines, and a toolkit of equivariant utilities, supported with comprehensive tests and documentation, to facilitate both application and further development of SE(3)-equivariant models.
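To make "outputs transform predictably with input coordinate changes" concrete, here is a minimal sketch in plain Python. It does not use DeepChem, E3NN, or any model from the paper. The coordinates and helper names are hypothetical, and only rotations about the z-axis plus translations are covered, which is a subset of SE(3). The centroid stands in for an equivariant output (it moves with the input), and the pairwise distances stand in for an invariant one (they do not change).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rigid_transform(points: List[Point], angle: float, shift: Point) -> List[Point]:
    """Rotate each point about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


def centroid(points: List[Point]) -> Point:
    """Mean position: an equivariant quantity (it follows the rigid motion)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points: List[Point]) -> List[float]:
    """All interatomic distances: an invariant quantity (unchanged by rigid motion)."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical coordinates for a three-atom fragment (illustrative values only).
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rigid_transform(atoms, angle=0.7, shift=(1.0, -2.0, 0.5))
```

Applying the same transform to `centroid(atoms)` gives `centroid(moved)` up to floating-point error, while `pairwise_distances(atoms)` and `pairwise_distances(moved)` agree. An equivariant network is built so that its learned features obey the same kind of rule.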


Open-Source Protein Language Models for Function Prediction and Protein Design

Pandi, Shivasankaran Vanaja, Ramsundar, Bharath

arXiv.org Artificial Intelligence

Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch requires significant computational resources, limiting their accessibility. To address this, we integrate a PLM into DeepChem, an open-source framework for computational biology and chemistry, to provide a more accessible platform for protein-related tasks. We evaluate the performance of the integrated model on various protein prediction tasks, showing that it achieves reasonable results across benchmarks. Additionally, we present an exploration of generating plastic-degrading enzyme candidates using the model's embeddings and latent space manipulation techniques. While the results suggest that further refinement is needed, this approach provides a foundation for future work in enzyme design. This study aims to facilitate the use of PLMs in research fields like synthetic biology and environmental sustainability, even for those with limited computational resources.


Open source Differentiable ODE Solving Infrastructure

Singh, Rakshit Kr., Menezes, Aaron Rock, Irfan, Rida, Ramsundar, Bharath

arXiv.org Artificial Intelligence

Ordinary Differential Equations (ODEs) are widely used in physics, chemistry, and biology to model dynamic systems, including reaction kinetics, population dynamics, and biological processes. In this work, we integrate GPU-accelerated ODE solvers into the open-source DeepChem framework, making these tools easily accessible. These solvers support multiple numerical methods and are fully differentiable, enabling easy integration into more complex differentiable programs. We demonstrate the capabilities of our implementation through experiments on Lotka-Volterra predator-prey dynamics, pharmacokinetic compartment models, neural ODEs, and solving PDEs using reaction-diffusion equations. Our solvers achieved high accuracy with mean squared errors ranging from $10^{-4}$ to $10^{-6}$ and showed scalability in solving large systems with up to 100 compartments.
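As a rough illustration of the Lotka-Volterra experiment mentioned above, the following sketch integrates the predator-prey equations with a classical fourth-order Runge-Kutta step in plain Python. It is not the DeepChem solver API, and it is neither GPU-accelerated nor differentiable. The parameter values, initial state, and function names are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float]


def lotka_volterra(state: State, alpha: float = 1.0, beta: float = 0.1,
                   delta: float = 0.075, gamma: float = 1.5) -> State:
    """Right-hand side: dx/dt = alpha*x - beta*x*y, dy/dt = delta*x*y - gamma*y."""
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(f: Callable[[State], State], state: State, dt: float) -> State:
    """Advance `state` by one classical Runge-Kutta (RK4) step of size `dt`."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f: Callable[[State], State], state: State, dt: float, steps: int) -> List[State]:
    """Return the trajectory, including the initial state, after `steps` RK4 steps."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


def conserved_quantity(state: State, alpha: float = 1.0, beta: float = 0.1,
                       delta: float = 0.075, gamma: float = 1.5) -> float:
    """First integral of the system; it stays constant along exact solutions."""
    x, y = state
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


# Hypothetical initial populations: 10 prey, 5 predators; integrate to t = 10.
trajectory = integrate(lotka_volterra, (10.0, 5.0), dt=0.01, steps=1000)
```

Comparing `conserved_quantity` at the first and last trajectory points gives a quick sanity check on accuracy, since any drift comes from the numerical scheme rather than from the model.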


Open Source Infrastructure for Automatic Cell Segmentation

Menezes, Aaron Rock, Ramsundar, Bharath

arXiv.org Artificial Intelligence

Automated cell segmentation is crucial for various biological and medical applications, facilitating tasks like cell counting, morphology analysis, and drug discovery. However, manual segmentation is time-consuming and prone to subjectivity, necessitating robust automated methods. This paper presents open-source infrastructure, utilizing the UNet model, a deep-learning architecture noted for its effectiveness in image segmentation tasks. This implementation is integrated into the open-source DeepChem package, enhancing accessibility and usability for researchers and practitioners. The resulting tool offers a convenient and user-friendly interface, reducing the barrier to entry for cell segmentation while maintaining high accuracy.


Open-Source Molecular Processing Pipeline for Generating Molecules

Shreyas, V, Siguenza, Jose, Bania, Karan, Ramsundar, Bharath

arXiv.org Artificial Intelligence

The discovery of new molecules and materials is crucial for addressing challenges in chemistry, such as treating diseases and tackling climate change [Liu et al., 2023, Sanchez and Aspuru-Guzik, 2018]. Traditional methods rely on human expertise and are time-consuming and costly, limiting the exploration of the vast chemical space [Polishchuk et al., 2013]. Generative models offer a promising solution using deep learning to design molecules based on desired properties, rapidly identifying diverse and optimized molecules for specific applications. These models vary in their approaches and have seen rapid development, with benchmarks now in place to evaluate their performance in terms of distribution learning and chemical diversity. Although these models are publicly available, practitioners require extensive Python and machine learning knowledge to reap their benefits. Thus, we introduce open-source molecular generative model infrastructure into DeepChem [Ramsundar et al., 2019], a widely used open-source library for molecular machine learning.


Open-Source Fermionic Neural Networks with Ionic Charge Initialization

Pranesh, Shai, Zhu, Shang, Viswanathan, Venkat, Ramsundar, Bharath

arXiv.org Artificial Intelligence

Finding accurate solutions to the electronic Schrödinger equation plays an important role in discovering important molecular and material energies and characteristics. Consequently, solving systems with large numbers of electrons has become increasingly important. Variational Monte Carlo (VMC) methods, especially those approximated through deep neural networks, are promising in this regard. In this paper, we aim to integrate one such model called the FermiNet, a post-Hartree-Fock (HF) Deep Neural Network (DNN) model, into a standard and widely used open source library, DeepChem. We also propose novel initialization techniques to overcome the difficulties associated with the assignment of excess or lack of electrons for ions.


Google Summer of Code 2022

#artificialintelligence

These SMILES can be analysed using the RDKit library to get information about the atoms and bonds in the molecules. A molecular fingerprint is a vectorized representation of a molecule that captures details of its atomic configuration. During featurization, a molecule is decomposed into substructures (e.g., fragments), which are encoded as a fixed-length binary fingerprint: an array in which each element is either 1 or 0. For this project, I implemented atom-level and bond-level featurization and molecule-level (global) featurization in DeepChem, specific to D-MPNN model requirements. The D-MPNN paper [1] suggested 133 features for each atom and 14 features for each bond in a molecule. The individual features are extracted from SMILES using the RDKit library and one-hot encoded to get a vectorized representation.
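The one-hot encoding step described above can be sketched in a few lines of plain Python. This is not the DeepChem featurizer and it does not reproduce the 133-feature atom scheme from the D-MPNN paper. The vocabularies and function names below are hypothetical and kept deliberately small, and the atom properties are passed in directly rather than extracted with RDKit.

```python
from typing import Hashable, List, Sequence


def one_hot(value: Hashable, choices: Sequence[Hashable]) -> List[int]:
    """One-hot encode `value` over `choices`, with a final extra slot for unknown values."""
    encoding = [0] * (len(choices) + 1)
    index = choices.index(value) if value in choices else len(choices)
    encoding[index] = 1
    return encoding


# Hypothetical vocabularies for illustration (not the D-MPNN feature set).
ATOM_SYMBOLS = ["C", "N", "O", "S"]
ATOM_DEGREES = [0, 1, 2, 3, 4]


def featurize_atom(symbol: str, degree: int) -> List[int]:
    """Concatenate the one-hot encodings of an atom's element symbol and degree."""
    return one_hot(symbol, ATOM_SYMBOLS) + one_hot(degree, ATOM_DEGREES)


# Oxygen with two bonded neighbours, e.g. the ether oxygen in the SMILES "COC".
oxygen_features = featurize_atom("O", 2)
```

Each property contributes one block of the final vector, so the vector length is fixed by the vocabularies rather than by the molecule. The extra "unknown" slot lets unseen values, such as an element outside the vocabulary, still map to a valid encoding.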


Molecular Deep Learning using DeepChem

#artificialintelligence

I vividly remember my high school Chemistry teacher teaching us about covalent bonds using a 3D model of a water molecule. I also remember enjoying my time in the Chemistry lab trying to determine if a given salt is more acidic or alkaline by performing many tests. How would this setup change if we needed to replace the human performing these experiments with a machine? Recently, my curiosity about applying deep learning architectures in the life sciences resulted in an interesting learning opportunity. I stumbled onto some libraries like RDKit and DeepChem that help with training and developing deep learning data models for use in Drug Discovery.


AI in drug discovery is overhyped: examples from AstraZeneca, Harvard, Stanford and Insilico…

@machinelearnbot

Investments in AI for drug discovery are surging. Big Pharmas are throwing big bucks. Sanofi signed a 300 million dollar deal with the startup Exscientia, and GSK did the same for 42 million dollars. The Silicon Valley VC firm Andreessen Horowitz launched a new 450 million dollar bio investment fund, with one focus area in applications of AI to drug discovery. In this craze, lots of pharma/biotech companies and investors wonder whether they should jump on the bandwagon in 2018, or wait and see.

