 molecular geometry



Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule

Qiu, Keyue, Song, Yuxuan, Fan, Zhehuan, Liu, Peidong, Zhang, Zhe, Zheng, Mingyue, Zhou, Hao, Ma, Wei-Ying

arXiv.org Artificial Intelligence

Structure-Based Drug Design (SBDD) is crucial for identifying bioactive molecules. Recent deep generative models face challenges in geometric structure modeling. A major bottleneck lies in the twisted probability path of multi-modalities -- continuous 3D positions and discrete 2D topologies -- which jointly determine molecular geometries. By establishing the fact that noise schedules decide the Variational Lower Bound (VLB) for the twisted probability path, we propose the VLB-Optimal Scheduling (VOS) strategy in this under-explored area, which optimizes the VLB as a path integral for SBDD. Our model effectively enhances molecular geometries and interaction modeling, achieving a state-of-the-art PoseBusters passing rate of 95.9% on CrossDock, more than a 10% improvement upon strong baselines, while maintaining high affinities and robust intramolecular validity evaluated on a held-out test set. Code is available at https://github.com/AlgoMole/MolCRAFT.


Reviews: Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules

Neural Information Processing Systems

I think the task is original and important for computational chemistry. The underlying generative model is a variant of SchNet that expands the molecule one atom at a time along with its distance to previous atoms. In that regard, the model is similar to GraphRNN (You et al., 2018), but operates over point clouds instead of graphs. The related work is mostly complete, but I think the authors should discuss how their method differs from Mansimov et al., 2019, which is also a generative model for 3D molecular geometry. AFAIK, Mansimov et al.'s model only generates 3D geometry, while GSchNet learns to generate both the molecule (atoms and bonds) and its 3D geometry (distances).


PubChemQC B3LYP/6-31G*//PM6 dataset: the Electronic Structures of 86 Million Molecules using B3LYP/6-31G* calculations

Nakata, Maho, Maeda, Toshiyuki

arXiv.org Artificial Intelligence

This article presents the "PubChemQC B3LYP/6-31G*//PM6" dataset, containing electronic properties of 85,938,443 molecules. It includes orbitals, orbital energies, total energies, dipole moments, and other relevant properties. The dataset encompasses a wide range of molecules, from essential compounds to biomolecules up to 1000 molecular weight, covering 94.0% of the original PubChem Compound catalog (as of August 29, 2016). The electronic properties were calculated using the B3LYP/6-31G* and PM6 methods. The dataset is available in three formats: (i) GAMESS quantum chemistry program files, (ii) selected JSON output files, and (iii) a PostgreSQL database, enabling researchers to query molecular properties. Five sub-datasets offer more specific data. The first two subsets include molecules with C, H, O, and N, under 300 and 500 molecular weight respectively. The third and fourth subsets contain C, H, N, O, P, S, F, and Cl, under 300 and 500 molecular weight respectively. The fifth subset includes C, H, N, O, P, S, F, Cl, Na, K, Mg, and Ca, under 500 molecular weight. Coefficients of determination ranged from 0.892 (CHON500) to 0.803 (whole) for the HOMO-LUMO energy gap. These findings represent extensive investigations and can be utilized for drug discovery, material science, and other applications. The datasets are available under the Creative Commons Attribution 4.0 International license at https://nakatamaho.riken.jp/pubchemqc.riken.jp/b3lyp_pm6_datasets.html.


New Research on Exhaustive Search, Part 1 (Machine Learning)

#artificialintelligence

Abstract: In this paper, by constructing hard examples of CSP (with large domains) and SAT (with long clauses), we prove that such examples cannot be solved without exhaustive search, which implies a weaker conclusion P ≠ NP. This constructive approach for proving impossibility results is very different (and missing) from those currently used in computational complexity theory, but is similar to that used by Kurt Gödel in proving his famous logical impossibility results. Just as shown by Gödel's results that formal unprovability is provable in mathematics, the results of this paper show that proving computational hardness is not hard in mathematics.

Abstract: We propose a nonvariational scheme for geometry optimization of molecules for the first-quantized eigensolver, a recently proposed framework for quantum chemistry using the probabilistic imaginary-time evolution (PITE) on a quantum computer. While the electrons in a molecule are treated in the scheme as quantum mechanical particles, the nuclei are treated as classical point charges.


Distance-Geometric Graph Attention Network (DG-GAT) for 3D Molecular Geometry

Chang, Daniel T.

arXiv.org Artificial Intelligence

Deep learning for molecular science has so far mainly focused on 2D molecular graphs. Recently, however, there has been work to extend it to 3D molecular geometry, due to its scientific significance and critical importance in real-world applications. The 3D distance-geometric graph representation (DG-GR) adopts a unified scheme (distance) for representing the geometry of 3D graphs. It is invariant to rotation and translation of the graph, and it reflects pair-wise node interactions and their generally local nature, particularly relevant for 3D molecular geometry. To facilitate the incorporation of 3D molecular geometry in deep learning for molecular science, we adopt the new graph attention network with dynamic attention (GATv2) for use with DG-GR and propose the 3D distance-geometric graph attention network (DG-GAT). GATv2 is a great fit for DG-GR since the attention can vary by node and by distance between nodes. Experimental results of DG-GAT for the ESOL and FreeSolv datasets show major improvement (31% and 38%, respectively) over those of the standard graph convolution network based on 2D molecular graphs. The same is true for the QM9 dataset. Our work demonstrates the utility and value of DG-GAT for deep learning based on 3D molecular geometry.
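The invariance property described in this abstract can be checked numerically. The following is a minimal sketch, not the DG-GAT implementation: it only builds a pairwise distance matrix and confirms that it is unchanged when the points are rotated and translated. The function names and the toy coordinates are hypothetical and chosen for illustration.

```python
import math

def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def rotate_z_and_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

# Hypothetical toy coordinates (three atoms in a plane); not taken from the paper.
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rotate_z_and_translate(atoms, angle=0.7, shift=(1.5, -2.0, 0.3))

d_before = pairwise_distances(atoms)
d_after = pairwise_distances(moved)

# Largest entry-wise change in the distance matrix after the rigid motion.
max_diff = max(
    abs(a - b) for row_a, row_b in zip(d_before, d_after) for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

Because only distances enter the representation, a model built on it does not need to learn rotation or translation invariance from data.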


Molecule3D: A Benchmark for Predicting 3D Geometries from Molecular Graphs

Xu, Zhao, Luo, Youzhi, Zhang, Xuan, Xu, Xinyi, Xie, Yaochen, Liu, Meng, Dickerson, Kaleb, Deng, Cheng, Nakata, Maho, Ji, Shuiwang

arXiv.org Artificial Intelligence

Graph neural networks are emerging as promising methods for modeling molecular graphs, in which nodes and edges correspond to atoms and chemical bonds, respectively. Recent studies show that when 3D molecular geometries, such as bond lengths and angles, are available, molecular property prediction tasks can be made more accurate. However, computing of 3D molecular geometries requires quantum calculations that are computationally prohibitive. For example, accurate calculation of 3D geometries of a small molecule requires hours of computing time using density functional theory (DFT). Here, we propose to predict the ground-state 3D geometries from molecular graphs using machine learning methods. To make this feasible, we develop a benchmark, known as Molecule3D, that includes a dataset with precise ground-state geometries of approximately 4 million molecules derived from DFT. We also provide a set of software tools for data processing, splitting, training, and evaluation, etc. Specifically, we propose to assess the error and validity of predicted geometries using four metrics. We implement two baseline methods that either predict the pairwise distance between atoms or atom coordinates in 3D space. Experimental results show that, compared with generating 3D geometries with RDKit, our method can achieve comparable prediction accuracy but with much smaller computational costs. Our Molecule3D is available as a module of the MoleculeX software library (https://github.com/divelab/MoleculeX).
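As an illustration of how the error of a predicted geometry might be scored against a reference, here is a minimal sketch that compares pairwise interatomic distances. It assumes mean absolute error and root-mean-square error over atom pairs; the abstract does not name its four metrics, so this is not necessarily what Molecule3D computes. The function name `distance_errors` and the coordinates are hypothetical.

```python
import math

def distance_errors(pred_coords, ref_coords):
    """Return (MAE, RMSE) of interatomic distances over all unordered atom pairs."""
    n = len(ref_coords)
    diffs = [
        math.dist(pred_coords[i], pred_coords[j]) - math.dist(ref_coords[i], ref_coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse

# Hypothetical reference geometry and a prediction with one atom slightly displaced.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.0, 0.0)]

mae, rmse = distance_errors(pred, ref)
print(f"MAE={mae:.4f} RMSE={rmse:.4f}")
```

Scoring distances rather than raw coordinates avoids having to align the predicted and reference structures first.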


Neural Message Passing on High Order Paths

Flam-Shepherd, Daniel, Wu, Tony, Friederich, Pascal, Aspuru-Guzik, Alan

arXiv.org Machine Learning

Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first-order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher-order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher-order paths. This allows information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction.
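To make the contrast between first-order and higher-order aggregation concrete, here is a minimal sketch, assuming scalar node features and plain sum aggregation. It is not the authors' model, which also uses the relationships between higher-order connections; the function names and the toy chain graph are hypothetical.

```python
def neighbors_at_hops(adj, node, hops):
    """Nodes that end a simple path of exactly `hops` edges starting at `node`."""
    paths = [[node]]
    for _ in range(hops):
        # Extend every path by one edge, never revisiting a node already on it.
        paths = [p + [nxt] for p in paths for nxt in adj[p[-1]] if nxt not in p]
    return sorted({p[-1] for p in paths})

def aggregate(adj, feats, node, max_hops=2):
    """Sum scalar features over each neighbourhood order, one total per hop."""
    return [
        sum(feats[n] for n in neighbors_at_hops(adj, node, h))
        for h in range(1, max_hops + 1)
    ]

# Hypothetical 4-node chain graph 0-1-2-3 with one scalar feature per node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

# Index 0 holds the usual first-order message; index 1 adds second-order paths.
print(aggregate(adj, feats, node=1))
```

A standard message-passing step would use only the first entry of the returned list; the second entry is the extra signal that path-based aggregation makes available.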


Machine learning enables long time scale molecular photodynamics simulations

Westermayr, Julia, Gastegger, Michael, Menger, Maximilian F. S. J., Mai, Sebastian, González, Leticia, Marquetand, Philipp

arXiv.org Machine Learning

Abstract: Photo-induced processes are fundamental in nature, but accurate simulations are seriously limited by the cost of the underlying quantum chemical calculations, hampering their application for long time scales. Here we introduce a method based on machine learning to overcome this bottleneck and enable accurate photodynamics on nanosecond time scales, which are otherwise out of reach with contemporary approaches. Instead of expensive quantum chemistry during molecular dynamics simulations, we use deep neural networks to learn the relationship between a molecular geometry and its high-dimensional electronic properties. As an example, the time evolution of the methylenimmonium cation for one nanosecond is used to demonstrate that machine learning algorithms can outperform standard excited-state molecular dynamics approaches in their computational efficiency while delivering the same accuracy.

Introduction

Machine learning (ML) is revolutionizing the most diverse domains, like image recognition [1], playing board games [2], or society integration of refugees [3]. Also in chemistry, an increasing range of applications is being tackled with ML, for example, the design and discovery of new molecules and materials [4, 5, 6]. In the present study, we show how ML enables efficient photodynamics simulations. Photodynamics is the study of photo-induced processes that occur after a molecule is exposed to light. Photosynthesis or DNA photodamage leading to skin cancer are only two examples of phenomena that involve molecules interacting with light [7, 8, 9, 10, 11]. The simulation of such processes has been key to learn structure-dynamics-function relationships that can be used to guide the design of photonic materials, such as photosensitive drugs [12], photocatalysts [4] and photovoltaics [13, 14].