It Takes Two to Tango: Directly Optimizing for Constrained Synthesizability in Generative Molecular Design

Jeff Guo, Philippe Schwaller

arXiv.org Artificial Intelligence 

Constrained synthesizability is an unaddressed challenge in generative molecular design: designing molecules that satisfy multi-parameter optimization objectives while simultaneously being synthesizable and enforcing the presence of specific commercial building blocks in the synthesis route. This is practically important for molecule re-purposing, sustainability, and efficiency. In this work, we propose a novel reward function called TANimoto Group Overlap (TANGO), which uses chemistry principles to transform a sparse reward function into a dense and learnable one, a property that is crucial for reinforcement learning. TANGO can augment general-purpose molecular generative models to directly optimize for constrained synthesizability while simultaneously optimizing for other properties relevant to drug discovery using reinforcement learning. Our framework is general and addresses starting-material, intermediate, and divergent synthesis constraints. Contrary to most existing works in the field, we show that incentivizing a general-purpose model (without any inductive biases) is a productive approach to navigating challenging optimization scenarios, and we demonstrate this by showing that the trained models explicitly learn a desirable distribution. Our framework is the first generative approach to tackle constrained synthesizability.
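
The core idea of turning a sparse building-block-matching signal into a dense, learnable reward can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's TANGO formula: it uses RDKit Morgan fingerprints and a hypothetical `dense_block_reward` helper to return the maximum Tanimoto similarity between an enforced commercial building block and the starting materials proposed by a retrosynthesis model, instead of a binary match/no-match reward. The exact TANGO reward (including how group overlap enters) is defined in the paper and not reproduced here.

```python
# Illustrative sketch only: a dense building-block reward based on Tanimoto
# similarity, in the spirit of TANGO. Helper names, fingerprint settings, and
# the example SMILES are assumptions for illustration, not the paper's method.
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs


def _fingerprint(smiles: str):
    """Morgan fingerprint (radius 2, 2048 bits) of a SMILES string, or None if invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)


def dense_block_reward(route_starting_materials: list[str], enforced_block: str) -> float:
    """Return the maximum Tanimoto similarity between the enforced building block
    and the starting materials of a predicted synthesis route.

    A sparse reward would return only 1.0 (block present) or 0.0 (absent);
    returning the similarity instead gives the RL agent partial credit and a
    learnable signal that points toward satisfying the constraint.
    """
    target_fp = _fingerprint(enforced_block)
    if target_fp is None:
        return 0.0
    best = 0.0
    for smiles in route_starting_materials:
        fp = _fingerprint(smiles)
        if fp is None:
            continue
        best = max(best, DataStructs.TanimotoSimilarity(target_fp, fp))
    return best  # 1.0 when the exact block is recovered, dense partial credit otherwise


if __name__ == "__main__":
    # Hypothetical example: starting materials proposed for a generated molecule,
    # scored against one enforced commercial building block.
    route = ["c1ccccc1N", "CC(=O)Cl"]
    print(dense_block_reward(route, "c1ccccc1N"))      # exact match -> 1.0
    print(dense_block_reward(route, "c1ccc(Cl)cc1N"))  # similar block -> partial credit < 1.0
```

In a reinforcement-learning loop, a dense score like this can be combined with other property objectives so that generated molecules are rewarded both for their drug-discovery-relevant properties and for how closely their predicted routes approach the enforced building-block constraint.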