Goto

Collaborating Authors

 Singh, Shantanu


Learning Molecular Representation in a Cell

arXiv.org Artificial Intelligence

Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods.


MOTI$\mathcal{VE}$: A Drug-Target Interaction Graph For Inductive Link Prediction

arXiv.org Artificial Intelligence

Drug-target interaction (DTI) prediction is crucial for identifying new therapeutics and detecting mechanisms of action. While structure-based methods accurately model physical interactions between a drug and its protein target, cell-based assays such as Cell Painting can better capture complex DTI interactions. This paper introduces MOTI$\mathcal{VE}$, a Morphological cOmpound Target Interaction Graph dataset that comprises Cell Painting features for $11,000$ genes and $3,600$ compounds along with their relationships extracted from seven publicly available databases. We provide random, cold-source (new drugs), and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases. Our benchmark results show that graph neural networks that use Cell Painting features consistently outperform those that learn from graph structure alone, feature-based models, and topological heuristics. MOTI$\mathcal{VE}$ accelerates both graph ML research and drug discovery by promoting the development of more reliable DTI prediction models. MOTI$\mathcal{VE}$ resources are available at https://github.com/carpenter-singh-lab/motive.