linker
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > China (0.04)
- Asia > China (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion
Targeted protein degradation techniques, such as PROteolysis TArgeting Chimeras (PROTACs), have emerged as powerful tools for selectively removing disease-causing proteins. One challenging problem in this field is designing a linker to connect different molecular fragments to form a stable drug-candidate molecule. Existing models for linker design assume that the relative positions of the fragments are known, which may not be the case in real scenarios. In this work, we address a more general problem where the poses of the fragments are in 3D space. We develop a 3D equivariant diffusion model that jointly learns the generative process of both fragment poses and the 3D structure of the linker. By viewing fragments as rigid bodies, we design a fragment pose prediction module inspired by the Newton-Euler equations in rigid body mechanics. Empirical studies on ZINC and PROTAC-DB datasets demonstrate that our model can generate chemically valid, synthetically-accessible, and low-energy molecules under both unconstrained and constrained generation settings.
AI Agent for Source Finding by SoFiA-2 for SKA-SDC2
Zhou, Xingchen, Li, Nan, Jia, Peng, Liu, Yingfeng, Deng, Furen, Shu, Shuanghao, Li, Ying, Cao, Liang, Shan, Huanyuan, Ibitoye, Ayodeji
Source extraction is crucial in analyzing data from next-generation, large-scale sky surveys in radio bands, such as the Square Kilometre Array (SKA). Several source extraction programs, including SoFiA and Aegean, have been developed to address this challenge. However, finding optimal parameter configurations when applying these programs to real observations is non-trivial. For example, the outcomes of SoFiA intensely depend on several key parameters across its preconditioning, source-finding, and reliability-filtering modules. To address this issue, we propose a framework to automatically optimize these parameters using an AI agent based on a state-of-the-art reinforcement learning (RL) algorithm, i.e., Soft Actor-Critic (SAC). The SKA Science Data Challenge 2 (SDC2) dataset is utilized to assess the feasibility and reliability of this framework. The AI agent interacts with the environment by adjusting parameters based on the feedback from the SDC2 score defined by the SDC2 Team, progressively learning to select parameter sets that yield improved performance. After sufficient training, the AI agent can automatically identify an optimal parameter configuration that outperform the benchmark set by Team SoFiA within only 100 evaluation steps and with reduced time consumption. Our approach could address similar problems requiring complex parameter tuning, beyond radio band surveys and source extraction. Yet, high-quality training sets containing representative observations and catalogs of ground truth are essential.
- Asia > Middle East > Israel (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Minnesota (0.04)
- (7 more...)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > China (0.04)
Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration
In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration. D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points.
- Asia > China (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization
Schwarting, Marcus, Ward, Logan, Hudson, Nathaniel, Yan, Xiaoli, Blaiszik, Ben, Chaudhuri, Santanu, Huerta, Eliu, Foster, Ian
Abstract--Generative AI poses both opportunities and risks for solving inverse design problems in the sciences. Generative tools provide the ability to expand and refine a search space autonomously, but do so at the cost of exploring low-quality regions until sufficiently fine tuned. Here, we propose a queue prioritization algorithm that combines generative modeling and active learning in the context of a distributed workflow for exploring complex design spaces. We find that incorporating an active learning model to prioritize top design candidates can prevent a generative AI workflow from expending resources on nonsensical candidates and halt potential generative model decay. For an existing generative AI workflow for discovering novel molecular structure candidates for carbon capture, our active learning approach significantly increases the number of high-quality candidates identified by the generative model. We find that, out of 1000 novel candidates, our workflow without active learning can generate an average of 281 high-performing candidates, while our proposed prioritization with active learning can generate an average 604 high-performing candidates. In recent years, Generative AI (GenAI) models have significantly changed how computational screening workflows are designed and executed.
- Energy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.30)
PhenoMoler: Phenotype-Guided Molecular Optimization via Chemistry Large Language Model
Current molecular generative models primarily focus on improving drug-target binding affinity and specificity, often neglecting the system-level phenotypic effects elicited by compounds. Transcriptional profiles, as molecule-level readouts of drug-induced phenotypic shifts, offer a powerful opportunity to guide molecular design in a phenotype-aware manner. We present PhenoMoler, a phenotype-guided molecular generation framework that integrates a chemistry large language model with expression profiles to enable biologically informed drug design. By conditioning the generation on drug-induced differential expression signatures, PhenoMoler explicitly links transcriptional responses to chemical structure. By selectively masking and reconstructing specific substructures-scaffolds, side chains, or linkers-PhenoMoler supports fine-grained, controllable molecular optimization. Extensive experiments demonstrate that PhenoMoler generates chemically valid, novel, and diverse molecules aligned with desired phenotypic profiles. Compared to FDA-approved drugs, the generated compounds exhibit comparable or enhanced drug-likeness (QED), optimized physicochemical properties, and superior binding affinity to key cancer targets. These findings highlight PhenoMoler's potential for phenotype-guided and structure-controllable molecular optimization.
- North America > United States (1.00)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > Middle East > Lebanon > Keserwan-Jbeil Governorate > Blat (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design
Liu, Xuefeng, Jiang, Songhao, Huang, Qinan, Xu, Tinson, Foster, Ian, Wang, Mengdi, Lin, Hening, Stevens, Rick
Fragment-Based Drug Discovery (FBDD) is a popular approach in early drug development, but designing effective linkers to combine disconnected molecular fragments into chemically and pharmacologically viable candidates remains challenging. Further complexity arises when fragments contain structural redundancies, like duplicate rings, which cannot be addressed by simply adding or removing atoms or bonds. To address these challenges in a unified framework, we introduce FragmentGPT, which integrates two core components: (1) a novel chemically-aware, energy-based bond cleavage pre-training strategy that equips the GPT-based model with fragment growing, linking, and merging capabilities, and (2) a novel Reward Ranked Alignment with Expert Exploration (RAE) algorithm that combines expert imitation learning for diversity enhancement, data selection and augmentation for Pareto and composite score optimality, and Supervised Fine-Tuning (SFT) to align the learner policy with multi-objective goals. Conditioned on fragment pairs, FragmentGPT generates linkers that connect diverse molecular subunits while simultaneously optimizing for multiple pharmaceutical goals. It also learns to resolve structural redundancies-such as duplicated fragments-through intelligent merging, enabling the synthesis of optimized molecules. FragmentGPT facilitates controlled, goal-driven molecular assembly. Experiments and ablation studies on real-world cancer datasets demonstrate its ability to generate chemically valid, high-quality molecules tailored for downstream drug discovery tasks.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > San Marino > Fiorentino > Fiorentino (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Weinheim (0.04)
Dynamic Injection of Entity Knowledge into Dense Retrievers
Yamada, Ikuya, Ri, Ryokan, Kojima, Takeshi, Iwasawa, Yusuke, Matsuo, Yutaka
Dense retrievers often struggle with queries involving less-frequent entities due to their limited entity knowledge. We propose the Knowledgeable Passage Retriever (KPR), a BERT-based retriever enhanced with a context-entity attention layer and dynamically updatable entity embeddings. This design enables KPR to incorporate external entity knowledge without retraining. Experiments on three datasets demonstrate that KPR consistently improves retrieval accuracy, with particularly large gains on the EntityQuestions dataset. When built on the off-the-shelf bge-base retriever, KPR achieves state-of-the-art performance among similarly sized models on two datasets. Models and code are released at https://github.com/knowledgeable-embedding/knowledgeable-embedding.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > New York (0.04)
- North America > United States > Arizona (0.04)
- (2 more...)