 monomer


PepEVOLVE: Position-Aware Dynamic Peptide Optimization via Group-Relative Advantage

Nguyen, Trieu, Pang, Hao-Wei, Feng, Shasha

arXiv.org Artificial Intelligence

Macrocyclic peptides are an emerging modality that combines biologics-like affinity with small-molecule-like developability, but their vast combinatorial space and multi-parameter objectives make lead optimization slow and challenging. Prior generative approaches such as PepINVENT require chemists to pre-specify mutable positions for optimization, choices that are not always known a priori, and rely on static pretraining and optimization algorithms that limit the model's ability to generalize and effectively optimize peptide sequences. We introduce PepEVOLVE, a position-aware, dynamic framework that learns both where to edit and how to dynamically optimize peptides for multi-objective improvement. PepEVOLVE (i) augments pretraining with dynamic masking and CHUCKLES shifting to improve generalization, (ii) uses a context-free multi-armed bandit router that discovers high-reward residues, and (iii) couples a novel evolving optimization algorithm with group-relative advantage to stabilize reinforcement updates. During in silico evaluations, the router policy reliably learns and concentrates probability on chemically meaningful sites that influence the peptide's properties. On a therapeutically motivated Rev-binding macrocycle benchmark, PepEVOLVE outperformed PepINVENT by reaching higher mean scores (approximately 0.8 vs. 0.6), achieving best candidates with a score of 0.95 (vs. 0.87), and converging in fewer steps under the task of optimizing permeability and lipophilicity with structural constraints. Overall, PepEVOLVE offers a practical, reproducible path to peptide lead optimization when optimal edit sites are unknown, enabling more efficient exploration and improving design quality across multiple objectives.
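The abstract names two mechanisms without giving formulas: a context-free multi-armed bandit that routes edits to positions, and a group-relative advantage that stabilizes reinforcement updates. The sketch below is one plausible reading, not the authors' implementation. It assumes the advantage is each candidate's reward standardized against the mean and standard deviation of its own sampling group, and it swaps in a plain epsilon-greedy bandit for the router. The function names, the reward values, and every parameter are hypothetical.

```python
import random
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled candidates.

    Assumption: advantage = (reward - group mean) / group std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def run_position_bandit(reward_by_position, steps=500, epsilon=0.2, seed=0):
    """Epsilon-greedy, context-free bandit with one arm per residue position.

    reward_by_position is a hypothetical fixed reward for editing each
    position; a real router would receive noisy rewards from a scorer.
    """
    rng = random.Random(seed)
    n = len(reward_by_position)
    counts = [0] * n
    values = [0.0] * n  # running mean reward observed for each position
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore a random position
        else:
            arm = max(range(n), key=values.__getitem__)  # exploit the best so far
        reward = reward_by_position[arm]
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Hypothetical scores for four candidates sampled from the same parent peptide.
advantages = group_relative_advantages([0.60, 0.80, 0.95, 0.65])

# Hypothetical per-position rewards; position 2 is the most rewarding edit site.
counts, values = run_position_bandit([0.1, 0.2, 0.9, 0.3])
```

Because the advantages are centered within each group, a candidate is reinforced only when it beats its siblings, which removes the need for a separately learned baseline. The bandit's pull counts play the role of the router concentrating probability on high-reward sites.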


PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Wang, Ruheng, Zhang, Hang, Nguyen, Trieu, Feng, Shasha, Pang, Hao-Wei, Yu, Xiang, Xiao, Li, Zhang, Peter Zhiping

arXiv.org Artificial Intelligence

Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming existing general LLMs (e.g., GPT-5) and a domain-specific baseline in both optimization success and interpretability. To our knowledge, this is the first LLM-based peptide design framework that combines explicit reasoning with RL-driven property control, marking a step toward reliable and transparent peptide optimization for therapeutic discovery.



Data-driven identification and analysis of the glass transition in polymer melts

Banerjee, Atreyee, Hsu, Hsiao-Ping, Kremer, Kurt, Kukharenko, Oleksandra

arXiv.org Artificial Intelligence

Understanding the nature of glass transition, as well as precise estimation of the glass transition temperature for polymeric materials, remain open questions in both experimental and theoretical polymer sciences. We propose a data-driven approach, which utilizes the high-resolution details accessible through the molecular dynamics simulation and considers the structural information of individual chains. It clearly identifies the glass transition temperature of polymer melts of weakly semiflexible chains. By combining principal component analysis and clustering, we identify the glass transition temperature in the asymptotic limit even from relatively short-time trajectories, which just reach into the Rouse-like monomer displacement regime. We demonstrate that fluctuations captured by the principal component analysis reflect the change in a chain's behaviour: from conformational rearrangement above to small rearrangements below the glass transition temperature. Our approach is straightforward to apply, and should be applicable to other polymeric glass-forming liquids.


Machine Learning Approach to Polymerization Reaction Engineering: Determining Monomers Reactivity Ratios

Nguyen, Tung, Bavarian, Mona

arXiv.org Artificial Intelligence

The atom and bond features are ranked by the average score of their influence on the reactivity ratio predictions for the training samples. Each dot represents the impact for the corresponding pair of monomers and copolymer in the training set. Red and blue indicate high and low impact on the reactivity ratios, respectively. A positive SHAP value indicates a positive influence on the prediction, and a negative SHAP value indicates a negative influence on the prediction.
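The ranking described in this caption can be illustrated with a short sketch. It assumes the "average score" is the mean absolute SHAP value of each feature over the training samples, which is a common convention but is not confirmed by the excerpt. The feature names and SHAP values below are hypothetical placeholders, not data from the paper.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples, largest first.

    shap_values: list of rows, one per training sample, one value per feature.
    """
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical features and per-sample SHAP values (three training samples).
features = ["atom_degree", "bond_is_aromatic", "atom_charge"]
shap_rows = [
    [0.10, -0.40, 0.05],
    [-0.20, 0.30, 0.00],
    [0.15, -0.50, 0.10],
]
ranking = rank_features_by_mean_abs_shap(shap_rows, features)
```

Taking the absolute value before averaging keeps positive and negative influences from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.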


Electronic Second Skins Are the Wearables of the Future

WIRED

The skin is the largest organ in our body, and also the most complex. Peer at it under a microscope and you'll see thousands of nerve endings that keep the brain connected to the outside world and allow us to feel touch, pressure, and pain. But when Zhenan Bao looks at it, she sees something else. For Bao, a chemical engineer focused on making polymers, the skin is not only a sensory organ, but also a material. One that, in her words, is flexible, but also stretchable, self-healing, and biodegradable.


Machine Learning in Materials Science

#artificialintelligence

Before getting into what polymers are on a molecular level, let's see some familiar materials that are good examples. Some examples of polymers include: plastic, nylon, rubber, wood, protein, and DNA. In this case, we will focus primarily on synthetic polymers like plastic and nylon. At the molecular level, polymers are composed of long chains of repeating molecules. The molecule that repeats in this chain is known as a monomer (or subunit).


Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning

Mohapatra, Somesh, An, Joyce, Gómez-Bombarelli, Rafael

arXiv.org Machine Learning

Macromolecules are large, complex molecules composed of covalently bonded monomer units, existing in different stereochemical configurations and topologies. As a result of such chemical diversity, representing, comparing, and learning over macromolecules emerge as critical challenges. To address this, we developed a macromolecule graph representation, with monomers and bonds as nodes and edges, respectively. We captured the inherent chemistry of the macromolecule by using molecular fingerprints for node and edge attributes. For the first time, we demonstrated computation of chemical similarity between 2 macromolecules of varying chemistry and topology, using exact graph edit distances and graph kernels. We also trained graph neural networks for a variety of glycan classification tasks, achieving state-of-the-art results. Our work has two-fold implications - it provides a general framework for representation, comparison, and learning of macromolecules; and enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space. Macromolecules are ubiquitous and indispensable, from constituting what we are made up of to being present in almost everything we use. As biological macromolecules, they form the basis of life, serving as drivers of survival and growth functions. As synthetic macromolecules, humans have engineered the composition and topology to design structural components, sensors, shape-memory materials, drugs, encode messages, and much more (Lutz et al., 2016; Romio et al., 2020; Boydston et al., 2020; Thompson & Korley, 2020).
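As a rough illustration of the representation described above, the sketch below stores monomers as nodes and bonds as edges, with fingerprint bits as node attributes. It does not reproduce the authors' code and does not compute graph edit distance or graph kernels. It only shows the data structure, plus a Tanimoto similarity between two node fingerprints, which is the kind of per-node comparison such methods can build on. The class, the bit sets, and the bond labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MacromoleculeGraph:
    """Monomers as nodes, bonds as edges (hypothetical minimal structure)."""
    nodes: dict = field(default_factory=dict)  # node id -> frozenset of "on" fingerprint bits
    edges: dict = field(default_factory=dict)  # (i, j) with i < j -> bond label

    def add_monomer(self, node_id, fingerprint_bits):
        self.nodes[node_id] = frozenset(fingerprint_bits)

    def add_bond(self, i, j, label):
        if i not in self.nodes or j not in self.nodes:
            raise KeyError("both monomers must be added before bonding them")
        self.edges[(min(i, j), max(i, j))] = label  # undirected edge


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


# A hypothetical three-monomer linear chain with made-up fingerprint bits.
graph = MacromoleculeGraph()
graph.add_monomer(0, {1, 2, 3})
graph.add_monomer(1, {2, 3, 4})
graph.add_monomer(2, {7, 8})
graph.add_bond(0, 1, "amide")
graph.add_bond(2, 1, "amide")

similarity = tanimoto(graph.nodes[0], graph.nodes[1])
```

Storing each undirected bond under a sorted key means that adding a bond as (2, 1) or (1, 2) yields the same edge, so topology comparisons do not depend on insertion order.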


Optimizing Peptides in TensorFlow 2

#artificialintelligence

A guest post by Somesh Mohapatra and Rafael Gómez-Bombarelli of MIT. Introduction: A polymer is a material made up of long repeating chains of molecules, like plastic or rubber. Polymers are made up of subunits (monomers) that are chemically bound to one another. The chemical composition and arrangement of monomers dictate the properties of the polymer.


New "Cyborg" Technology Could Enable Merger of Humans and AI

#artificialintelligence

Such devices could monitor for tumor development or stand in for damaged tissues. But connecting electronics directly to human tissues in the body is a huge challenge. Now, a team is reporting new coatings for components that could help them more easily fit into this environment. The researchers will present their results today (August 17, 2020) at the American Chemical Society (ACS) Fall 2020 Virtual Meeting & Expo. ACS is holding the meeting through Thursday.