AITopics | solubility

Collaborating Authors

solubility

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Additive MLP-GNN Framework for Characterizing Chemical and Structural Contributions to Aqueous Solubility

Bhattacharya, Sampreeti, Roy, Arkaprava

arXiv.org Machine LearningJul-3-2026

Aqueous solubility is a key property in early-stage drug discovery, but most predictive models merge physicochemical descriptors and molecular graph information into a single representation, obscuring whether a prediction is driven by global chemistry, molecular structure, or both. We present an additive deep-learning framework that keeps these two sources of information separate throughout training: physicochemical descriptors are encoded by a multilayer perceptron (the chemical branch) and molecular graph topology by a graph neural network (the structural branch), with the two outputs combined only at the prediction stage through an additive model with an optional multiplicative interaction. This design provides a direct decomposition of chemical and structural components that can be examined separately after training. Furthermore, pretraining on the larger AqSolDB dataset and fine-tuning on the smaller BigSolDB2 dataset substantially improve accuracy and reduce run-to-run variations, indicating generalizability of the learned features from the data-rich settings. We further interpret the fitted model using best linear projections of the branch outputs, molecule-level embedding summaries across solubility classes, and atom-level GNNExplainer masks aggregated over functional groups. These analyses show that the chemical branch aligns with familiar physicochemical descriptors, while the structural branch captures graph-topological and functional-group patterns associated with solubility. Across both datasets, the framework attains competitive predictive performance while making the distinct roles of chemical and structural information more transparent.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Machine Learning

2607.02212

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Predicting Polymer Solubility in Solvents Using SMILES Strings

Reinhard, Andrew

arXiv.org Artificial IntelligenceDec-11-2025

Understanding and predicting polymer solubility in various solvents is critical for applications ranging from recycling to pharmaceutical formulation. This work presents a deep learning framework that predicts polymer solubility, expressed as weight percent (wt%), directly from SMILES representations of both polymers and solvents. A dataset of 8,049 polymer solvent pairs at 25 deg C was constructed from calibrated molecular dynamics simulations (Zhou et al., 2023), and molecular descriptors and fingerprints were combined into a 2,394 feature representation per sample. A fully connected neural network with six hidden layers was trained using the Adam optimizer and evaluated using mean squared error loss, achieving strong agreement between predicted and actual solubility values. Generalizability was demonstrated using experimentally measured data from the Materials Genome Project, where the model maintained high accuracy on 25 unseen polymer solvent combinations. These findings highlight the viability of SMILES based machine learning models for scalable solubility prediction and high-throughput solvent screening, supporting applications in green chemistry, polymer processing, and materials design.

artificial intelligence, machine learning, polymer, (17 more...)

arXiv.org Artificial Intelligence

2512.09784

Genre: Research Report (0.82)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Solvaformer: an SE(3)-equivariant graph transformer for small molecule solubility prediction

Broadbent, Jonathan, Bailey, Michael, Li, Mingxuan, Paul, Abhishek, De Lescure, Louis, Chauvin, Paul, Kogler-Anele, Lorenzo, Jangjou, Yasser, Jager, Sven

arXiv.org Artificial IntelligenceNov-14-2025

Accurate prediction of small molecule solubility using material-sparing approaches is critical for accelerating synthesis and process optimization, yet experimental measurement is costly and many learning approaches either depend on quantumderived descriptors or offer limited interpretability. We introduce Solvaformer, a geometry-aware graph transformer that models solutions as multiple molecules with independent SE(3) symmetries. The architecture combines intramolecular SE(3)-equivariant attention with intermolecular scalar attention, enabling cross-molecular communication without imposing spurious relative geometry. We train Solvaformer in a multi-task setting to predict both solubility (log S) and solvation free energy, using an alternating-batch regimen that trains on quantum-mechanical data (CombiSolv-QM) and on experimental measurements (BigSolDB 2.0). Solvaformer attains the strongest overall performance among the learned models and approaches a DFT-assisted gradient-boosting baseline, while outperforming an EquiformerV2 ablation and sequence-based alternatives. In addition, token-level attention produces chemically coherent attributions: case studies recover known intra- vs. inter-molecular hydrogen-bonding patterns that govern solubility differences in positional isomers. Taken together, Solvaformer provides an accurate, scalable, and interpretable approach to solution-phase property prediction by uniting geometric inductive bias with a mixed dataset training strategy on complementary computational and experimental data.

artificial intelligence, machine learning, solvaformer, (17 more...)

arXiv.org Artificial Intelligence

2511.09774

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

AReUReDi: Annealed Rectified Updates for Refining Discrete Flows with Multi-Objective Guidance

Chen, Tong, Zhang, Yinuo, Chatterjee, Pranam

arXiv.org Artificial IntelligenceOct-6-2025

Designing sequences that satisfy multiple, often conflicting, objectives is a central challenge in therapeutic and biomolecular engineering. Existing generative frameworks largely operate in continuous spaces with single-objective guidance, while discrete approaches lack guarantees for multi-objective Pareto optimality. We introduce AReUReDi (Annealed Rectified Updates for Refining Discrete Flows), a discrete optimization algorithm with theoretical guarantees of convergence to the Pareto front. Building on Rectified Discrete Flows (ReDi), AReUReDi combines Tchebycheff scalarization, locally balanced proposals, and annealed Metropolis-Hastings updates to bias sampling toward Pareto-optimal states while preserving distributional invariance. Applied to peptide and SMILES sequence design, AReUReDi simultaneously optimizes up to five therapeutic properties (including affinity, solubility, hemolysis, half-life, and non-fouling) and outperforms both evolutionary and diffusion-based baselines. These results establish AReUReDi as a powerful, sequence-based framework for multi-property biomolecule generation.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.00352

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

Add feedback

Convolutional Networks on Graphs for Learning Molecular Fingerprints

Neural Information Processing SystemsOct-2-2025, 16:43:17 GMT

We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.

artificial intelligence, fingerprint, machine learning, (16 more...)

Neural Information Processing Systems

Industry:

Health & Medicine (1.00)
Energy (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Meta-Learning Linear Models for Molecular Property Prediction

Pimonova, Yulia, Taylor, Michael G., Allen, Alice, Yang, Ping, Lubbers, Nicholas

arXiv.org Artificial IntelligenceSep-18-2025

Chemists in search of structure-property relationships face great challenges due to limited high quality, concordant datasets. Machine learning (ML) has significantly advanced predictive capabilities in chemical sciences, but these modern data-driven approaches have increased the demand for data. In response to the growing demand for explainable AI (XAI) and to bridge the gap between predictive accuracy and human comprehensibility, we introduce LAMeL - a Linear Algorithm for Meta-Learning that preserves interpretability while improving the prediction accuracy across multiple properties. While most approaches treat each chemical prediction task in isolation, LAMeL leverages a meta-learning framework to identify shared model parameters across related tasks, even if those tasks do not share data, allowing it to learn a common functional manifold that serves as a more informed starting point for new unseen tasks. Our method delivers performance improvements ranging from 1.1- to 25-fold over standard ridge regression, depending on the domain of the dataset. While the degree of performance enhancement varies across tasks, LAMeL consistently outperforms or matches traditional linear methods, making it a reliable tool for chemical property prediction where both accuracy and interpretability are critical.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.13527

Country: North America > United States (0.95)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Government (0.94)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Steering Protein Language Models

Huang, Long-Kai, Zhu, Rongyi, He, Bing, Yao, Jianhua

arXiv.org Artificial IntelligenceSep-15-2025

Protein Language Models (PLMs), pre-trained on extensive evolutionary data from natural proteins, have emerged as indispensable tools for protein design. While powerful, PLMs often struggle to produce proteins with precisely specified functionalities or properties due to inherent challenges in controlling their outputs. In this work, we investigate the potential of Activation Steering, a technique originally developed for controlling text generation in Large Language Models (LLMs), to direct PLMs toward generating protein sequences with targeted properties. We propose a simple yet effective method that employs activation editing to steer PLM outputs, and extend this approach to protein optimization through a novel editing site identification module. Through comprehensive experiments on lysozyme-like sequence generation and optimization, we demonstrate that our methods can be seamlessly integrated into both auto-encoding and autoregressive PLMs without requiring additional training. These results highlight a promising direction for precise protein engineering using foundation models.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.07983

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Chemical Explainability Through Counterfactual Masking

Janisiów, Łukasz, Kochańczyk, Marek, Zieliński, Bartosz, Danel, Tomasz

arXiv.org Artificial IntelligenceAug-27-2025

Molecular property prediction is a crucial task that guides the design of new compounds, including drugs and materials. While explainable artificial intelligence methods aim to scrutinize model predictions by identifying influential molecular substructures, many existing approaches rely on masking strategies that remove either atoms or atom-level features to assess importance via fidelity metrics. These methods, however, often fail to adhere to the underlying molecular distribution and thus yield unintuitive explanations. In this work, we propose counterfactual masking, a novel framework that replaces masked substructures with chemically reasonable fragments sampled from generative models trained to complete molecular graphs. Rather than evaluating masked predictions against implausible zeroed-out baselines, we assess them relative to counterfactual molecules drawn from the data distribution. Our method offers two key benefits: (1) molecular realism underpinning robust and distribution-consistent explanations, and (2) meaningful counterfactuals that directly indicate how structural modifications may affect predicted properties. We demonstrate that counterfactual masking is well-suited for benchmarking model explainers and yields more actionable insights across multiple datasets and property prediction tasks.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.18561

Country: Europe > Poland (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis

Musiał, Sebastian, Zieliński, Bartosz, Danel, Tomasz

arXiv.org Artificial IntelligenceAug-22-2025

Graph neural networks have demonstrated remarkable success in predicting molecular properties by leveraging the rich structural information encoded in molecular graphs. However, their black-box nature reduces interpretability, which limits trust in their predictions for important applications such as drug discovery and materials design. Furthermore, existing explanation techniques often fail to reliably quantify the contribution of individual atoms or substructures due to the entangled message-passing dynamics. We introduce SEAL (Substructure Explanation via Attribution Learning), a new interpretable graph neural network that attributes model predictions to meaningful molecular subgraphs. SEAL decomposes input graphs into chemically relevant fragments and estimates their causal influence on the output. The strong alignment between fragment contributions and model predictions is achieved by explicitly reducing inter-fragment message passing in our proposed model architecture. Extensive evaluations on synthetic benchmarks and real-world molecular datasets demonstrate that SEAL outperforms other explain-ability methods in both quantitative attribution metrics and human-aligned interpretability. A user study further confirms that SEAL provides more intuitive and trustworthy explanations to domain experts. By bridging the gap between predictive performance and interpretability, SEAL offers a promising direction for more transparent and actionable molecular modeling.

artificial intelligence, contribution, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.15015

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Enhancing Drug Discovery: Autoencoder-Based Latent Space Augmentation for Improved Molecular Solubility Prediction using LatMixSol

Hasankhani, Mohammad Saleh

arXiv.org Artificial IntelligenceJun-3-2025

Accurate prediction of molecular solubility is a cornerstone of early-stage drug discovery, yet conventional machine learning models face significant challenges due to limited labeled data and the high-dimensional nature of molecular descriptors. To address these issues, we propose LatMixSol, a novel latent space augmentation framework that combines autoencoder-based feature compression with guided interpolation to enrich training data. Our approach first encodes molecular descriptors into a low-dimensional latent space using a two-layer autoencoder. Spectral clustering is then applied to group chemically similar molecules, enabling targeted MixUp-style interpolation within clusters. Synthetic samples are generated by blending latent vectors of cluster members and decoding them back to the original feature space. Evaluated on the Huuskonen solubility benchmark, LatMixSol demonstrates consistent improvements across three of four gradient-boosted regressors (CatBoost, LightGBM, HistGradientBoosting), achieving RMSE reductions of 3.2-7.6% and R-squared increases of 0.5-1.5%. Notably, HistGradientBoosting shows the most significant enhancement with a 7.6% RMSE improvement. Our analysis confirms that cluster-guided latent space augmentation preserves chemical validity while expanding dataset diversity, offering a computationally efficient strategy to enhance predictive models in resource-constrained drug discovery pipelines.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2506.00223

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback