target property
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > Switzerland (0.04)
- North America > Canada (0.04)
- Asia > China (0.04)
- North America > United States (0.04)
Molecule Design by Latent Prompt Transformer
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task, where target biological properties or desired chemical constraints serve as conditioning variables.We propose the Latent Prompt Transformer (LPT), a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution modeled by a neural transformation of Gaussian white noise; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt. LPT can be learned by maximum likelihood estimation on molecule-property pairs. During property optimization, the latent prompt is inferred from target properties and constraints through posterior sampling and then used to guide the autoregressive molecule generation.After initial training on existing molecules and their properties, we adopt an online learning algorithm to progressively shift the model distribution towards regions that support desired target properties. Experiments demonstrate that LPT not only effectively discovers useful molecules across single-objective, multi-objective, and structure-constrained optimization tasks, but also exhibits strong sample efficiency.
Property-Aware Relation Networks for Few-Shot Molecular Property Prediction
Molecular property prediction plays a fundamental role in drug discovery to identify candidate molecules with target properties. However, molecular property prediction is essentially a few-shot problem, which makes it hard to use regular machine learning models. In this paper, we propose Property-Aware Relation networks (PAR) to handle this problem. In comparison to existing works, we leverage the fact that both relevant substructures and relationships among molecules change across different molecular properties. We first introduce a property-aware embedding function to transform the generic molecular embeddings to substructure-aware space relevant to the target property. Further, we design an adaptive relation graph learning module to jointly estimate molecular relation graph and refine molecular embeddings w.r.t. the target property, such that the limited labels can be effectively propagated among similar molecules. We adopt a meta-learning strategy where the parameters are selectively updated within tasks in order to model generic and property-aware knowledge separately. Extensive experiments on benchmark molecular property prediction datasets show that PAR consistently outperforms existing methods and can obtain property-aware molecular embeddings and model molecular relation graph properly.
Accelerating Materials Discovery: Learning a Universal Representation of Chemical Processes for Cross-Domain Property Prediction
Tsitsvero, Mikhail, Nakao, Atsuyuki, Ikebata, Hisaki
Experimental validation of chemical processes is slow and costly, limiting exploration in materials discovery. Machine learning can prioritize promising candidates, but existing data in patents and literature is heterogeneous and difficult to use. We introduce a universal directed-tree process-graph representation that unifies unstructured text, molecular structures, and numeric measurements into a single machine-readable format. To learn from this structured data, we developed a multi-modal graph neural network with a property-conditioned attention mechanism. Trained on approximately 700,000 process graphs from nearly 9,000 diverse documents, our model learns semantically rich embeddings that generalize across domains. When fine-tuned on compact, domain-specific datasets, the pretrained model achieves strong performance, demonstrating that universal process representations learned at scale transfer effectively to specialized prediction tasks with minimal additional data.
- North America > United States (0.46)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Middle East > Israel > Southern District (0.04)
- Asia > China (0.04)
- Research Report (1.00)
- Workflow (0.94)
EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture
Han, Seunghee, Kang, Yeonghun, Bae, Taeun, Bernales, Varinia, Aspuru-Guzik, Alan, Kim, Jihan
Designing materials with targeted properties remain s challenging due to the vastness of chemical space and the scarcity of propert y-labeled data. While r ecent advances in generative models offer a promising w ay for inverse design, most approaches require large datasets and must be retrained for every new target property. Here, we introduce the EGMOF ( Efficient Generation of MOFs), a hybrid diffusion-transformer framework that overcome s these limitations through a modular, descriptor - mediated workflow. EGMOF decomposes inverse design into two steps: (1) a one -dimensional diffusion model (Prop2Desc) that maps desired properties to chemically meaningful descriptors followed by (2) a transformer model (Desc2MOF) that generates structures from the se descriptors. This modular hybrid design enables minimal retraining and maintains high accuracy even under small-data conditions. On a hydrogen uptake dataset, EGMOF achieved over 95 % validity and 84% hit rate, representing significant improvements of up to 57 % in validity and 14% in hit rate compared to existing methods, while remaining effective with only 1,000 training samples . Moreover, our model successfully performed conditional generation across 29 diverse property datasets, including CoREMOF, QMOF, and text - mined experimental datasets, whereas previous models have not. This work presents a data - efficient, generalizable approach to the inverse design of diverse MOFs and highlights the potential of modular inverse design workflows for broader materials discovery.
- North America > Canada > Ontario > Toronto (0.17)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Can We Infer Confidential Properties of Training Data from LLMs?
Huang, Pengrun, Yadav, Chhavi, Chaudhuri, Kamalika, Wu, Ruihan
Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.