A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design

Jun-17-2026, 03:48:01 GMT–Neural Information Processing Systems

AI-driven discovery can greatly reduce design time and enhance new therapeutics' effectiveness. Models using simulators explore broad design spaces but risk violating implicit constraints due to a lack of experimental priors. For example, in a new analysis across diverse models on the GuacaMol benchmark using supervised classifiers, over 60% of molecules proposed had a high probability of being mutagenic. In this work, we introduce Medex, a dataset of priors for design problems extracted from literature describing compounds used in lab settings. It is constructed with LLM pipelines for discovering therapeutic entities in relevant paragraphs and summarizing information in concise fair-use facts. Medex consists of 32.3 million pairs of natural language facts, and appropriate entity representations (i.e.

large language model, machine learning, natural language

Country:
- North America > United States (1.00)

Genre:
- Overview (0.92)
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area > Infections and Infectious Diseases (0.93)
- Government > Regional Government
  - North America Government > United States Government (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Performance Analysis > Accuracy (0.67)
