STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics
Neural Information Processing Systems
Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with a 15,000- to 30,000-dimensional gene expression vector. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.
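The tile-to-expression pairing described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the dataset's file layout or any loader, so every name here (`TileExpressionPair`, `crop_tile`, `pair_tiles_with_expression`, the spot identifiers) is hypothetical. The image is a toy 8x8 grid and the expression vectors have 3 entries instead of the 15,000 to 30,000 dimensions of the real data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Toy stand-in for a pathology image: rows of grayscale pixel values.
Image = List[List[int]]


@dataclass
class TileExpressionPair:
    """One sub-tile image paired with the gene expression of its spot."""
    spot_id: str
    tile: Image
    expression: List[float]


def crop_tile(image: Image, center_row: int, center_col: int, half_size: int) -> Image:
    """Crop a square tile centered on a spot, clipped at the image borders."""
    n_rows, n_cols = len(image), len(image[0])
    top = max(center_row - half_size, 0)
    bottom = min(center_row + half_size, n_rows)
    left = max(center_col - half_size, 0)
    right = min(center_col + half_size, n_cols)
    return [row[left:right] for row in image[top:bottom]]


def pair_tiles_with_expression(
    image: Image,
    spots: Dict[str, Tuple[int, int]],
    expression: Dict[str, List[float]],
    half_size: int,
) -> List[TileExpressionPair]:
    """Build one (tile, expression) pair per spatial spot."""
    return [
        TileExpressionPair(spot_id, crop_tile(image, row, col, half_size), expression[spot_id])
        for spot_id, (row, col) in spots.items()
    ]


# Hypothetical toy data: pixel value encodes its position (row * 8 + col).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
spots = {"spot_0": (2, 2), "spot_1": (5, 5)}  # spot centers in pixel coordinates
expression = {"spot_0": [0.0, 3.0, 1.0], "spot_1": [2.0, 0.0, 5.0]}

pairs = pair_tiles_with_expression(image, spots, expression, half_size=2)
for pair in pairs:
    print(pair.spot_id, len(pair.tile), "x", len(pair.tile[0]), "tile,", len(pair.expression), "genes")
```

Each resulting pair holds a local image patch and the expression vector measured at the same spot. This is the unit a multi-modal model would consume in place of a slide-level text caption.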
May-29-2025, 06:14:13 GMT
- Country:
- Asia (0.14)
- Europe (0.14)
- North America > United States (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Dermatology (0.93)
- Gastroenterology (0.67)
- Hepatology (0.67)
- Immunology (1.00)
- Infections and Infectious Diseases (1.00)
- Neurology (1.00)
- Oncology (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.67)
- Statistical Learning > Regression (0.46)
- Natural Language > Large Language Model (0.68)
- Representation & Reasoning (0.93)
- Vision (0.93)
- Communications > Social Media (0.93)
- Sensing and Signal Processing > Image Processing (1.00)