Goto

Collaborating Authors

 Ji, Yuanfeng


SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

arXiv.org Artificial Intelligence

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and response complex instruction across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs consisting of 4.2K WSI captions and 176K VQA pairs with multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy, diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA), and 54.15% on SlideBench-VQA (BCNB). We will fully release SlideChat, SlideInstruction and SlideBench as open-source resources to facilitate research and development in computational pathology.


MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

arXiv.org Artificial Intelligence

Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instrument, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 dataset with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. Exemplary, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback


SyNDock: N Rigid Protein Docking via Learnable Group Synchronization

arXiv.org Artificial Intelligence

The regulation of various cellular processes heavily relies on the protein complexes within a living cell, necessitating a comprehensive understanding of their three-dimensional structures to elucidate the underlying mechanisms. While neural docking techniques have exhibited promising outcomes in binary protein docking, the application of advanced neural architectures to multimeric protein docking remains uncertain. This study introduces SyNDock, an automated framework that swiftly assembles precise multimeric complexes within seconds, showcasing performance that can potentially surpass or be on par with recent advanced approaches. SyNDock possesses several appealing advantages not present in previous approaches. Firstly, SyNDock formulates multimeric protein docking as a problem of learning global transformations to holistically depict the placement of chain units of a complex, enabling a learning-centric solution. Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving initial pairwise transformation and confidence estimation, followed by global transformation synchronization. This enables effective learning for assembling the complex in a globally consistent manner. Lastly, extensive experiments conducted on our proposed benchmark dataset demonstrate that SyNDock outperforms existing docking software in crucial performance metrics, including accuracy and runtime. For instance, it achieves a 4.5% improvement in performance and a remarkable millionfold acceleration in speed.


DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

arXiv.org Artificial Intelligence

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.