Li, Yanjun
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Chen, Ying, Wang, Guoan, Ji, Yuanfeng, Li, Yanjun, Ye, Jin, Li, Tianbin, Zhang, Bin, Pei, Nana, Yu, Rongshan, Qiao, Yu, He, Junjun
Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs across multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark incorporating captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA) and 54.15% on SlideBench-VQA (BCNB). We will fully release SlideChat, SlideInstruction, and SlideBench as open-source resources to facilitate research and development in computational pathology.
Morphological Profiling for Drug Discovery in the Era of Deep Learning
Tang, Qiaosi, Ratnayake, Ranjala, Seabra, Gustavo, Jiang, Zhe, Fang, Ruogu, Cui, Lina, Ding, Yousong, Kahveci, Tamer, Bian, Jiang, Li, Chenglong, Luesch, Hendrik, Li, Yanjun
Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism-of-action (MOA), drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.
BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation
Wang, Zhen, Feng, Zheng, Li, Yanjun, Li, Bowen, Wang, Yongrui, Sha, Chulin, He, Min, Li, Xiaolin
Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which are time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary, and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structural and semantic information of molecules, thus improving the quality of molecular representations. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.
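The core self-supervised step described above — masking parts of a molecular graph and training an autoencoder to reconstruct them — can be sketched in a few lines. The helper below is an illustrative, simplified stand-in: the function name, the mask ratio, and the zero-masking scheme are assumptions for the example, not BatmanNet's actual implementation.

```python
import numpy as np

def mask_molecular_graph(node_feats, adj, mask_ratio=0.15, seed=0):
    """Randomly mask nodes (atoms) and edges (bonds) of a molecular graph
    for self-supervised reconstruction. Illustrative sketch only.

    node_feats : (n, d) array of node features
    adj        : (n, n) symmetric 0/1 adjacency matrix
    Returns the masked copies plus the indices and original values that
    a decoder would be trained to reconstruct.
    """
    rng = np.random.default_rng(seed)
    n = node_feats.shape[0]

    # --- mask nodes: zero out the features of a random subset ---
    n_mask = max(1, int(round(mask_ratio * n)))
    node_idx = rng.choice(n, size=n_mask, replace=False)
    masked_feats = node_feats.copy()
    node_targets = node_feats[node_idx].copy()
    masked_feats[node_idx] = 0.0

    # --- mask edges: drop a random subset of existing bonds ---
    iu, ju = np.triu_indices(n, k=1)
    edge_pos = np.flatnonzero(adj[iu, ju] > 0)   # existing upper-tri edges
    e_mask = max(1, int(round(mask_ratio * edge_pos.size)))
    drop = rng.choice(edge_pos, size=e_mask, replace=False)
    masked_adj = adj.copy()
    masked_adj[iu[drop], ju[drop]] = 0
    masked_adj[ju[drop], iu[drop]] = 0           # keep the matrix symmetric

    return masked_feats, masked_adj, node_idx, node_targets, drop
```

A decoder trained against `node_targets` and the dropped-edge positions would then learn to recover the missing atoms and bonds — the reconstruction signal that drives representation learning in masked-graph pre-training.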
Identifying acute illness phenotypes via deep temporal interpolation and clustering network on physiologic signatures
Ren, Yuanfang, Li, Yanjun, Loftus, Tyler J., Balch, Jeremy, Abbott, Kenneth L., Datta, Shounak, Ruppert, Matthew M., Guan, Ziyuan, Shickel, Benjamin, Rashidi, Parisa, Ozrazgat-Baslanti, Tezcan, Bihorac, Azra
The initial hours of hospital admission impact the clinical trajectory, but early clinical decisions often suffer from data paucity. Clustering analysis of vital signs within six hours of admission can yield patient phenotypes with distinct pathophysiological signatures and outcomes that support early clinical decisions. We created a single-center, longitudinal EHR dataset for 75,762 adults admitted to a tertiary care center for 6+ hours. We proposed a deep temporal interpolation and clustering network to extract latent representations from sparse, irregularly sampled vital sign data and derived distinct patient phenotypes in a training cohort (n=41,502). Model and hyperparameters were chosen using a validation cohort (n=17,415). A test cohort (n=16,845) was used to analyze reproducibility and correlation with biomarkers. The training, validation, and test cohorts had similar distributions of age (54-55 yrs), sex (55% female), race, comorbidities, and illness severity. Four clusters were identified. Phenotype A (18%) had the most comorbid disease, with higher rates of prolonged respiratory insufficiency, acute kidney injury, sepsis, and three-year mortality. Phenotypes B (33%) and C (31%) showed diffuse patterns of mild organ dysfunction. Phenotype B had favorable short-term outcomes but the second-highest three-year mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) had early and persistent hypotension, a high rate of early surgery, and substantial biomarker evidence of inflammation, but the second-lowest three-year mortality. Comparison of the phenotypes' SOFA scores showed that the clustering results did not simply recapitulate other acuity assessments. In a heterogeneous cohort, four phenotypes with distinct categories of disease and outcomes were identified by a deep temporal interpolation and clustering network. This tool may support triage and clinical decisions under time constraints.
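As a rough illustration of the preprocessing problem the network addresses, irregularly sampled vitals must be aligned to a common temporal grid before clustering. The paper's model learns this interpolation end-to-end; the sketch below is a deliberately simple stand-in using plain linear interpolation, with hypothetical function and variable names.

```python
import numpy as np

def interpolate_vitals(times, values, grid):
    """Linearly interpolate one irregularly sampled vital-sign series onto
    a regular time grid. A simplified stand-in for learned deep temporal
    interpolation: np.interp clamps to the edge values outside the
    observed range rather than extrapolating."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(times)                 # np.interp requires sorted x
    return np.interp(grid, times[order], values[order])

# Hypothetical example: heart rate measured at minutes 5, 47, and 130
# after admission, resampled onto a 6-hour grid at 30-minute resolution.
grid = np.arange(0, 361, 30.0)
hr_on_grid = interpolate_vitals([5, 47, 130], [88.0, 102.0, 95.0], grid)
```

In the paper's pipeline the analogous step is performed by a learned interpolation network, whose dense output then feeds the clustering head that yields the four phenotypes.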
Tableaux for the Logic of Strategically Knowing How
Li, Yanjun
Epistemic logic, proposed by von Wright and Hintikka (see [24, 11]), is a logical formalism for reasoning about the knowledge of agents. It deals with propositional knowledge, that is, knowledge expressed as knowing that φ is true. In recent years, patterns of knowledge other than knowing that have attracted increasing attention in the logic community, such as knowing whether [8, 4], knowing who [3], knowing the value [2, 6], and knowing why [28] (see [27] for a survey). Motivated by different scenarios in philosophy and AI, reasoning about knowing how assertions is particularly interesting [23]. The discussion of formalizing the notion of knowing how dates back to [16, 17]. Currently, there are two main approaches to formalizing knowing how. One of them connects knowing how with logics of knowing that and ability (see e.g.
A Bioinspired Synthetic Nervous System Controller for Pick-and-Place Manipulation
Li, Yanjun, Sukhnandan, Ravesh, Gill, Jeffrey P., Chiel, Hillel J., Webster-Wood, Victoria, Quinn, Roger D.
The Synthetic Nervous System (SNS) is a biologically inspired neural network (NN). Because it can capture complex mechanisms underlying neural computation, an SNS model is a candidate for building compact and interpretable NN controllers for robots. Previous work on SNSs has focused on applying the model to the control of legged robots and on the design of functional subnetworks (FSNs) to realize dynamical systems. However, the FSN approach has previously relied on analytical solutions of the governing equations, which is difficult to obtain when designing more complex NN controllers. Incorporating plasticity into SNSs and using learning algorithms to tune the parameters offers a promising route to systematic design in this situation. In this paper, we theoretically analyze the computational advantages of SNSs compared with classical artificial neural networks. We then use learning algorithms to develop compact subnetworks that implement addition, subtraction, division, and multiplication. We also combine the learning-based methodology with a bioinspired architecture to design an interpretable SNS for pick-and-place control of a simulated gantry system. Finally, we show that the SNS controller transfers successfully to a real-world robotic platform without further parameter tuning, verifying the effectiveness of our approach.
Knowing How to Plan
Li, Yanjun, Wang, Yanjing
Various planning-based know-how logics have been studied in the recent literature. In this paper, we use such a logic to perform know-how-based planning via model checking. In particular, we can handle higher-order epistemic planning in which know-how formulas appear in the goal, e.g., finding a plan that guarantees p such that the adversary does not know how to make p false in the future. We give a PTIME algorithm for the model checking problem over finite epistemic transition systems and axiomatize the logic under the assumption of perfect recall.
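The flavor of such model checking can be sketched with a standard fixpoint computation of the states from which an agent can force a goal. The function below is a simplified, non-epistemic stand-in for the paper's algorithm: the real logic also tracks epistemic indistinguishability and perfect recall, both of which this sketch omits.

```python
def knows_how(states, actions, trans, goal):
    """Greatest-reachability fixpoint: the set of states from which the
    agent can force the system into `goal`, however nondeterminism
    resolves. Simplified (non-epistemic) sketch of know-how checking.

    trans[(s, a)] = set of possible successor states of s under action a.
    Runs in polynomial time in |states| * |actions|.
    """
    win = set(goal)
    while True:
        # A state is newly winning if SOME action leads ONLY to winning states.
        new = {
            s for s in states
            if s not in win and any(
                trans.get((s, a)) and trans[(s, a)] <= win
                for a in actions
            )
        }
        if not new:
            return win
        win |= new
```

For instance, if action `a` from state 1 may lead to 2 or 3, and from both of those `a` is guaranteed to reach the goal state 4, then state 1 is winning even though the immediate successor is uncertain.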
PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders
Li, Yanjun, Yu, Shujian, Principe, Jose C., Li, Xiaolin, Wu, Dapeng
Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties of the learning dynamics of most VAE models remain unknown and under-investigated. We present an information-theoretic perspective for analyzing existing VAE models by inspecting the evolution of critical information-theoretic quantities across training epochs. Our observations unveil fundamental properties associated with VAEs, and motivate PRI-VAE, a VAE model guided by the Principle of Relevant Information. Empirical results demonstrate the effectiveness of PRI-VAE on four benchmark data sets. A central goal for representation learning models is that the resulting latent representation should be compact yet disentangled. Compactness requires that the representation z contain no nuisance factors of the input signal x that are irrelevant to the desired response y [1], whereas disentanglement means that z is factorizable and has consistent semantics associated with the different generating factors of the underlying data generation process. Yanjun Li, Jose C. Principe, and Dapeng Wu are with the NSF Center for Big Learning, University of Florida, U.S.A. (email: yanjun.li@ufl.edu). Shujian Yu is with the Machine Learning Group, NEC Laboratories Europe, Germany (email: Shujian.Yu@neclab.eu). To whom correspondence should be addressed.
Comfort-Centered Design of a Lightweight and Backdrivable Knee Exoskeleton
Wang, Junlin, Li, Xiao, Huang, Tzu-Hao, Yu, Shuangyue, Li, Yanjun, Chen, Tianyao, Carriero, Alessandra, Oh-Park, Mooyeon, Su, Hao
This paper presents design principles for comfort-centered wearable robots and their application in a lightweight and backdrivable knee exoskeleton. The mitigation of discomfort is treated as a mechanical design and control problem, and three solutions are proposed: 1) a new wearable structure optimizes the strap attachment configuration and suit layout to ameliorate the excessive shear forces of conventional wearable structures; 2) a rolling knee joint and a double-hinge mechanism reduce misalignment in the sagittal and frontal planes, respectively, without increasing mechanical complexity or inertia; 3) a low-impedance mechanical transmission reduces the reflected inertia and damping of the actuator felt by the human, making the exoskeleton highly backdrivable. Kinematic simulations demonstrate that misalignment between the robot joint and the knee joint can be reduced by 74% at maximum knee flexion. In experiments, the exoskeleton in unpowered mode exhibits a low resistive torque of 1.03 Nm root mean square (RMS). Torque control experiments demonstrate a 0.31 Nm RMS torque-tracking error across three human subjects.
Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere
Li, Yanjun, Bresler, Yoram
Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \circledast f$ ($i=1,2,\dots,N$). We consider the case where the $x_i$'s are sparse and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over previous methods.
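The measurement model and the ideal recovery can be checked numerically. The sketch below simulates $y_i = x_i \circledast f$ with circular convolution and applies the exact inverse filter computed in the Fourier domain; note that the paper *finds* such a filter by nonconvex optimization on the sphere without knowing $f$, so this only verifies the forward model and the target of the recovery, not the algorithm itself.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution of two equal-length signals via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

rng = np.random.default_rng(1)
n, N = 64, 4

# A filter f whose circular convolution is invertible (no zero DFT bins).
f = rng.standard_normal(n)
assert np.min(np.abs(np.fft.fft(f))) > 1e-6

# Sparse channels x_i (about 10% nonzeros) and measurements y_i = x_i (*) f.
X = rng.standard_normal((N, n)) * (rng.random((N, n)) < 0.1)
Y = np.stack([circ_conv(x, f) for x in X])

# The inverse filter h, computed directly here for illustration; the
# nonconvex formulation recovers it (up to sign and shift) from Y alone.
h = np.real(np.fft.ifft(1.0 / np.fft.fft(f)))
X_rec = np.stack([circ_conv(y, h) for y in Y])
assert np.allclose(X_rec, X, atol=1e-6)
```

The sign-and-shift ambiguity mentioned in the abstract appears here as the fact that $\pm h$ circularly shifted by any amount also yields sparse outputs, so any of those filters is an equally valid solution.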