Jiang, Chen
Aneumo: A Large-Scale Comprehensive Synthetic Dataset of Aneurysm Hemodynamics
Li, Xigui, Zhou, Yuanye, Xiao, Feiyang, Guo, Xin, Zhang, Yichi, Jiang, Chen, Ge, Jianchao, Wang, Xiansheng, Wang, Qimeng, Zhang, Taiwei, Lin, Chensen, Cheng, Yuan, Qi, Yuan
Intracranial aneurysm (IA) is a common cerebrovascular disease that is usually asymptomatic but may cause severe subarachnoid hemorrhage (SAH) if ruptured. Although clinical practice usually relies on individual factors and the morphological features of the aneurysm, its pathophysiology and hemodynamic mechanisms remain controversial. To address the limitations of current research, this study constructed a comprehensive hemodynamic dataset of intracranial aneurysms. The dataset is based on 466 real aneurysm models, from which 10,000 synthetic models were generated through resection and deformation operations, comprising 466 aneurysm-free models and 9,534 deformed aneurysm models. The dataset also provides segmentation mask files in a medical-image-like format to support further analysis. In addition, the dataset contains hemodynamic data computed at eight steady-state flow rates (0.001 to 0.004 kg/s), including critical parameters such as flow velocity, pressure, and wall shear stress, providing a valuable resource for investigating aneurysm pathogenesis and clinical prediction. This dataset will help advance the understanding of the pathological features and hemodynamic mechanisms of intracranial aneurysms and support in-depth research in related fields. Dataset hosted at https://github.com/Xigui-Li/Aneumo.
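As a rough illustration of how such a dataset might be consumed, the sketch below loads one case's steady-state fields and summarizes wall shear stress across the eight flow rates. The file layout and array keys are assumptions for illustration, not the repository's documented format.

```python
# Hypothetical access pattern for the Aneumo dataset; file names and array
# keys are assumptions; consult the repository for the actual layout.
import numpy as np

def load_case(case_dir, flow_rate):
    """Load one steady-state solution for a given inlet flow rate (kg/s)."""
    data = np.load(f"{case_dir}/flow_{flow_rate:.4f}.npz")  # assumed per-rate archive
    return data["velocity"], data["pressure"], data["wall_shear_stress"]

# Summarize wall shear stress over the eight flow rates (0.001 to 0.004 kg/s).
for q in np.linspace(0.001, 0.004, 8):
    _, _, wss = load_case("aneumo/case_0001", q)
    print(f"Q = {q:.4f} kg/s  mean WSS = {wss.mean():.3f}  max WSS = {wss.max():.3f}")
```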
Personalize to generalize: Towards a universal medical multi-modality generalization through personalization
Tan, Zhaorui, Yang, Xi, Pan, Tan, Liu, Tianyi, Jiang, Chen, Guo, Xin, Wang, Qiufeng, Nguyen, Anh, Qi, Yuan, Huang, Kaizhu, Cheng, Yuan
The differences among medical imaging modalities, driven by distinct underlying principles, pose significant challenges for generalization in multi-modal medical tasks. Beyond modality gaps, individual variations, such as differences in organ size and metabolic rate, further impede a model's ability to generalize effectively across both modalities and diverse populations. Despite the importance of personalization, existing approaches to multi-modal generalization often neglect individual differences, focusing solely on common anatomical features. This limitation may result in weakened generalization in various medical tasks. In this paper, we show that personalization is critical for multi-modal generalization. Specifically, we propose an approach that achieves personalized generalization by approximating the underlying personalized invariant representation ${X}_h$ across various modalities, leveraging individual-level constraints and a learnable biological prior. We validate the feasibility and benefits of learning a personalized ${X}_h$, showing that this representation is highly generalizable and transferable across various multi-modal medical tasks. Extensive experimental results consistently show that the additionally incorporated personalization significantly improves performance and generalization across diverse scenarios, confirming its effectiveness.
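A minimal sketch of the idea, assuming a PyTorch setup in which a learnable per-subject embedding stands in for the biological prior and an individual-level constraint pulls two modalities of the same subject toward a shared representation. The module names and loss composition are illustrative, not the paper's exact formulation.

```python
# Illustrative only: a per-subject learnable prior plus an individual-level
# invariance term approximating a personalized representation X_h.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersonalizedEncoder(nn.Module):
    def __init__(self, in_dim=1024, dim=256, num_subjects=100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.prior = nn.Embedding(num_subjects, dim)  # learnable "biological prior" (assumed form)

    def forward(self, x, subject_id):
        # Condition the shared representation on the individual.
        return self.backbone(x) + self.prior(subject_id)

def individual_invariance(z_ct, z_mri):
    # Same subject seen in two modalities: encourage a modality-invariant X_h.
    return F.mse_loss(z_ct, z_mri)
```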
On the Sequence Evaluation based on Stochastic Processes
Zhang, Tianhao, Lin, Zhexiao, Sheng, Zhecheng, Jiang, Chen, Kang, Dongyeop
Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long-text dynamics with neural language models would facilitate many downstream tasks, such as coherence evaluation, text generation, and machine translation. This paper presents a novel approach to modeling sequences through a stochastic process. We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long-text evaluation than the previous approach. The proposed training objective effectively preserves sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Theoretical properties of our new score show its advantages in sequence evaluation. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths. We also demonstrate that the encoder achieves competitive results in discriminating human-written and AI-written text.
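To make the stochastic-process view concrete, here is a hedged sketch of a Brownian-bridge negative log-likelihood over a document's sentence-embedding trajectory; the paper's actual objective and parameterization may differ.

```python
# Sketch: score intermediate sentence embeddings against the Brownian bridge
# pinned at the first and last embedding; not the paper's exact objective.
import math
import torch

def bridge_nll(z, sigma=1.0):
    """z: (T+1, d) latent trajectory of one document, in sentence order."""
    T = z.shape[0] - 1
    d = z.shape[1]
    nll = z.new_zeros(())
    for t in range(1, T):
        alpha = t / T
        mean = (1 - alpha) * z[0] + alpha * z[-1]  # bridge conditional mean
        var = sigma**2 * t * (T - t) / T           # bridge conditional variance
        nll = nll + ((z[t] - mean) ** 2).sum() / (2 * var) \
                  + 0.5 * d * math.log(2 * math.pi * var)
    return nll / max(T - 1, 1)
```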
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Jiang, Chen, Liu, Hong, Yu, Xuzheng, Wang, Qing, Cheng, Yuan, Xu, Jia, Liu, Zhongyi, Guo, Qingpei, Chu, Wei, Yang, Ming, Qi, Yuan
In recent years, the explosion of web videos has made text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant text/video higher than irrelevant ones. The core of this task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on constructing positive and negative pairs to learn text and video representations. Nevertheless, they do not pay enough attention to hard negative pairs and lack the ability to model different levels of semantic similarity. To address these two issues, this paper improves contrastive learning with two novel techniques. First, to exploit hard examples for robust discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module (DMAE) to mine hard negative pairs from textual and visual clues. By further introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively identify these hard negatives and explicitly highlight their impact in the training loss. Second, our work argues that triplet samples can better model fine-grained semantic similarity than pairwise samples. We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial-order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL designs an adaptive token masking strategy with cross-modal interaction to model subtle semantic differences. Extensive experiments demonstrate that the proposed approach outperforms existing methods on four widely used text-video retrieval datasets: MSR-VTT, MSVD, DiDeMo, and ActivityNet.
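The following PyTorch fragment sketches what a negative-aware InfoNCE might look like: the standard InfoNCE term plus an explicit penalty on pairs flagged as hard negatives (e.g., by a module like DMAE). The weighting scheme is an assumption for illustration, not the paper's exact loss.

```python
# Schematic NegNCE-style loss; `hard_mask` marks mined hard negative pairs.
import torch
import torch.nn.functional as F

def neg_aware_nce(sim, hard_mask, tau=0.05, beta=0.5):
    """sim: (B, B) text-video similarities, diagonal = matched pairs.
    hard_mask: (B, B) bool, True where a pair was mined as a hard negative."""
    labels = torch.arange(sim.size(0), device=sim.device)
    base = F.cross_entropy(sim / tau, labels)  # standard InfoNCE term
    if hard_mask.any():
        # Explicitly suppress the similarity of mined hard negatives.
        return base + beta * sim[hard_mask].mean()
    return base
```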
BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence
Sheng, Zhecheng, Zhang, Tianhao, Jiang, Chen, Kang, Dongyeop
Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods depend heavily on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce "BBScore", a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings show that, when combined with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also show in downstream tasks that this metric effectively differentiates between human-written documents and text generated by large language models within a specific domain. Furthermore, we illustrate the efficacy of this approach in detecting writing styles attributed to diverse large language models, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.
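In the same spirit, a reference-free score can be computed as the average deviation of a document's sentence-embedding trajectory from the Brownian bridge pinned at its endpoints. This numpy sketch illustrates the idea; the published metric's normalization may differ.

```python
# Illustrative bridge-deviation score: lower values mean the trajectory looks
# more like a Brownian bridge (i.e., more coherent under this model).
import numpy as np

def bridge_deviation(z):
    """z: (T+1, d) sentence embeddings in document order."""
    T = len(z) - 1
    if T < 2:
        return 0.0
    devs = []
    for t in range(1, T):
        alpha = t / T
        mean = (1 - alpha) * z[0] + alpha * z[-1]  # bridge expectation at step t
        var = alpha * (1 - alpha) * T              # bridge variance, up to scale
        devs.append(((z[t] - mean) ** 2).sum() / var)
    return float(np.mean(devs))
```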
Bridging Low-level Geometry to High-level Concepts in Visual Servoing of Robot Manipulation Task Using Event Knowledge Graphs and Vision-Language Models
Jiang, Chen, Jagersand, Martin
In this paper, we propose a framework for building knowledgeable robot control in the scope of smart human-robot interaction, empowering a basic uncalibrated visual servoing controller with contextual knowledge through the joint use of event knowledge graphs (EKGs) and large-scale pretrained vision-language models (VLMs). The framework is twofold: first, we interpret low-level image geometry as high-level concepts, allowing us to prompt VLMs and to select geometric features of points and lines for motor control skills; then, we create an event knowledge graph (EKG) to conceptualize a robot manipulation task of interest, where the main body of the EKG is characterized by an executable behavior tree and the leaves by semantic concepts relevant to the manipulation context. We demonstrate, in an uncalibrated environment with real robot trials, that our method reduces reliance on human annotation during task interfacing, allows the robot to perform activities of daily living more easily by treating low-level geometry-based motor control skills as high-level concepts, and is beneficial for building cognitive capabilities into smart robot applications.
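A toy sketch of the executable-behavior-tree idea, where leaves carry semantic concepts that a VLM would ground in the current image; the node types and grounding interface below are placeholders, not the paper's implementation.

```python
# Toy behavior tree with concept-carrying leaves; grounding is stubbed out.
class Leaf:
    def __init__(self, concept, skill):
        self.concept, self.skill = concept, skill

    def tick(self, perception):
        # In the full system, a VLM would ground `concept` in the image here.
        feature = perception.get(self.concept)
        return self.skill(feature) if feature is not None else "FAILURE"

class Sequence:
    def __init__(self, children):
        self.children = children

    def tick(self, perception):
        for child in self.children:
            if child.tick(perception) == "FAILURE":
                return "FAILURE"
        return "SUCCESS"

# e.g., "pick up the cup": ground the handle concept, then run a servo skill.
tree = Sequence([Leaf("cup_handle", lambda xy: "SUCCESS")])
print(tree.tick({"cup_handle": (120, 88)}))  # grounded pixel feature -> SUCCESS
```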
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation
Jiang, Chen, Yang, Yuchen, Jagersand, Martin
The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Neither method matches natural human communication or conveys rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, a prompt-based approach, to provide richer information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr, a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while using its "U-shaped" encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
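The glue between segmentation and servoing might look like the sketch below: reduce a predicted mask to a trackable point feature and feed its image-space error into a Broyden-style uncalibrated servo update. The `clipunetr.segment` call and the Jacobian handling are hypothetical placeholders.

```python
# Hypothetical pipeline fragment: referring-expression mask -> point feature
# -> one uncalibrated visual-servoing step. API names are placeholders.
import numpy as np

def mask_to_feature(mask):
    """Collapse a binary mask to its centroid as a trackable image feature."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])

def servo_step(feature, target, J_hat, gain=0.5):
    """One UIBVS update: command joint motion that shrinks image-space error."""
    error = target - feature
    return gain * np.linalg.pinv(J_hat) @ error  # joint velocity command

# mask = clipunetr.segment(image, "the red mug handle")  # hypothetical call
# dq = servo_step(mask_to_feature(mask), goal_xy, J_hat)
```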
Using Adamic-Adar Index Algorithm to Predict Volunteer Collaboration: Less is More
Wu, Chao, Chen, Peng, Yin, Baiqiao, Lin, Zijuan, Jiang, Chen, Yu, Di, Zou, Changhong, Lui, Chunwang
Social networks exhibit a complex graph-like structure due to the uncertainty surrounding potential collaborations among participants. Machine learning algorithms generally perform well across many real-world prediction tasks, but whether they outperform algorithms specifically designed for graph link prediction remains unclear. To address this question, the Adamic-Adar Index (AAI), Jaccard Coefficient (JC), and Common Neighbor Centrality (CNC), as representatives of graph-specific algorithms, were applied to predict potential collaborations using data from volunteer activities during the COVID-19 pandemic in Shenzhen, alongside classical machine learning algorithms such as random forest, support vector machine, and gradient boosting, used both as single predictors and as components of ensemble learning. This paper shows that the AAI algorithm outperformed the traditional JC and CNC, as well as the machine learning algorithms, in analyzing graph node attributes for this task.
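For reference, the Adamic-Adar Index of a pair (u, v) sums 1/log(degree(w)) over their common neighbors w, so rare shared contacts count for more than popular ones. The snippet below computes it with networkx on a toy graph; the edge list is made up for illustration.

```python
# Adamic-Adar Index with networkx: AAI(u, v) = sum over common neighbors w
# of 1 / log(degree(w)). Toy edge list, for illustration only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")])

for u, v, score in nx.adamic_adar_index(G, [("a", "d")]):
    print(f"AAI({u}, {v}) = {score:.3f}")  # two shared neighbors of degree 3
```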
Single-photon Image Super-resolution via Self-supervised Learning
Chen, Yiwei, Jiang, Chen, Pan, Yu
Single-Photon Image Super-Resolution (SPISR) aims to recover a high-resolution volumetric photon-counting cube from a noisy low-resolution one using computational imaging algorithms. In real-world scenarios, pairs of training samples are often expensive or impossible to obtain. By extending Equivariant Imaging (EI) to volumetric single-photon data, we propose a self-supervised learning framework for the SPISR task. In particular, using the Poisson unbiased Kullback-Leibler risk estimator and equivariance, our method is able to learn from noisy measurements without ground truth. Comprehensive experiments on simulated and real-world datasets demonstrate that the proposed method achieves performance comparable to supervised learning and outperforms interpolation-based methods.
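A schematic of an equivariant-imaging style training step under stated assumptions: the Poisson unbiased Kullback-Leibler risk estimator is abstracted into a generic `data_fidelity` term, and the transformation group is left as a callable. This is a sketch of the training principle, not the paper's code.

```python
# Schematic EI-style self-supervised step for SPISR; PUKL is abstracted away.
import torch

def ei_loss(f, y, A, transform, data_fidelity):
    """f: reconstruction network; y: noisy low-res photon cube; A: downsampling op.
    transform: random group action (e.g., shift/rotation) on volumes."""
    x_hat = f(y)                      # reconstruct from the measurement alone
    mc = data_fidelity(A(x_hat), y)   # measurement consistency (PUKL in the paper)
    t_x = transform(x_hat)            # act on the reconstruction...
    eq = torch.mean((f(A(t_x)) - t_x) ** 2)  # ...and require equivariance
    return mc + eq
```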
InGVIO: A Consistent Invariant Filter for Fast and High-Accuracy GNSS-Visual-Inertial Odometry
Liu, Changwu, Jiang, Chen, Wang, Haowen
Combining a Global Navigation Satellite System (GNSS) with visual and inertial sensors can give smooth pose estimation without drift. The fusion system gradually degrades to Visual-Inertial Odometry (VIO) as the number of visible satellites decreases, which guarantees robust global navigation in GNSS-unfriendly environments. In this letter, we propose an open-source invariant filter-based platform, InGVIO, to tightly fuse monocular/stereo visual-inertial measurements with raw GNSS data. InGVIO gives highly competitive results in terms of computational load compared to current graph-based algorithms, while possessing the same or an even better level of accuracy. Thanks to our proposed marginalization strategies, the baseline for triangulation remains large even though only a few cloned poses are kept. Moreover, we define the infinitesimal symmetries of the system and exploit the structures of its symmetry group, which differ from the total symmetries of the VIO case; this elegantly yields the pattern of degenerate motions and the structure of the unobservable subspaces. We prove that the properly chosen invariant error is compatible with all possible symmetry-group structures of InGVIO and has intrinsic consistency properties. Besides, InGVIO has strictly linear error propagation without linearization error. InGVIO is tested on both open datasets and our proposed fixed-wing datasets with variable levels of difficulty and various numbers of satellites. The latter are, to the best of our knowledge, the first datasets recorded on a fixed-wing aircraft with raw GNSS to be open-sourced to the community.
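One property worth illustrating numerically is why invariant errors behave so well: when a rotation state integrates a known angular increment, the right-invariant error is unchanged by the propagation, independent of the estimate itself. The toy check below uses plain SO(3) rather than InGVIO's full state, purely to demonstrate the principle.

```python
# Toy illustration of estimate-independent (hence linear) propagation of a
# right-invariant rotation error under a shared, noise-free angular increment.
import numpy as np

def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def exp_so3(w):
    """Rodrigues formula: map an axis-angle vector to a rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * K @ K

R_true = exp_so3(np.array([0.10, 0.20, -0.10]))
R_est = exp_so3(np.array([0.12, 0.18, -0.09]))
w_dt = np.array([0.05, -0.02, 0.03])  # measured angular increment

# The right-invariant error R_true @ R_est.T is unchanged by propagation.
err_before = R_true @ R_est.T
err_after = (R_true @ exp_so3(w_dt)) @ (R_est @ exp_so3(w_dt)).T
print(np.allclose(err_before, err_after))  # True
```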