
Collaborating Authors: Zhang, Yafei


RankPO: Preference Optimization for Job-Talent Matching

arXiv.org Artificial Intelligence

Matching job descriptions (JDs) with suitable talent requires models capable of understanding not only textual similarities between JDs and candidate resumes but also contextual factors such as geographical location and academic seniority. To address this challenge, we propose a two-stage training framework for large language models (LLMs). In the first stage, a contrastive learning approach is used to train the model on a dataset constructed from real-world matching rules, such as geographical alignment and research area overlap. While effective, this model primarily learns the patterns defined by the matching rules. In the second stage, we introduce a novel preference-based fine-tuning method inspired by Direct Preference Optimization (DPO), termed Rank Preference Optimization (RankPO), to align the model with AI-curated pairwise preferences emphasizing textual understanding. Our experiments show that while the first-stage model achieves strong performance on rule-based data (nDCG@20 = 0.706), it lacks robust textual understanding (alignment with AI annotations = 0.46). By fine-tuning with RankPO, we achieve a balanced model that retains relatively good performance on the original tasks while significantly improving alignment with AI preferences. The code and data are available at https://github.com/yflyzhang/RankPO.
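To make the DPO-inspired objective concrete, here is a minimal PyTorch sketch of a pairwise preference loss applied to retrieval similarity scores; the function name, the beta temperature, and the use of the frozen first-stage model as the reference are illustrative assumptions, not the paper's verified formulation.

import torch
import torch.nn.functional as F

def rankpo_loss(sim_pref, sim_disp, ref_pref, ref_disp, beta=0.1):
    # sim_*: policy-model similarity scores for the AI-preferred and
    # dispreferred JD-candidate pairs; ref_*: scores from the frozen
    # first-stage (reference) model. Hypothetical formulation.
    logits = beta * ((sim_pref - ref_pref) - (sim_disp - ref_disp))
    # Maximize the margin by which the policy prefers the chosen pair
    # relative to the reference model (DPO-style log-sigmoid loss).
    return -F.logsigmoid(logits).mean()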


BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion

arXiv.org Artificial Intelligence

If unaligned multimodal medical images can be simultaneously aligned and fused in a single stage within a unified framework, the two tasks can promote each other while the overall model complexity is reduced. However, designing such a model faces the challenge of incompatible requirements for feature fusion and alignment: feature alignment requires consistency among corresponding features, whereas feature fusion requires the features to be complementary to each other. To address this challenge, this paper proposes an unaligned medical image fusion method built on a Bidirectional Stepwise Feature Alignment and Fusion (BSFA-F) strategy. To reduce the negative impact of modality differences on cross-modal feature matching, we incorporate the Modal Discrepancy-Free Feature Representation (MDF-FR) method into BSFA-F. MDF-FR utilizes a Modality Feature Representation Head (MFRH) to integrate the global information of the input image. By injecting the information contained in the MFRH of the current image into the images of other modalities, it effectively reduces the impact of modality differences on feature alignment while preserving the complementary information carried by different images. For feature alignment, BSFA-F employs a bidirectional stepwise deformation-field prediction strategy based on the path independence of vector displacement between two points. This strategy avoids the large spans and inaccurate deformation-field predictions of single-step alignment. Finally, a Multi-Modal Feature Fusion block fuses the aligned features. Experimental results across multiple datasets demonstrate the effectiveness of our method. The source code is available at https://github.com/slrl123/BSAFusion.
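A rough PyTorch sketch of the stepwise idea follows: rather than predicting one large deformation field, small displacement increments are accumulated, exploiting the path independence of displacement vectors. Here step_net is a hypothetical flow-prediction module, and only one direction is shown; the bidirectional strategy would run the same loop from both images.

import torch
import torch.nn.functional as F

def warp(feat, flow):
    # Warp a feature map (N, C, H, W) by a dense flow field (N, 2, H, W)
    # expressed in normalized [-1, 1] grid coordinates.
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=feat.device),
        torch.linspace(-1, 1, w, device=feat.device), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(feat, grid, align_corners=True)

def stepwise_align(feat_mov, feat_fix, step_net, n_steps=4):
    # Accumulate n_steps small flow increments instead of one large jump;
    # by path independence, the summed displacements reach the same
    # endpoint while each per-step prediction stays small and easy.
    # step_net: hypothetical module, cat(moving, fixed) -> 2-channel flow.
    flow = torch.zeros(feat_mov.size(0), 2, *feat_mov.shape[2:],
                       device=feat_mov.device)
    warped = feat_mov
    for _ in range(n_steps):
        flow = flow + step_net(torch.cat([warped, feat_fix], dim=1))
        warped = warp(feat_mov, flow)
    return warped, flow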


Prototype-Driven and Multi-Expert Integrated Multi-Modal MR Brain Tumor Image Segmentation

arXiv.org Artificial Intelligence

For multi-modal magnetic resonance (MR) brain tumor image segmentation, current methods usually extract discriminative features directly from the input images to determine and localize tumor sub-region categories. However, the information aliasing caused by the mutual inclusion of tumor sub-regions is often ignored, and existing methods rarely make tailored efforts to highlight the features of individual tumor sub-regions. To this end, we propose a multi-modal MR brain tumor segmentation method with tumor prototype-driven feature representation and multi-expert integration, which highlights the features of each tumor sub-region under the guidance of tumor prototypes. Specifically, to obtain prototypes with complete information, we propose a mutual transmission mechanism that transfers features between modalities, addressing the insufficient information carried by single-modal features. Furthermore, we devise a prototype-driven feature representation and fusion method: the learned prototypes are implanted into the tumor features to generate corresponding activation maps, with which the sub-region features consistent with each prototype category can be highlighted. A key-information enhancement and fusion strategy with multi-expert integration is designed to further improve segmentation performance; it integrates features from different layers of an extra feature extraction network with the features highlighted by the prototypes. Experimental results on three competition brain tumor segmentation datasets demonstrate the superiority of the proposed method.
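As a rough illustration of the prototype-driven highlighting step, the sketch below scores each spatial feature against per-sub-region prototype vectors to form activation maps and uses them to emphasize class-consistent features; the shapes, cosine-similarity scoring, and softmax normalization are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn.functional as F

def prototype_highlight(feats, prototypes):
    # feats:      (N, C, H, W) fused multi-modal tumor features
    # prototypes: (K, C) one learned prototype per tumor sub-region
    f = F.normalize(feats, dim=1)                 # unit-norm features
    p = F.normalize(prototypes, dim=1)            # unit-norm prototypes
    act = torch.einsum("nchw,kc->nkhw", f, p)     # cosine-similarity maps
    act = act.softmax(dim=1)                      # (N, K, H, W) activations
    # Weight the features by each sub-region's activation map so that
    # features consistent with a prototype category are highlighted.
    highlighted = feats.unsqueeze(1) * act.unsqueeze(2)  # (N, K, C, H, W)
    return act, highlighted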


Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation

arXiv.org Artificial Intelligence

Person text-image matching, also known as text-based person search, aims to retrieve images of specific pedestrians using text descriptions. Although person text-image matching has made great research progress, existing methods still face two challenges. First, the lack of interpretability of text features makes it difficult to align them effectively with their corresponding image features. Second, the same pedestrian image often corresponds to multiple different text descriptions, and a single text description can correspond to multiple different images of the same identity. This diversity of text descriptions and images makes it difficult for a network to extract features that match robustly across the two modalities. To address these problems, we propose a person text-image matching method that embeds text-feature interpretability and implants an external attack node. Specifically, we improve the interpretability of text features by endowing them with semantic information consistent with the image features, thereby aligning the text features with the image region features they describe. To address the challenges posed by the diversity of texts and the corresponding person images, we treat the feature variation caused by this diversity as perturbation information and propose a novel adversarial attack and defense method to handle it. In the model design, graph convolution serves as the basic framework for feature representation, and the adversarial attacks caused by text and image diversity during feature extraction are simulated by implanting an additional attack node in the graph convolution layer, improving the robustness of the model against text and image diversity. Extensive experiments demonstrate the effectiveness and superiority of the proposed method over existing text-pedestrian image matching methods. The source code of the method is published at
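The sketch below illustrates one way an external attack node could be implanted into a graph convolution layer: a learnable perturbation vector is appended as an extra node connected to every real node, so message passing mixes a controlled disturbance into each feature during training. The class name, connectivity, and initialization scale are illustrative assumptions, not the published implementation.

import torch
import torch.nn as nn

class AttackNodeGCNLayer(nn.Module):
    # Graph-conv layer with an implanted external attack node (sketch).
    # The attack node stands in for the diversity-induced perturbations
    # the model should learn to be robust to.
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Linear(dim, dim)
        self.attack = nn.Parameter(torch.randn(1, dim) * 0.01)

    def forward(self, x, adj):
        # x: (N, D) node features, adj: (N, N) normalized adjacency.
        n = x.size(0)
        x_aug = torch.cat([x, self.attack], dim=0)        # (N+1, D)
        link = x.new_ones(n, 1) / n                       # attack node sees all nodes
        adj_aug = torch.cat(
            [torch.cat([adj, link], dim=1),
             torch.cat([link.t(), x.new_zeros(1, 1)], dim=1)], dim=0)
        out = torch.relu(self.weight(adj_aug @ x_aug))
        return out[:n]                                    # drop the attack node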