Information Fusion
Cluster and Aggregate: Face Recognition with Large Probe Set Supplementary Material
The number of layers L in CN is equal to 2. For recent SoTA backbone models, the performance is saturated above 98.5. The performance gain is observed in both backbones. As the probe size increases, the role of a feature fusion model also increases. The relative performance gain for Fig. 1 c) is calculated as [...] We measured the FPS with an Nvidia RTX 3090. When a few samples' contribution is larger than the others [...] A lower entropy value indicates that the cluster features deviate from a simple average of all samples.
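The entropy remark can be illustrated with a minimal sketch. It assumes the entropy is the Shannon entropy (natural log) of the normalized per-sample contribution weights of one cluster; the excerpt does not give the exact definition, and the function name `weight_entropy` and the example weights are hypothetical.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (natural log) of normalized contribution weights.

    Assumption: `weights` are non-negative per-sample contributions to a
    single cluster feature; they are normalized here to sum to 1.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Uniform weights correspond to a simple average of all samples.
uniform = [1.0, 1.0, 1.0, 1.0]
# A few dominant samples: the cluster feature departs from the simple average.
skewed = [0.85, 0.05, 0.05, 0.05]

h_uniform = weight_entropy(uniform)  # equals log(4), the maximum for 4 samples
h_skewed = weight_entropy(skewed)    # strictly smaller than h_uniform
```

Under this assumption, entropy is maximal when every sample contributes equally and drops as a few samples dominate.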
Robust Data Fusion via Subsampling
Wang, Jing, Wang, HaiYing, Chen, Kun
Data fusion and transfer learning are rapidly growing fields that enhance model performance for a target population by leveraging other related data sources or tasks. The challenges lie in the various potential heterogeneities between the target and external data, as well as various practical concerns that prevent a naïve data integration. We consider a realistic scenario where the target data is limited in size while the external data is large but contaminated with outliers; such data contamination, along with other computational and operational constraints, necessitates proper selection or subsampling of the external data for transfer learning. To our knowledge, transfer learning and subsampling under data contamination have not been thoroughly investigated. We address this gap by studying various transfer learning methods with subsamples of the external data, accounting for outliers deviating from the underlying true model due to arbitrary mean shifts. Two subsampling strategies are investigated: one aimed at reducing biases and the other at minimizing variances. Approaches to combine these strategies are also introduced to enhance the performance of the estimators. We provide non-asymptotic error bounds for the transfer learning estimators, clarifying the roles of sample sizes, signal strength, sampling rates, magnitude of outliers, and tail behaviors of model error distributions, among other factors. Extensive simulations show the superior performance of the proposed methods. Additionally, we apply our methods to analyze the risk of hard landings in A380 airplanes by utilizing data from other airplane types, demonstrating that robust transfer learning can improve estimation efficiency for relatively rare airplane types with the help of data from other types of airplanes.
A Language-Signal-Vision Multimodal Framework for Multitask Cardiac Analysis
Zhang, Yuting, Geng, Tiantian, Hao, Luoying, Cheng, Xinxing, Thorley, Alexander, Wang, Xiaoxia, Lu, Wenqi, Hothi, Sandeep S, Wei, Lei, Qiu, Zhaowen, Kotecha, Dipak, Duan, Jinming
Contemporary cardiovascular management involves complex consideration and integration of multimodal cardiac datasets, where each modality provides distinct but complementary physiological characteristics. While the effective integration of multiple modalities could yield a holistic clinical profile that accurately models the true clinical situation with respect to data modalities and their relative weightings, current methodologies remain limited by: 1) the scarcity of patient- and time-aligned multimodal data; 2) reliance on isolated single-modality or rigid multimodal input combinations; 3) alignment strategies that prioritize cross-modal similarity over complementarity; and 4) a narrow single-task focus. In response to these limitations, a comprehensive multimodal dataset was curated for immediate application, integrating laboratory test results, electrocardiograms, and echocardiograms with clinical outcomes. Subsequently, a unified framework, Textual Guidance Multimodal fusion for Multiple cardiac tasks (TGMM), was proposed. TGMM incorporated three key components: 1) a MedFlexFusion module designed to capture the unique and complementary characteristics of medical modalities and dynamically integrate data from diverse cardiac sources and their combinations; 2) a textual guidance module to derive task-relevant representations tailored to diverse clinical objectives, including heart disease diagnosis, risk stratification and information retrieval; and 3) a response module to produce final decisions for all these tasks. Furthermore, this study systematically explored key features across multiple modalities and elucidated their synergistic contributions in clinical decision-making. Extensive experiments showed that TGMM outperformed state-of-the-art methods across multiple clinical tasks, with additional validation confirming its robustness on another public dataset.
Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition
Hao, Ruiyang, Yu, Haibao, Zhong, Jiaru, Wang, Chuanye, Wang, Jiahao, Kan, Yiming, Yang, Wenxian, Fan, Siqi, Yin, Huilin, Qiu, Jianing, Mu, Yao, Sun, Jiankai, Chen, Li, Zimmer, Walter, Zhang, Dandan, Zhang, Shanghang, Schwager, Mac, Luo, Ping, Nie, Zaiqing
With the rapid advancement of autonomous driving technology, vehicle-to-everything (V2X) communication has emerged as a key enabler for extending perception range and enhancing driving safety by providing visibility beyond the line of sight. However, integrating multi-source sensor data from both ego-vehicles and infrastructure under real-world constraints, such as limited communication bandwidth and dynamic environments, presents significant technical challenges. To facilitate research in this area, we organized the End-to-End Autonomous Driving through V2X Cooperation Challenge, which features two tracks: cooperative temporal perception and cooperative end-to-end planning. Built on the UniV2X framework and the V2X-Seq-SPD dataset, the challenge attracted participation from over 30 teams worldwide and established a unified benchmark for evaluating cooperative driving systems. This paper describes the design and outcomes of the challenge, highlights key research problems including bandwidth-aware fusion, robust multi-agent planning, and heterogeneous sensor integration, and analyzes emerging technical trends among top-performing solutions. By addressing practical constraints in communication and data fusion, the challenge contributes to the development of scalable and reliable V2X-cooperative autonomous driving systems.
A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
Multi-view learning is widely applied to real-life datasets, such as multiple omics biological data, but it often suffers from both missing views and missing labels. Prior probabilistic approaches addressed the missing view problem by using a product-of-experts scheme to aggregate representations from present views and achieved superior performance over deterministic classifiers, using the information bottleneck (IB) principle. However, the IB framework is inherently fully supervised and cannot leverage unlabeled data. In this work, we propose a semi-supervised generative model that utilizes both labeled and unlabeled samples in a unified framework. Our method maximizes the likelihood of unlabeled samples to learn a latent space shared with the IB on labeled data. We also perform cross-view mutual information maximization in the latent space to enhance the extraction of shared information across views. Compared to existing approaches, our model achieves better predictive and imputation performance on both image and multi-omics data with missing views and limited labeled samples.
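The product-of-experts aggregation over present views can be sketched as follows. This is a minimal illustration, assuming each view contributes a one-dimensional Gaussian expert and that a standard normal prior expert is included, as is common in multimodal variational models; the abstract does not specify these details, and the function name `product_of_experts` is hypothetical.

```python
def product_of_experts(means, variances, present):
    """Combine per-view Gaussian experts into one Gaussian posterior.

    Assumptions (not taken from the abstract): scalar Gaussian experts and a
    standard normal prior expert N(0, 1). Missing views are simply skipped.
    """
    precision = 1.0       # precision of the N(0, 1) prior expert
    weighted_mean = 0.0   # prior mean 0 contributes nothing to this sum
    for mu, var, is_present in zip(means, variances, present):
        if not is_present:
            continue      # a missing view adds no information
        precision += 1.0 / var
        weighted_mean += mu / var
    # A product of Gaussians is Gaussian: precisions add, and the mean is
    # the precision-weighted average of the expert means.
    return weighted_mean / precision, 1.0 / precision

# Both views present.
mean_all, var_all = product_of_experts([2.0, 4.0], [1.0, 1.0], [True, True])
# Second view missing: the posterior falls back on the first view and the prior.
mean_one, var_one = product_of_experts([2.0, 4.0], [1.0, 1.0], [True, False])
```

Because a missing view is dropped from the product, the same function handles any subset of present views, at the cost of a wider posterior when fewer views are available.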
BayReL: Bayesian Relational Learning for Multi-omics Data Integration: Supplementary Materials
To further clarify the model and workflow of our proposed BayReL, we provide a schematic illustration of BayReL in Figure S1, where we only include two views for clarity. Figure S2 shows the inferred bipartite network with the top 200 interactions by BayReL. Figure S1: Schematic illustration of BayReL. Figure S2: The bipartite sub-network with the top 200 interactions inferred by BayReL in AML data. Genes and drugs are shown as blue and red nodes, respectively. D. Details on the experimental setups, hyper-parameter selection, and run time. We learn the model for 1000 training epochs and use the validation set for early stopping. Each training epoch for CF, BRCA, and AML took 0.01, 0.42, [...] In all experiments, we used the CCAGFA R package as the official implementation of BCCA.