large-scale model
Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation
Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained large-scale models. Existing OVSGG methods always adopt a two-stage pipeline: 1) Infusing knowledge into large-scale models via pre-training on large datasets; 2) Transferring knowledge from pre-trained models with fully annotated scene graphs during supervised fine-tuning. However, due to a lack of explicit interaction modeling, these methods struggle to distinguish between interacting and non-interacting instances of the same object category. This limitation induces critical issues in both stages of OVSGG: it generates noisy pseudo-supervision from mismatched objects during knowledge infusion, and causes ambiguous query matching during knowledge transfer. To this end, in this paper, we propose an interACtion-Centric end-to-end OVSGG framework (ACC) in an interaction-driven paradigm to minimize these mismatches. For interaction-centric knowledge infusion, ACC employs a bidirectional interaction prompt for robust pseudo-supervision generation to enhance the model's interaction knowledge. For interaction-centric knowledge transfer, ACC first adopts interaction-guided query selection that prioritizes pairing interacting objects to reduce interference from non-interacting ones. Then, it integrates interaction-consistent knowledge distillation to bolster robustness by pushing relational foreground away from the background while retaining general knowledge. Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications.
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. Our investigation into approximate MU starts with identifying the steepest descent direction, minimizing the output Kullback-Leibler divergence to exact MU inside a parameters' neighborhood.
Federated Learning from Pre-Trained Models: A Contrastive Learning Approach
Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. Here, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.
Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation
Ding, Kuiyuan, Guo, Caili, Yang, Yang, Du, Zhongtian, Saad, Walid
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding, thereby providing a suitable tool for designing semantic communication (SC) systems. However, their direct deployment is often hindered by high computational complexity and resource requirements. In this paper, a novel robust knowledge distillation based semantic communication (RKD-SC) framework is proposed to enable efficient and \textcolor{black}{channel-noise-robust} LSM-powered SC. The framework addresses two key challenges: determining optimal compact model architectures and effectively transferring knowledge while maintaining robustness against channel noise. First, a knowledge distillation-based lightweight differentiable architecture search (KDL-DARTS) algorithm is proposed. This algorithm integrates knowledge distillation loss and a complexity penalty into the neural architecture search process to identify high-performance, lightweight semantic encoder architectures. Second, a novel two-stage robust knowledge distillation (RKD) algorithm is developed to transfer semantic capabilities from an LSM (teacher) to a compact encoder (student) and subsequently enhance system robustness. To further improve resilience to channel impairments, a channel-aware transformer (CAT) block is introduced as the channel codec, trained under diverse channel conditions with variable-length outputs. Extensive simulations on image classification tasks demonstrate that the RKD-SC framework significantly reduces model parameters while preserving a high degree of the teacher model's performance and exhibiting superior robustness compared to existing methods.
NeuroDx-LM: A Clinical Large-Scale Model for EEG-based Neurological Disorder Detection
Jin, Guanghao, Liang, Yuan, Ma, Yihan, Wu, Jingpei, Liu, Guoyang
Large-scale models pre-trained on Electroencephalography (EEG) have shown promise in clinical applications such as neurological disorder detection. However, the practical deployment of EEG-based large-scale models faces critical challenges such as limited labeled EEG data and suboptimal performance in clinical scenarios. To address these issues, we propose NeuroDx-LM, a novel large-scale model specifically designed for detecting EEG-based neurological disorders. Our key contributions include (i) a Selective Temporal-Frequency Embedding mechanism that adaptively captures complex temporal and spectral patterns in EEG signals; and (ii) a Progressive Feature-Aware Training strategy that refines feature representation in a two-stage process. In the first stage, our model learns the fundamental discriminative features of EEG activities; in the second stage, the model further extracts more specialized fine-grained features for accurate diagnostic performance. We evaluated NeuroDx-LM on the CHB-MIT and Schizophrenia datasets, achieving state-of-the-art performance in EEG-based seizure and schizophrenia detection, respectively. These results demonstrate the great potential of EEG-based large-scale models to advance clinical applicability.
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. Our investigation into approximate MU starts with identifying the steepest descent direction, minimizing the output Kullback-Leibler divergence to exact MU inside a parameters' neighborhood. Such decomposition derived from Euclidean metric encompasses most existing gradient-based MU methods. Nevertheless, adhering to Euclidean space may result in sub-optimal iterative trajectories due to the overlooked geometric structure of the output probability space.
Federated Learning from Pre-Trained Models: A Contrastive Learning Approach
Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. Here, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner.
Distilling Calibration via Conformalized Credal Inference
Huang, Jiayi, Park, Sangwoo, Paoletti, Nicola, Simeone, Osvaldo
Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.