Huh, Eui-Nam
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Thwal, Chu Myaet, Tun, Ye Lin, Nguyen, Minh N. H., Huh, Eui-Nam, Hong, Choong Seon
Beyond the success of Contrastive Language-Image Pre-training (CLIP), recent trends mark a shift toward exploring the applicability of lightweight vision-language models for resource-constrained scenarios. These models often deliver suboptimal performance when relying solely on a single image-text contrastive learning objective, spotlighting the need for more effective training mechanisms that guarantee robust cross-modal feature alignment. In this work, we propose CLIP-PING: Contrastive Language-Image Pre-training with Proximus Intrinsic Neighbors Guidance, a novel yet simple and efficient training paradigm designed to boost the performance of lightweight vision-language models with minimal computational overhead and lower data demands. CLIP-PING bootstraps unimodal features extracted from arbitrary pre-trained encoders to obtain intrinsic guidance of proximus neighbor samples, i.e., nearest-neighbor (NN) and cross nearest-neighbor (XNN). We find that extra contrastive supervision from these neighbors substantially boosts cross-modal alignment, enabling lightweight models to learn more generic features with rich semantic diversity. Extensive experiments reveal that CLIP-PING notably surpasses its peers in zero-shot generalization and cross-modal retrieval tasks. Specifically, a 5.5% gain on zero-shot ImageNet1K classification with 10.7% (I2T) and 5.7% (T2I) on Flickr30K retrieval, compared to the original CLIP when using ViT-XS image encoder trained on 3 million (image, text) pairs. Moreover, CLIP-PING showcases a strong transferability under the linear evaluation protocol across several downstream tasks.
Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models
Jung, Jincheol, Jeong, Hongju, Huh, Eui-Nam
This study analyzes the performance of domain-specific Large Language Models (LLMs) for the medical field by integrating Retrieval-Augmented Generation (RAG) systems within a federated learning framework. Leveraging the inherent advantages of federated learning, such as preserving data privacy and enabling distributed computation, this research explores the integration of RAG systems with models trained under varying client configurations to optimize performance. Experimental results demonstrate that the federated learning-based models integrated with RAG systems consistently outperform their non-integrated counterparts across all evaluation metrics. This study highlights the potential of combining federated learning and RAG systems for developing domain-specific LLMs in the medical field, providing a scalable and privacy-preserving solution for enhancing text generation capabilities.
Federated Hybrid Training and Self-Adversarial Distillation: Towards Robust Edge Networks
Qiao, Yu, Adhikary, Apurba, Kim, Kitae, Huh, Eui-Nam, Han, Zhu, Hong, Choong Seon
Federated learning (FL) is a distributed training technology that enhances data privacy in mobile edge networks by allowing data owners to collaborate without transmitting raw data to the edge server. However, data heterogeneity and adversarial attacks pose challenges to develop an unbiased and robust global model for edge deployment. To address this, we propose Federated hyBrid Adversarial training and self-adversarial disTillation (FedBAT), a new framework designed to improve both robustness and generalization of the global model. FedBAT seamlessly integrates hybrid adversarial training and self-adversarial distillation into the conventional FL framework from data augmentation and feature distillation perspectives. From a data augmentation perspective, we propose hybrid adversarial training to defend against adversarial attacks by balancing accuracy and robustness through a weighted combination of standard and adversarial training. From a feature distillation perspective, we introduce a novel augmentation-invariant adversarial distillation method that aligns local adversarial features of augmented images with their corresponding unbiased global clean features. This alignment can effectively mitigate bias from data heterogeneity while enhancing both the robustness and generalization of the global model. Extensive experimental results across multiple datasets demonstrate that FedBAT yields comparable or superior performance gains in improving robustness while maintaining accuracy compared to several baselines.
QTSeg: A Query Token-Based Architecture for Efficient 2D Medical Image Segmentation
Tran, Phuong-Nam, Pham, Nhat Truong, Dang, Duc Ngoc Minh, Huh, Eui-Nam, Hong, Choong Seon
Medical image segmentation is crucial in assisting medical doctors in making diagnoses and enabling accurate automatic diagnosis. While advanced convolutional neural networks (CNNs) excel in segmenting regions of interest with pixel-level precision, they often struggle with long-range dependencies, which is crucial for enhancing model performance. Conversely, transformer architectures leverage attention mechanisms to excel in handling long-range dependencies. However, the computational complexity of transformers grows quadratically, posing resource-intensive challenges, especially with high-resolution medical images. Recent research aims to combine CNN and transformer architectures to mitigate their drawbacks and enhance performance while keeping resource demands low. Nevertheless, existing approaches have not fully leveraged the strengths of both architectures to achieve high accuracy with low computational requirements. To address this gap, we propose a novel architecture for 2D medical image segmentation (QTSeg) that leverages a feature pyramid network (FPN) as the image encoder, a multi-level feature fusion (MLFF) as the adaptive module between encoder and decoder and a multi-query mask decoder (MQM Decoder) as the mask decoder. In the first step, an FPN model extracts pyramid features from the input image. Next, MLFF is incorporated between the encoder and decoder to adapt features from different encoder stages to the decoder. Finally, an MQM Decoder is employed to improve mask generation by integrating query tokens with pyramid features at all stages of the mask decoder. Our experimental results show that QTSeg outperforms state-of-the-art methods across all metrics with lower computational demands than the baseline and the existing methods. Code is available at https://github.com/tpnam0901/QTSeg (v0.1.0)
Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach
Park, Yu Min, Hassan, Sheikh Salman, Tun, Yan Kyaw, Huh, Eui-Nam, Saad, Walid, Hong, Choong Seon
Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration. In this work, a novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication. An optimization problem encompassing user assignment, access point beamforming, and STAR-RIS phase control for reflection and transmission is formulated. The inherent complexity of the formulated problem necessitates a decomposition approach for an efficient solution. This involves tackling different sub-problems with specialized techniques: a many-to-one matching algorithm is employed to assign users to appropriate access points, optimizing resource allocation. To facilitate efficient resource management, access points are grouped using a correlation-based K-means clustering algorithm. Multi-agent deep reinforcement learning (MADRL) is leveraged to optimize the control of the STAR-RIS. Yu Min Park, Sheikh Salman Hassan, Eui-Nam Huh, and Choong Seon Hong are with the Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do 17104, Rep. of Korea, e-mails:{yumin0906, salman0335, johnhuh, cshong}@khu.ac.kr. Yan Kyaw Tun is with the Department of Electronic Systems, Aalborg University, A. C. Meyers Vænge 15, 2450 København, e-mail: ykt@es.aau.dk. Walid Saad is with the Bradley Department of Electrical and Computer Engineering, Virginia Tech, VA, 24061, USA. Additionally, the proposed MADRL approach incorporates convex approximation (CA).
Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems
Nguyen, Minh N. H., Pandey, Shashi Raj, Dang, Tri Nguyen, Huh, Eui-Nam, Hong, Choong Seon, Tran, Nguyen H., Saad, Walid
Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) (Minh et al. 2020) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to provide a generalization of distributed learning that goes beyond existing mechanisms such as federated learning. Inspired from this philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and corresponding learning mechanism. Subsequently, a hierarchical generalized learning problem in a recursive form is formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical generalized averaging mechanism. To that end, a distributed learning algorithm, namely DemLearn and its variant, DemLearn-P is proposed. Extensive experiments on benchmark MNIST and Fashion-MNIST datasets show that proposed algorithms demonstrate better results in the generalization performance of learning model at agents compared to the conventional FL algorithms. Detailed analysis provides useful configurations to further tune up both the generalization and specialization performance of the learning models in Dem-AI systems.