Precoder Design in Multi-User FDD Systems with VQ-VAE and GNN

Allaparapu, Srikar, Baur, Michael, Böck, Benedikt, Joham, Michael, Utschick, Wolfgang

arXiv.org Artificial Intelligence

ABSTRACT Robust precoding is efficiently feasible in frequency division duplex (FDD) systems by incorporating the learnt statistics of the propagation environment through a generative model. We build on previous work that successfully designed site-specific precoders based on a combination of Gaussian mixture models (GMMs) and graph neural networks (GNNs). In this paper, by utilizing a vector quantized-variational autoencoder (VQ-VAE), we circumvent one of the key drawbacks of GMMs, i.e., that the number of GMM components scales exponentially with the number of feedback bits. In addition, the deep learning architecture of the VQ-VAE allows us to jointly train the GNN together with the VQ-VAE along with pilot optimization, forming an end-to-end (E2E) model and resulting in considerable performance gains in sum rate for multi-user wireless systems. Simulations demonstrate the superiority of the proposed frameworks over conventional methods involving a sub-discrete Fourier transform (DFT) pilot matrix and iterative precoder algorithms, enabling the deployment of systems characterized by fewer pilots or feedback bits.
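To make the scaling argument concrete, the following is a minimal PyTorch sketch of the vector-quantization step at the core of a VQ-VAE. The shapes, codebook size, and straight-through trick are generic VQ-VAE ingredients and illustrative assumptions, not the paper's implementation: with B feedback bits the codebook holds 2^B codewords, so the fed-back index costs only B bits, whereas a GMM-based scheme needs its component count itself to grow as 2^B.

```python
# Illustrative VQ-VAE quantization step (assumed shapes, not the paper's code).
import torch

def vector_quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Map each encoder output z_e (N, D) to its nearest codeword in (K, D).

    Returns the quantized latents and the integer indices that would be
    fed back to the base station (ceil(log2(K)) bits per vector).
    """
    # Squared Euclidean distances between latents and all codewords: (N, K)
    dists = torch.cdist(z_e, codebook) ** 2
    indices = dists.argmin(dim=1)          # feedback indices, shape (N,)
    z_q = codebook[indices]                # quantized latents, shape (N, D)
    # Straight-through estimator so gradients reach the encoder in E2E training
    z_q = z_e + (z_q - z_e).detach()
    return z_q, indices

B = 6                                      # feedback bits per user (assumed)
codebook = torch.randn(2 ** B, 32)         # 2^B = 64 codewords of dimension 32
z_e = torch.randn(4, 32)                   # encoder outputs for 4 users
z_q, idx = vector_quantize(z_e, codebook)
```

The straight-through estimator in the last line of the function is what keeps the quantizer differentiable, which is the property that allows the VQ-VAE, the GNN, and the pilots to be trained jointly as one E2E model.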





Multimodal Medical Image Classification via Synergistic Learning Pre-training

Lin, Qinghua, Liu, Guang-Hai, Li, Zuoyong, Li, Yang, Jiang, Yuting, Wu, Xiang

arXiv.org Artificial Intelligence

Multimodal pathological images are widely used in clinical diagnosis, but computer vision-based multimodal image-assisted diagnosis faces challenges with modality fusion, especially in the absence of expert-annotated data. To achieve modality fusion in multimodal images under label scarcity, we propose a novel "pretraining + fine-tuning" framework for multimodal semi-supervised medical image classification. Specifically, we propose a synergistic pretraining framework of consistency, reconstructive, and aligned learning. By treating one modality as an augmented sample of another modality, we implement self-supervised pretraining, enhancing the baseline model's feature representation capability. Then, we design a fine-tuning method for multimodal fusion. During the fine-tuning stage, we set different encoders to extract features from the original modalities and provide a multimodal fusion encoder for the fusion modality. In addition, we propose a distribution shift method for multimodal fusion features, which alleviates the prediction uncertainty and overfitting risks caused by the lack of labeled samples. We conduct extensive experiments on the publicly available gastroscopy image datasets Kvasir and Kvasirv2. Quantitative and qualitative results demonstrate that the proposed method outperforms current state-of-the-art classification methods. The code will be released at: https://github.com/LQH89757/MICS.
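As a concrete illustration of the consistency-learning idea (treating one modality as an augmented sample of another), the following PyTorch sketch pulls two per-modality encoders toward agreeing embeddings. The encoder architectures, the cosine consistency loss, and all shapes are illustrative assumptions rather than the authors' released code; see the linked repository for the actual implementation.

```python
# Illustrative cross-modal consistency objective (assumed encoders and loss).
import torch
import torch.nn.functional as F

def consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity consistency between two modality embeddings (N, D)."""
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    return (1.0 - (feat_a * feat_b).sum(dim=-1)).mean()

# Hypothetical per-modality encoders (stand-ins for the paper's backbones)
encoder_a = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
encoder_b = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))

x_a = torch.randn(8, 3, 64, 64)    # batch from modality A
x_b = torch.randn(8, 3, 64, 64)    # paired batch from modality B
loss = consistency_loss(encoder_a(x_a), encoder_b(x_b))
loss.backward()                    # drives both encoders toward agreement
```

Because the two inputs are paired views of the same tissue, minimizing this loss plays the role that augmentation-based views play in standard self-supervised learning, without requiring any labels.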


Appendix A: An Example for Scenario 2. We give an example of G(A) [...]

Neural Information Processing Systems

Below is a detailed explanation of the comparative methods covered in the paper. The network architecture of PI-DeepONet used for Burgers' equation is such that [...]. In order to solve the Eq. [...]. Fig. 6 shows model predictions of MAD-L and MAD-LM compared with the reference solutions under [...]. Fig. 7(a) shows that the accuracy of MAD-L after convergence increases with [...]. Fig. 7(b) shows that the accuracy and convergence speed of MAD-LM do not change [...]. For Burgers' equation, we also consider the scenario when the viscosity coefficients [...]. Fig. 8 compares the convergence curves of mean [...]. MAD-LM has an obvious speed and accuracy improvement over From-Scratch and Transfer-Learning. We investigated the effect of the dimension of the latent vector (latent size) in Burgers' equation on performance. As can be seen from Fig. 9(a), for MAD-L, different latent sizes yield different performance, and the best performance is achieved when the latent size equals 128.
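For readers unfamiliar with the MAD variants compared above, the following PyTorch sketch contrasts the two adaptation strategies: MAD-L fine-tunes only a per-instance latent vector z against a physics loss, while MAD-LM additionally fine-tunes the pre-trained model weights. The network, the dummy residual loss, and the optimizer settings are illustrative assumptions rather than the paper's setup; only the latent size of 128 comes from Fig. 9(a).

```python
# Illustrative MAD-L vs. MAD-LM test-time adaptation (assumed model and loss).
import torch

latent_size = 128                            # best value reported for MAD-L in Fig. 9(a)
model = torch.nn.Sequential(                 # placeholder for the pre-trained PDE solver
    torch.nn.Linear(latent_size + 2, 64),    # input: latent z concatenated with (x, t)
    torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
z = torch.zeros(latent_size, requires_grad=True)
coords = torch.rand(256, 2)                  # collocation points (x, t)

def physics_loss(model, z, coords):
    """Stand-in for the PDE residual loss; a real setup would differentiate
    the output w.r.t. coords to form the Burgers' residual."""
    inp = torch.cat([z.expand(coords.shape[0], -1), coords], dim=1)
    return model(inp).pow(2).mean()

def adapt(params, steps=100, lr=1e-3):
    """Run the test-time adaptation loop over the given parameter list."""
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = physics_loss(model, z, coords)
        loss.backward()
        opt.step()
    return loss.item()

adapt([z])                                   # MAD-L: only z is updated; weight grads are ignored
adapt([z, *model.parameters()])              # MAD-LM: z and the weights adapt jointly
```

The single difference between the two calls, which parameters the optimizer may touch, is exactly the difference between MAD-L and MAD-LM, which is why MAD-LM can trade a little extra adaptation cost for the speed and accuracy gains noted above.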


Appendix A: Patch-based Negative Data Augmentation Reduces Texture Bias

Neural Information Processing Systems

Figure 5: ViTs trained only on our patch-based transformations exhibit stronger texture bias. Each bar is the texture accuracy (%) on Conflict Stimuli (Geirhos et al., 2018); a higher texture accuracy indicates a stronger bias towards texture. "Texture accuracy" is defined as the percentage of images that are classified as the "texture" label, given that the image is classified as either the "texture" or the "shape" label. The baseline model is ViT-B/16 (Dosovitskiy et al., 2021) trained on original images. Other models are trained on patch-based transformed images, e.g., "P-Shuffle" stands for a ViT-B/16 model trained on patch-based shuffled images.
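The caption's definition of texture accuracy is easy to pin down with a tiny worked example. The helper below and its made-up labels are purely illustrative; it simply counts, among cue-conflict images whose prediction matches either cue, the fraction that follow the texture cue.

```python
# Worked example of the texture-accuracy metric (labels are made up).
def texture_accuracy(preds, texture_labels, shape_labels):
    texture_hits, decided = 0, 0
    for p, t, s in zip(preds, texture_labels, shape_labels):
        if p == t or p == s:   # count only images resolved as texture or shape
            decided += 1
            texture_hits += p == t
    return 100.0 * texture_hits / decided if decided else 0.0

# 4 cue-conflict images: predictions 1 and 3 follow the texture cue,
# prediction 2 follows the shape cue, prediction 4 matches neither cue.
preds          = ["elephant", "cat", "clock", "dog"]
texture_labels = ["elephant", "knife", "clock", "boat"]
shape_labels   = ["bicycle", "cat", "bottle", "bear"]
print(texture_accuracy(preds, texture_labels, shape_labels))  # 66.66..., 2 of 3 decided
```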