U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster

arXiv.org Machine Learning

AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and massive computational budgets, creating a high barrier to entry. We demonstrate that such complexity is unnecessary for frontier performance. We introduce U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) using Monte Carlo Dropout for stochasticity. As a result, our model matches or exceeds the probabilistic skill of GenCast and IFS ENS at 1.5$^\circ$ resolution while reducing training compute by over 10$\times$ compared to leading CRPS-based models and inference latency by over 10$\times$ compared to diffusion-based models. U-Cast trains in under 12 H200 GPU-days and generates a 60-step ensemble forecast in 11 seconds. These results suggest that scalable, general-purpose architectures paired with efficient training curricula can match complex domain-specific designs at a fraction of the cost, opening the training of frontier probabilistic weather models to the broader community. Our code is available at: https://github.com/Rose-STL-Lab/u-cast.
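For readers unfamiliar with the CRPS objective mentioned in the abstract, the following is a minimal sketch of the standard ensemble estimator of CRPS for a single scalar forecast, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|. It is an illustration only: the function name `ensemble_crps` is hypothetical, the ensemble members stand in for stochastic forward passes (for example, with dropout left active), and the abstract does not say which CRPS estimator U-Cast actually uses.

```python
def ensemble_crps(members, observation):
    """Standard ensemble CRPS estimate for one scalar target.

    members: list of ensemble forecasts (e.g. one per stochastic forward pass).
    observation: the verifying value.
    Assumption: this is the plain (biased) estimator; the paper may use a
    different variant such as the "fair" CRPS.
    """
    m = len(members)
    if m == 0:
        raise ValueError("at least one ensemble member is required")
    # Accuracy term: mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Spread term: mean absolute difference over all ordered member pairs.
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


if __name__ == "__main__":
    # With a single member the spread term vanishes and CRPS equals the absolute error.
    print(ensemble_crps([1.0], 3.0))
    # Two members straddling the observation score better than either alone.
    print(ensemble_crps([0.0, 2.0], 1.0))
```

With one member the score reduces to the absolute error, which is one way to see why a model pre-trained on Mean Absolute Error is a natural starting point for CRPS fine-tuning.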





Precoder Design in Multi-User FDD Systems with VQ-VAE and GNN

arXiv.org Artificial Intelligence

Robust precoding is efficiently feasible in frequency division duplex (FDD) systems by incorporating the learnt statistics of the propagation environment through a generative model. We build on previous work that successfully designed site-specific precoders based on a combination of Gaussian mixture models (GMMs) and graph neural networks (GNNs). In this paper, by utilizing a vector quantized-variational autoencoder (VQ-VAE), we circumvent one of the key drawbacks of GMMs, i.e., that the number of GMM components scales exponentially with the number of feedback bits. In addition, the deep learning architecture of the VQ-VAE allows us to jointly train the GNN together with the VQ-VAE along with pilot optimization, forming an end-to-end (E2E) model and resulting in considerable performance gains in sum rate for multi-user wireless systems. Simulations demonstrate the superiority of the proposed frameworks over the conventional methods involving the sub-discrete Fourier transform (DFT) pilot matrix and iterative precoder algorithms, enabling the deployment of systems characterized by fewer pilots or feedback bits.
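As a rough illustration of the quantization step that a VQ-VAE relies on, the sketch below maps a latent vector to its nearest codebook entry and reports how many bits are needed to feed back that index. The function names and the toy codebook are hypothetical, and nothing here reflects the paper's actual encoder, GNN, or pilot design; it only shows the generic nearest-codeword lookup.

```python
def quantize(latent, codebook):
    """Return (index, codeword) of the codebook entry nearest to `latent`.

    Nearest is measured by squared Euclidean distance, as in generic
    vector quantization. The codebook here is a plain list of vectors.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda k: squared_distance(latent, codebook[k]))
    return index, codebook[index]


def feedback_bits(codebook_size):
    """Bits needed to signal one index out of `codebook_size` entries."""
    return (codebook_size - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical 4-entry codebook: the index fits in 2 feedback bits.
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [0.0, 3.0]]
    idx, codeword = quantize([0.9, 1.2], toy_codebook)
    print(idx, codeword, feedback_bits(len(toy_codebook)))
```

In this generic setting only the index is fed back, so B feedback bits address a codebook of up to 2^B entries.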





Multimodal Medical Image Classification via Synergistic Learning Pre-training

arXiv.org Artificial Intelligence

Multimodal pathological images are commonly used in clinical diagnosis, but computer vision-based multimodal image-assisted diagnosis faces challenges with modality fusion, especially in the absence of expert-annotated data. To achieve modality fusion in multimodal images with label scarcity, we propose a novel "pretraining + fine-tuning" framework for multimodal semi-supervised medical image classification. Specifically, we propose a synergistic learning pretraining framework of consistency, reconstructive, and aligned learning. By treating one modality as an augmented sample of another modality, we implement self-supervised pretraining, enhancing the baseline model's feature representation capability. Then, we design a fine-tuning method for multimodal fusion. During the fine-tuning stage, we set different encoders to extract features from the original modalities and provide a multimodal fusion encoder for the fused modality. In addition, we propose a distribution shift method for multimodal fusion features, which alleviates the prediction uncertainty and overfitting risks caused by the lack of labeled samples. We conduct extensive experiments on the publicly available gastroscopy image datasets Kvasir and Kvasirv2. Quantitative and qualitative results demonstrate that the proposed method outperforms the current state-of-the-art classification methods. The code will be released at: https://github.com/LQH89757/MICS.


Appendixes A An Example for Scenario 2 We give an example of G(A)

Neural Information Processing Systems

Below is a detailed explanation of the comparative methods covered in the paper. The network architecture of PI-DeepONet used for Burgers' equation is such that both In order to solve the Eq. Fig.6 shows model predictions of MAD-L and MAD-LM compared with the reference solutions under Fig.7(a) shows that the accuracy of MAD-L after convergence increases with Fig.7(b) shows that the accuracy and convergence speed of MAD-LM do not change For Burgers' equation, we also consider the scenario when the viscosity coefficients Fig.8 compares the convergence curves of mean MAD-LM has obvious speed and accuracy improvement over From-Scratch and Transfer-Learning. We investigated the effect of the dimension of the latent vector (latent size) in Burgers' equation on performance. As can be seen from Fig.9(a), for MAD-L, different latent sizes have different performances and the best performance is achieved when it is equal to 128.