Goto

Collaborating Authors

 pangu


Forecasting the Future with Yesterday's Climate: Temperature Bias in AI Weather and Climate Models

Landsberg, Jacob B., Barnes, Elizabeth A.

arXiv.org Artificial Intelligence

AI-based climate and weather models have rapidly gained popularity, providing faster forecasts with skill that can match or even surpass that of traditional dynamical models. Despite this success, these models face a key challenge: predicting future climates while being trained only with historical data. In this study, we investigate this issue by analyzing boreal winter land temperature biases in AI weather and climate models. We examine two weather models, FourCastNet V2 Small (FourCastNet) and Pangu Weather (Pangu), evaluating their predictions for 2020-2025 and Ai2 Climate Emulator version 2 (ACE2) for 1996-2010. These time periods lie outside of the respective models' training sets and are significantly more recent than the bulk of their training data, allowing us to assess how well the models generalize to new, i.e. more modern, conditions. We find that all three models produce cold-biased mean temperatures, resembling climates from 15-20 years earlier than the period they are predicting. In some regions, like the Eastern U.S., the predictions resemble climates from as much as 20-30 years earlier. Further analysis shows that FourCastNet's and Pangu's cold bias is strongest in the hottest predicted temperatures, indicating limited training exposure to modern extreme heat events. In contrast, ACE2's bias is more evenly distributed but largest in regions, seasons, and parts of the temperature distribution where climate change has been most pronounced. These findings underscore the challenge of training AI models exclusively on historical data and highlight the need to account for such biases when applying them to future climate prediction.


Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

Yoon, Do-hyeon, Chun, Minsoo, Allen, Thomas, Müller, Hans, Wang, Min, Sharma, Rajesh

arXiv.org Artificial Intelligence

Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continue training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that a recently Pangu Pro MoE model released by Huawei is derived from Qwen-2.5 14B model through upcycling techniques rather than training from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.


Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition

Chen, Hanting, Wang, Yasheng, Han, Kai, Li, Dong, Li, Lin, Bi, Zhenni, Li, Jinpeng, Wang, Haoyu, Mi, Fei, Zhu, Mingjian, Wang, Bin, Song, Kaikai, Fu, Yifei, He, Xu, Luo, Yu, Zhu, Chong, He, Quan, Wu, Xueyu, He, Wei, Hu, Hailin, Tang, Yehui, Tao, Dacheng, Chen, Xinghao, Wang, Yunhe

arXiv.org Artificial Intelligence

This work presents Pangu Embedded, an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs), featuring flexible fast and slow thinking capabilities. Pangu Embedded addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. We propose a two-stage training framework for its construction. In Stage 1, the model is finetuned via an iterative distillation process, incorporating inter-iteration model merging to effectively aggregate complementary knowledge. This is followed by reinforcement learning on Ascend clusters, optimized by a latency-tolerant scheduler that combines stale synchronous parallelism with prioritized data queues. The RL process is guided by a Multi-source Adaptive Reward System (MARS), which generates dynamic, task-specific reward signals using deterministic metrics and lightweight LLM evaluators for mathematics, coding, and general problem-solving tasks. Stage 2 introduces a dual-system framework, endowing Pangu Embedded with a "fast" mode for routine queries and a deeper "slow" mode for complex inference. This framework offers both manual mode switching for user control and an automatic, complexity-aware mode selection mechanism that dynamically allocates computational resources to balance latency and reasoning depth. Experimental results on benchmarks including AIME 2024, GPQA, and LiveCodeBench demonstrate that Pangu Embedded with 7B parameters, outperforms similar-size models like Qwen3-8B and GLM4-9B. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture, highlighting a promising direction for developing powerful yet practically deployable LLM reasoners.


Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

Tang, Yehui, Li, Xiaosong, Liu, Fangcheng, Guo, Wei, Zhou, Hang, Wang, Yaoyuan, Han, Kai, Yu, Xianzhi, Li, Jinpeng, Zang, Hui, Mi, Fei, Meng, Xiaojun, Liu, Zhicheng, Chen, Hanting, Zheng, Binfan, Chen, Can, Yan, Youliang, Tang, Ruiming, Qin, Peifeng, Chen, Xinghao, Tao, Dacheng, Wang, Yunhe

arXiv.org Artificial Intelligence

The surgence of Mixture of Experts (MoE) in Large Language Models promises a small price of execution cost for a much larger model parameter count and learning capacity, because only a small fraction of parameters are activated for each input token. However, it is commonly observed that some experts are activated far more often than others, leading to system inefficiency when running the experts on different devices in parallel. Therefore, we introduce Mixture of Grouped Experts (MoGE), which groups the experts during selection and balances the expert workload better than MoE in nature. It constrains tokens to activate an equal number of experts within each predefined expert group. When a model execution is distributed on multiple devices, this architectural design ensures a balanced computational load across devices, significantly enhancing throughput, particularly for the inference phase. Further, we build Pangu Pro MoE on Ascend NPUs, a sparse model based on MoGE with 72 billion total parameters, 16 billion of which are activated for each token. The configuration of Pangu Pro MoE is optimized for Ascend 300I Duo and 800I A2 through extensive system simulation studies. Our experiments indicate that MoGE indeed leads to better expert load balancing and more efficient execution for both model training and inference on Ascend NPUs. The inference performance of Pangu Pro MoE achieves 1148 tokens/s per card and can be further improved to 1528 tokens/s per card by speculative acceleration, outperforming comparable 32B and 72B Dense models. Furthermore, we achieve an excellent cost-to-performance ratio for model inference on Ascend 300I Duo. Our studies show that Ascend NPUs are capable of training Pangu Pro MoE with massive parallelization to make it a leading model within the sub-100B total parameter class, outperforming prominent open-source models like GLM-Z1-32B and Qwen3-32B.


Turning Up the Heat: Assessing 2-m Temperature Forecast Errors in AI Weather Prediction Models During Heat Waves

Ennis, Kelsey E., Barnes, Elizabeth A., Arcodia, Marybeth C., Fernandez, Martin A., Maloney, Eric D.

arXiv.org Artificial Intelligence

Extreme heat is the deadliest weather-related hazard in the United States. Furthermore, it is increasing in intensity, frequency, and duration, making skillful forecasts vital to protecting life and property. Traditional numerical weather prediction (NWP) models struggle with extreme heat for medium-range and subseasonal-to-seasonal (S2S) timescales. Meanwhile, artificial intelligence-based weather prediction (AIWP) models are progressing rapidly. However, it is largely unknown how well AIWP models forecast extremes, especially for medium-range and S2S timescales. This study investigates 2-m temperature forecasts for 60 heat waves across the four boreal seasons and over four CONUS regions at lead times up to 20 days, using two AIWP models (Google GraphCast and Pangu-Weather) and one traditional NWP model (NOAA United Forecast System Global Ensemble Forecast System (UFS GEFS)). First, case study analyses show that both AIWP models and the UFS GEFS exhibit consistent cold biases on regional scales in the 5-10 days of lead time before heat wave onset. GraphCast is the more skillful AIWP model, outperforming UFS GEFS and Pangu-Weather in most locations. Next, the two AIWP models are isolated and analyzed across all heat waves and seasons, with events split among the model's testing (2018-2023) and training (1979-2017) periods. There are cold biases before and during the heat waves in both models and all seasons, except Pangu-Weather in winter, which exhibits a mean warm bias before heat wave onset. Overall, results offer encouragement that AIWP models may be useful for medium-range and S2S predictability of extreme heat.


AI Models Still Lag Behind Traditional Numerical Models in Predicting Sudden-Turning Typhoons

Xu, Daosheng, Lu, Zebin, Leung, Jeremy Cheuk-Hin, Zhao, Dingchi, Li, Yi, Shi, Yang, Chen, Bin, Nie, Gaozhen, Wu, Naigeng, Tian, Xiangjun, Yang, Yi, Zhang, Shaoqing, Zhang, Banglin

arXiv.org Artificial Intelligence

Given the interpretability, accuracy, and stability of numerical weather prediction (NWP) models, current operational weather forecasting relies heavily on the NWP approach. In the past two years, the rapid development of Artificial Intelligence (AI) has provided an alternative solution for medium-range (1-10 days) weather forecasting. Bi et al. (2023) (hereafter Bi23) introduced the first AI-based weather prediction (AIWP) model in China, named Pangu-Weather, which offers fast prediction without compromising accuracy. In their work, Bi23 made notable claims regarding its effectiveness in extreme weather predictions. However, this claim lacks persuasiveness because the extreme nature of the two tropical cyclones (TCs) examples presented in Bi23, namely Typhoon Kong-rey and Typhoon Yutu, stems primarily from their intensities rather than their moving paths. Their claim may mislead into another meaning which is that Pangu-Weather works well in predicting unusual typhoon paths, which was not explicitly analyzed. Here, we reassess Pangu-Weather's ability to predict extreme TC trajectories from 2020-2024. Results reveal that while Pangu-Weather overall outperforms NWP models in predicting tropical cyclone (TC) tracks, it falls short in accurately predicting the rarely observed sudden-turning tracks, such as Typhoon Khanun in 2023. We argue that current AIWP models still lag behind traditional NWP models in predicting such rare extreme events in medium-range forecasts.


Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments

Gu, Yu, Deng, Xiang, Su, Yu

arXiv.org Artificial Intelligence

A key missing capacity of current language models (LMs) is grounding to real-world environments. Most existing work for grounded language understanding uses LMs to directly generate plans that can be executed in the environment to achieve the desired effects. It thereby casts the burden of ensuring grammaticality, faithfulness, and controllability all on the LMs. We propose Pangu, a generic framework for grounded language understanding that capitalizes on the discriminative ability of LMs instead of their generative ability. Pangu consists of a symbolic agent and a neural LM working in a concerted fashion: The agent explores the environment to incrementally construct valid plans, and the LM evaluates the plausibility of the candidate plans to guide the search process. A case study on the challenging problem of knowledge base question answering (KBQA), which features a massive environment, demonstrates the remarkable effectiveness and flexibility of Pangu: A BERT-base LM is sufficient for setting a new record on standard KBQA datasets, and larger LMs further bring substantial gains. Pangu also enables, for the first time, effective few-shot in-context learning for KBQA with large LMs such as Codex.