Meng, Xiaofeng
Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence
Wang, Shaohua, Xie, Xing, Li, Yong, Guo, Danhuai, Cai, Zhi, Liu, Yu, Yue, Yang, Pan, Xiao, Lu, Feng, Wu, Huayi, Gui, Zhipeng, Ding, Zhiming, Zheng, Bolong, Zhang, Fuzheng, Qin, Tao, Wang, Jingyuan, Tao, Chuang, Chen, Zhengchao, Lu, Hao, Li, Jiayi, Chen, Hongyang, Yue, Peng, Yu, Wenhao, Yao, Yao, Sun, Leilei, Zhang, Yong, Chen, Longbiao, Du, Xiaoping, Li, Xiang, Zhang, Xueying, Qin, Kun, Gong, Zhaoya, Dong, Weihua, Meng, Xiaofeng
Research status and development trends; on this basis, this report proposes three major challenges faced by large spatial data intelligent models today. This report focuses on the current research status of spatial data intelligent large-scale models and sorts out the research progress in four major thematic areas of spatial data intelligent large-scale models: cities, air and space remote sensing, geography, and transportation. This report systematically introduces the key technologies, characteristics and advantages, research status, future development and other core information of spatial data intelligent large models, involving spatiotemporal big data platforms, distributed computing, 3D virtual reality, space The basic performance of large models such as analysis and visualization, as well as the complex spatial comprehensive performance of large models such as geospatial intelligent computing, deep learning, high-performance processing of big data, geographical knowledge graphs, and geographical intelligent multi-scenario simulation, analyze the application of the above key technologies in spatial data The location and role of smart large models.
From Chaos to Clarity: Time Series Anomaly Detection in Astronomical Observations
Hao, Xinli, Chen, Yile, Yang, Chen, Du, Zhihui, Ma, Chaohong, Wu, Chao, Meng, Xiaofeng
With the development of astronomical facilities, large-scale time series data observed by these facilities is being collected. Analyzing anomalies in these astronomical observations is crucial for uncovering potential celestial events and physical phenomena, thus advancing the scientific research process. However, existing time series anomaly detection methods fall short in tackling the unique characteristics of astronomical observations where each star is inherently independent but interfered by random concurrent noise, resulting in a high rate of false alarms. To overcome the challenges, we propose AERO, a novel two-stage framework tailored for unsupervised anomaly detection in astronomical observations. In the first stage, we employ a Transformer-based encoder-decoder architecture to learn the normal temporal patterns on each variate (i.e., star) in alignment with the characteristic of variate independence. In the second stage, we enhance the graph neural network with a window-wise graph structure learning to tackle the occurrence of concurrent noise characterized by spatial and temporal randomness. In this way, AERO is not only capable of distinguishing normal temporal patterns from potential anomalies but also effectively differentiating concurrent noise, thus decreasing the number of false alarms. We conducted extensive experiments on three synthetic datasets and three real-world datasets. The results demonstrate that AERO outperforms the compared baselines. Notably, compared to the state-of-the-art model, AERO improves the F1-score by up to 8.76% and 2.63% on synthetic and real-world datasets respectively.
Cross-silo Federated Learning with Record-level Personalized Differential Privacy
Liu, Junxu, Lou, Jian, Xiong, Li, Liu, Jinfei, Meng, Xiaofeng
Federated learning enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is to select the ideal per-record sampling probability q given the personalized privacy budget {\epsilon}. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between q and {\epsilon} and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.