Fang, Yuchen
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Fang, Yuchen, Lavaei, Javad, Na, Sen
In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\epsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods, which rely on a zeroth-order oracle with sub-exponential (i.e., light-tailed) noise and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and extends significantly to second-order stationarity. We show that under heavy-tailed noise conditions, our SSQP method achieves the same high-probability first-order iteration complexity bounds as in the light-tailed setting, while further exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations and a second-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-3})$ iterations with high probability, provided that $\epsilon$ is lower bounded by a constant determined by the irreducible noise level of the estimation. We validate our theoretical findings and evaluate the practical performance of our method on the CUTEst benchmark test set.
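For concreteness, the problem class and one common convention for $\epsilon$-stationarity (our paraphrase; the paper's precise definitions may differ in constants and norms) can be written as

$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[f(x;\xi)] \quad \text{s.t.} \quad c(x) = 0,$$

where, with Lagrangian $L(x,\lambda) = f(x) + \lambda^\top c(x)$ and $Z(x)$ a basis of $\ker(\nabla c(x)^\top)$, a point is first-order $\epsilon$-stationary if $\|\nabla_x L(x,\lambda)\| \le \epsilon$ and $\|c(x)\| \le \epsilon$, and second-order $\epsilon$-stationary if, in addition, $\lambda_{\min}\big(Z(x)^\top \nabla_x^2 L(x,\lambda)\, Z(x)\big) \ge -\epsilon$.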
Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective
Fang, Yuchen, Liang, Yuxuan, Hui, Bo, Shao, Zezhi, Deng, Liwei, Liu, Xu, Jiang, Xinke, Zheng, Kai
Road traffic forecasting is crucial in real-world intelligent transportation scenarios, such as traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution for this task. Nevertheless, the quadratic complexity of dynamic spatial modeling has become the bottleneck of such STGNNs on large-scale traffic data. From a spatial data management perspective, we present a novel Transformer framework called PatchSTG that efficiently and dynamically models spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching scheme to reduce the number of points involved in the Transformer's dynamic computation. Irregular spatial patching first uses a leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes of small capacity, and then merges leaf nodes belonging to the same subtree into equal-occupancy, non-overlapping patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge, from points within a patch and from points sharing the same index across patches, respectively. Experimental results on four real-world large-scale traffic datasets show that PatchSTG achieves training-speed and memory-utilization improvements of up to $10\times$ and $4\times$, respectively, with state-of-the-art performance.
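To make the patching step concrete, here is a minimal, self-contained sketch of leaf-KDTree partitioning with padding to equal occupancy. It is illustrative only: the function names, the median-split rule, and the padding scheme are our assumptions, not PatchSTG's released code.

```python
import numpy as np

def kd_patch(coords, ids, capacity, depth=0):
    """Recursively split irregularly distributed points along alternating
    axes until every leaf holds at most `capacity` points (illustrative
    sketch; not PatchSTG's released implementation)."""
    if len(ids) <= capacity:
        return [ids]
    axis = depth % coords.shape[1]            # alternate split dimension
    order = np.argsort(coords[ids, axis])     # median split balances leaves
    half = len(ids) // 2
    left, right = ids[order[:half]], ids[order[half:]]
    return (kd_patch(coords, left, capacity, depth + 1)
            + kd_patch(coords, right, capacity, depth + 1))

def pad_to_patches(leaves, capacity, pad_id=-1):
    """Pad every leaf to equal occupancy so leaves stack into
    non-overlapping patches of identical size."""
    return np.stack([np.pad(leaf, (0, capacity - len(leaf)),
                            constant_values=pad_id) for leaf in leaves])

coords = np.random.rand(1000, 2)                  # sensor coordinates
patches = pad_to_patches(kd_patch(coords, np.arange(1000), 64), 64)
print(patches.shape)                              # (num_patches, 64)
```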
RAGraph: A General Retrieval-Augmented Graph Learning Framework
Jiang, Xinke, Qiu, Rihong, Xu, Yongxin, Zhang, Wentao, Zhu, Yichen, Zhang, Ruizhe, Fang, Yuchen, Chu, Xu, Zhao, Junfeng, Wang, Yasha
Graph Neural Networks (GNNs) have become essential in interpreting relational data across various domains, yet they often struggle to generalize to unseen graph data that differs markedly from training instances. In this paper, we introduce a novel framework called General Retrieval-Augmented Graph Learning (RAGraph), which brings external graph data into a general graph foundation model to improve generalization to unseen scenarios. At the core of our framework is a toy graph vector library that we establish, which captures key attributes such as features and task-specific label information. During inference, RAGraph retrieves similar toy graphs based on key similarities to the downstream task and integrates the retrieved data to enrich the learning context via a message-passing prompting mechanism. Our extensive experimental evaluations demonstrate that RAGraph significantly outperforms state-of-the-art graph learning methods on multiple tasks, such as node classification, link prediction, and graph classification, across both dynamic and static datasets. Furthermore, extensive testing confirms that RAGraph consistently maintains high performance without task-specific fine-tuning, highlighting its adaptability, robustness, and broad applicability.
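As a rough illustration of the retrieval step (not RAGraph's actual implementation; all names here are hypothetical), retrieving the top-k most similar toy graphs by cosine similarity over precomputed graph embeddings might look like:

```python
import numpy as np

def retrieve_toy_graphs(query_emb, library_embs, k=5):
    """Rank toy graphs in the vector library by cosine similarity to a
    query (sub)graph embedding and return the top-k indices
    (minimal sketch; names and similarity choice are assumptions)."""
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    return np.argsort(-(lib @ q))[:k]

library = np.random.rand(10_000, 128)   # embeddings of stored toy graphs
query = np.random.rand(128)             # embedding of the unseen graph
neighbors = retrieve_toy_graphs(query, library)  # indices used for prompting
```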
Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models
Fang, Yuchen, Na, Sen, Mahoney, Michael W., Kolar, Mladen
In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our method utilizes a random model to represent the objective function, which is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises due to the nonlinearity of the problem constraints. Such an effect may impede the method from moving away from saddle points. Both gradient and eigen step computations leverage a novel parameter-free decomposition of the step and the trust-region radius, accounting for the proportions among the feasibility residual, optimality residual, and negative curvature. We establish global almost sure first- and second-order convergence guarantees for our method, and present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods.
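Schematically, in standard SQP notation (our paraphrase; the paper's step decomposition is more refined), the trust-region subproblem solved at iterate $x_k$ takes the form

$$\min_{d \in \mathbb{R}^n} \; \bar{g}_k^\top d + \tfrac{1}{2}\, d^\top B_k d \quad \text{s.t.} \quad c_k + J_k d = r_k, \quad \|d\| \le \Delta_k,$$

where $\bar{g}_k$ and $B_k$ are the stochastic gradient and Hessian estimates from the random model, $J_k$ is the constraint Jacobian, $r_k$ is a relaxation vector that keeps the linearized constraints compatible with the trust region, and $\Delta_k$ is the trust-region radius.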
ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling
Chen, Yuqi, Ren, Kan, Wang, Yansen, Fang, Yuchen, Sun, Weiwei, Li, Dongsheng
Modeling continuous-time dynamics on irregular time series is critical for accounting for data evolution and correlations that occur continuously. Traditional methods, including recurrent neural networks and Transformer models, leverage inductive bias via powerful neural architectures to capture complex patterns. However, due to their discrete nature, they have difficulty generalizing to continuous-time data paradigms. Although neural ordinary differential equations (Neural ODEs) and their variants have shown promising results in dealing with irregular time series, they often fail to capture the intricate correlations within these sequences. It is challenging yet essential to concurrently model the relationships among input data points and capture the dynamic changes of the continuous-time system. To tackle this problem, we propose ContiFormer, which extends the relation modeling of the vanilla Transformer to the continuous-time domain by explicitly combining the continuous-dynamics modeling of Neural ODEs with the attention mechanism of Transformers. We mathematically characterize the expressive power of ContiFormer and show that, through curated designs of the function hypothesis, many Transformer variants specialized for irregular time series modeling can be covered as special cases of ContiFormer. A wide range of experiments on both synthetic and real-world datasets demonstrates the superior modeling capacity and prediction performance of ContiFormer on irregular time series data.
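The following toy sketch conveys the core idea of attending in continuous time: each observation's representation is evolved by an ODE from its own timestamp to a query time before attention weights are computed. It is a deliberately simplified stand-in (explicit Euler integration, a hypothetical dynamics function `f`, and a mean-pooled query), not ContiFormer's actual operator.

```python
import numpy as np

def euler_ode(z0, t0, t1, f, steps=20):
    """Evolve a latent state from t0 to t1 with explicit Euler steps,
    standing in for the black-box solver a Neural ODE would use."""
    z, h = z0.copy(), (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(z)
    return z

def continuous_attention(x, t, f, t_query):
    """Evolve each observation's key/value to the query time, then apply
    ordinary scaled dot-product attention. Purely illustrative; not
    ContiFormer's actual attention operator."""
    keys = np.stack([euler_ode(xi, ti, t_query, f) for xi, ti in zip(x, t)])
    query = x.mean(axis=0)                       # hypothetical query vector
    logits = keys @ query / np.sqrt(x.shape[1])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ keys                              # attention-weighted summary

x = np.random.rand(7, 16)                        # 7 irregular observations
t = np.sort(np.random.rand(7))                   # their timestamps
out = continuous_attention(x, t, lambda z: -z, t_query=1.0)
```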
Time Series Supplier Allocation via Deep Black-Litterman Model
Luo, Jiayuan, Zhang, Wentao, Fang, Yuchen, Gao, Xiaowei, Zhuang, Dingyi, Chen, Hao, Jiang, Xinke
Time Series Supplier Allocation (TSSA) poses a complex NP-hard challenge, aimed at refining future order-dispatching strategies to fully satisfy order demands with maximum supply efficiency. Originating in financial portfolio management, the Black-Litterman (BL) model offers a new perspective on the TSSA scenario by balancing expected returns against insufficient-supply risks. However, its application within TSSA is constrained by its reliance on manually constructed perspective matrices and by spatio-temporal market dynamics, coupled with the absence of supervisory signals and the unreliability inherent to supplier information. To address these limitations, we introduce the Deep Black-Litterman Model (DBLM), which adapts the BL model from its financial roots to the supply chain context. Leveraging Spatio-Temporal Graph Neural Networks (STGNNs), DBLM automatically generates future perspective matrices for TSSA by integrating spatio-temporal dependencies. Moreover, a novel Spearman rank correlation objective supervises our approach to address the lack of supervisory signals and is specifically designed to navigate the complexities of supplier risks and interactions. This is further enhanced by a masking mechanism that counteracts biases from unreliable data, improving the model's precision and reliability. Extensive experiments on two datasets demonstrate DBLM's superior performance on TSSA, setting a new standard for the field. Our findings and methodology are made available for community access and further development.
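For background, the classical BL posterior mean that DBLM builds on combines equilibrium returns $\Pi$ with views $(P, Q, \Omega)$ as

$$\mu_{\mathrm{BL}} = \left[(\tau\Sigma)^{-1} + P^\top \Omega^{-1} P\right]^{-1} \left[(\tau\Sigma)^{-1}\Pi + P^\top \Omega^{-1} Q\right],$$

where $\Sigma$ is the return covariance and $\tau$ a scaling constant; in DBLM, the hand-crafted perspective (view) matrix $P$ is replaced by one generated automatically by the STGNN.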
Infinite-Horizon Graph Filters: Leveraging Power Series to Enhance Sparse Information Aggregation
Zhang, Ruizhe, Jiang, Xinke, Fang, Yuchen, Luo, Jiayuan, Xu, Yongxin, Zhu, Yichen, Chu, Xu, Zhao, Junfeng, Wang, Yasha
Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks in recent years, particularly those based on the message-passing paradigm. However, their performance is often constrained by a limited receptive field, a challenge that becomes more acute on sparse graphs. Motivated by power series, which possess infinite expansion capability, we propose a novel Graph Power Filter Neural Network (GPFN) that enhances node classification by employing a power-series graph filter to enlarge the receptive field. Concretely, GPFN builds a graph filter with an infinite receptive field from a convergent power series, which can be analyzed in both the spectral and spatial domains. Moreover, we theoretically prove that GPFN is a general framework that can integrate any convergent power series and capture long-range dependencies. Finally, experimental results on three datasets demonstrate the superiority of GPFN over state-of-the-art baselines.
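As a concrete convergent instance of such a filter (a classical example, not necessarily the paper's chosen coefficients), geometric coefficients $\theta_k = \alpha(1-\alpha)^k$ over a normalized adjacency $\hat{A}$ yield the Personalized-PageRank-style filter

$$g(\hat{A}) = \sum_{k=0}^{\infty} \alpha (1-\alpha)^k \hat{A}^k = \alpha \left(I - (1-\alpha)\hat{A}\right)^{-1}, \qquad 0 < \alpha < 1,$$

whose closed form exists because the spectral radius of $(1-\alpha)\hat{A}$ is below one, giving an infinite receptive field at the cost of a single matrix inverse (or a truncated iteration in practice).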
Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey
Jin, Guangyin, Liang, Yuxuan, Fang, Yuchen, Shao, Zezhi, Huang, Jincai, Zhang, Junbo, Zheng, Yu
With recent advances in sensing technologies, a myriad of spatio-temporal data has been generated and recorded in smart cities. Forecasting the evolution patterns of spatio-temporal data is an important yet demanding aspect of urban computing, which can enhance intelligent management decisions in various fields, including transportation, environment, climate, public safety, healthcare, and others. Traditional statistical and deep learning methods struggle to capture the complex correlations in urban spatio-temporal data. To this end, Spatio-Temporal Graph Neural Networks (STGNNs) have been proposed and have shown great promise in recent years. STGNNs enable the extraction of complex spatio-temporal dependencies by integrating graph neural networks (GNNs) with various temporal learning methods. In this manuscript, we provide a comprehensive survey of recent progress in STGNN technologies for predictive learning in urban computing. First, we briefly introduce the construction methods for spatio-temporal graph data and the prevalent deep learning architectures used in STGNNs. We then sort out the primary application domains and specific predictive learning tasks based on the existing literature. Afterward, we scrutinize the design of STGNNs and their combination with advanced technologies in recent years. Finally, we summarize the limitations of existing research and suggest potential directions for future work.
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Fu, Lingyue, Chai, Huacan, Luo, Shuang, Du, Kounianhua, Zhang, Weiming, Fan, Longteng, Lei, Jiayi, Rui, Renting, Lin, Jianghao, Fang, Yuchen, Liu, Yifan, Wang, Jingkuan, Qi, Siyuan, Zhang, Kangning, Zhang, Weinan, Yu, Yong
With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions, namely conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex uses algorithmic questions and corresponding test cases to assess the quality of code generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively, which leaves significant room for improvement on programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth. The datasets are released at https://github.com/APEXLAB/CodeApex.git, and the CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.
HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting
Shao, Zezhi, Wang, Fei, Zhang, Zhao, Fang, Yuchen, Jin, Guangyin, Xu, Yongjun
Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to their high complexity, STGNNs focus only on short-term traffic forecasting, e.g., 1-hour forecasting, while ignoring the more practical long-term setting. In this paper, we make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting. To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder that jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that HUTFormer significantly outperforms state-of-the-art traffic forecasting and long-time-series forecasting baselines.
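As a minimal sketch of the segment-merging idea (shapes and the projection are our assumptions; HUTFormer learns this mapping end to end), halving the temporal resolution by fusing adjacent segments could look like:

```python
import numpy as np

def segment_merge(x, merge=2):
    """Coarsen the time axis by concatenating the features of `merge`
    adjacent segments and projecting back to the model width; here the
    projection is a fixed random matrix, whereas a hierarchical encoder
    would learn it. Shapes: (T, D) -> (T // merge, D). Illustrative only."""
    T, D = x.shape
    x = x[: T - T % merge].reshape(T // merge, merge * D)
    W = np.random.default_rng(0).standard_normal((merge * D, D)) / np.sqrt(merge * D)
    return x @ W

series = np.random.rand(96, 32)     # 96 time steps, 32-dim features
coarse = segment_merge(series)      # (48, 32): one level of the hierarchy
```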