Fang, Yuchen
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Fang, Yuchen, Lavaei, Javad, Na, Sen
In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\epsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods, which rely on a zeroth-order oracle with sub-exponential (i.e., light-tailed) noise and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and extends significantly to second-order stationarity. We show that under heavy-tailed noise conditions, our SSQP method achieves the same high-probability first-order iteration complexity bounds as in the light-tailed setting, while further exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations and a second-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-3})$ iterations with high probability, provided that $\epsilon$ is lower bounded by a constant determined by the irreducible noise level of the estimation. We validate our theoretical findings and evaluate the practical performance of our method on the CUTEst benchmark test set.
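For concreteness, the problem class and one common convention for $\epsilon$-stationarity (our paraphrase; the paper's precise definitions may differ in constants and norms) can be written as

$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[f(x;\xi)] \quad \text{s.t.} \quad c(x) = 0,$$

where, with Lagrangian $L(x,\lambda) = f(x) + \lambda^\top c(x)$ and $Z(x)$ a basis of $\ker(\nabla c(x)^\top)$, a point is first-order $\epsilon$-stationary if $\|\nabla_x L(x,\lambda)\| \le \epsilon$ and $\|c(x)\| \le \epsilon$, and second-order $\epsilon$-stationary if, in addition, $\lambda_{\min}\big(Z(x)^\top \nabla_x^2 L(x,\lambda)\, Z(x)\big) \ge -\epsilon$.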
Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective
Fang, Yuchen, Liang, Yuxuan, Hui, Bo, Shao, Zezhi, Deng, Liwei, Liu, Xu, Jiang, Xinke, Zheng, Kai
Road traffic forecasting is crucial in real-world intelligent transportation scenarios, such as traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution for this task. Nevertheless, the quadratic complexity of dynamic spatial modeling has become the bottleneck of such STGNNs on large-scale traffic data. From a spatial data management perspective, we present a novel Transformer framework called PatchSTG that efficiently and dynamically models spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching scheme to reduce the number of points involved in the Transformer's dynamic computation. Irregular spatial patching first uses a leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes of small capacity, and then merges leaf nodes belonging to the same subtree into equal-occupancy, non-overlapping patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge, from points within a patch and from points sharing the same index across patches, respectively. Experimental results on four real-world large-scale traffic datasets show that PatchSTG achieves training-speed and memory-utilization improvements of up to $10\times$ and $4\times$, respectively, with state-of-the-art performance.
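To make the patching step concrete, here is a minimal, self-contained sketch of leaf-KDTree partitioning with padding to equal occupancy. It is illustrative only: the function names, the median-split rule, and the padding scheme are our assumptions, not PatchSTG's released code.

```python
import numpy as np

def kd_patch(coords, ids, capacity, depth=0):
    """Recursively split irregularly distributed points along alternating
    axes until every leaf holds at most `capacity` points (illustrative
    sketch; not PatchSTG's released implementation)."""
    if len(ids) <= capacity:
        return [ids]
    axis = depth % coords.shape[1]            # alternate split dimension
    order = np.argsort(coords[ids, axis])     # median split balances leaves
    half = len(ids) // 2
    left, right = ids[order[:half]], ids[order[half:]]
    return (kd_patch(coords, left, capacity, depth + 1)
            + kd_patch(coords, right, capacity, depth + 1))

def pad_to_patches(leaves, capacity, pad_id=-1):
    """Pad every leaf to equal occupancy so leaves stack into
    non-overlapping patches of identical size."""
    return np.stack([np.pad(leaf, (0, capacity - len(leaf)),
                            constant_values=pad_id) for leaf in leaves])

coords = np.random.rand(1000, 2)                  # sensor coordinates
patches = pad_to_patches(kd_patch(coords, np.arange(1000), 64), 64)
print(patches.shape)                              # (num_patches, 64)
```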
RAGraph: A General Retrieval-Augmented Graph Learning Framework
Jiang, Xinke, Qiu, Rihong, Xu, Yongxin, Zhang, Wentao, Zhu, Yichen, Zhang, Ruizhe, Fang, Yuchen, Chu, Xu, Zhao, Junfeng, Wang, Yasha
Graph Neural Networks (GNNs) have become essential in interpreting relational data across various domains, yet they often struggle to generalize to unseen graph data that differs markedly from training instances. In this paper, we introduce a novel framework called General Retrieval-Augmented Graph Learning (RAGraph), which brings external graph data into a general graph foundation model to improve generalization to unseen scenarios. At the core of our framework is a toy graph vector library that we establish, which captures key attributes such as features and task-specific label information. During inference, RAGraph retrieves similar toy graphs based on key similarities to the downstream task and integrates the retrieved data to enrich the learning context via a message-passing prompting mechanism. Our extensive experimental evaluations demonstrate that RAGraph significantly outperforms state-of-the-art graph learning methods on multiple tasks, such as node classification, link prediction, and graph classification, across both dynamic and static datasets. Furthermore, extensive testing confirms that RAGraph consistently maintains high performance without task-specific fine-tuning, highlighting its adaptability, robustness, and broad applicability.
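As a rough illustration of the retrieval step (not RAGraph's actual implementation; all names here are hypothetical), retrieving the top-k most similar toy graphs by cosine similarity over precomputed graph embeddings might look like:

```python
import numpy as np

def retrieve_toy_graphs(query_emb, library_embs, k=5):
    """Rank toy graphs in the vector library by cosine similarity to a
    query (sub)graph embedding and return the top-k indices
    (minimal sketch; names and similarity choice are assumptions)."""
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    return np.argsort(-(lib @ q))[:k]

library = np.random.rand(10_000, 128)   # embeddings of stored toy graphs
query = np.random.rand(128)             # embedding of the unseen graph
neighbors = retrieve_toy_graphs(query, library)  # indices used for prompting
```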
Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models
Fang, Yuchen, Na, Sen, Mahoney, Michael W., Kolar, Mladen
In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our method utilizes a random model to represent the objective function, which is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises due to the nonlinearity of the problem constraints. Such an effect may impede the method from moving away from saddle points. Both gradient and eigen step computations leverage a novel parameter-free decomposition of the step and the trust-region radius, accounting for the proportions among the feasibility residual, optimality residual, and negative curvature. We establish global almost sure first- and second-order convergence guarantees for our method, and present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods.
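Schematically, in standard SQP notation (our paraphrase; the paper's step decomposition is more refined), the trust-region subproblem solved at iterate $x_k$ takes the form

$$\min_{d \in \mathbb{R}^n} \; \bar{g}_k^\top d + \tfrac{1}{2}\, d^\top B_k d \quad \text{s.t.} \quad c_k + J_k d = r_k, \quad \|d\| \le \Delta_k,$$

where $\bar{g}_k$ and $B_k$ are the stochastic gradient and Hessian estimates from the random model, $J_k$ is the constraint Jacobian, $r_k$ is a relaxation vector that keeps the linearized constraints compatible with the trust region, and $\Delta_k$ is the trust-region radius.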
ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling
Chen, Yuqi, Ren, Kan, Wang, Yansen, Fang, Yuchen, Sun, Weiwei, Li, Dongsheng
Modeling continuous-time dynamics on irregular time series is critical for accounting for data evolution and correlations that occur continuously. Traditional methods, including recurrent neural networks and Transformer models, leverage inductive bias via powerful neural architectures to capture complex patterns. However, due to their discrete nature, they have difficulty generalizing to continuous-time data paradigms. Although neural ordinary differential equations (Neural ODEs) and their variants have shown promising results in dealing with irregular time series, they often fail to capture the intricate correlations within these sequences. It is challenging yet essential to concurrently model the relationships among input data points and capture the dynamic changes of the continuous-time system. To tackle this problem, we propose ContiFormer, which extends the relation modeling of the vanilla Transformer to the continuous-time domain by explicitly combining the continuous-dynamics modeling of Neural ODEs with the attention mechanism of Transformers. We mathematically characterize the expressive power of ContiFormer and show that, through curated designs of the function hypothesis, many Transformer variants specialized for irregular time series modeling can be covered as special cases of ContiFormer. A wide range of experiments on both synthetic and real-world datasets demonstrates the superior modeling capacity and prediction performance of ContiFormer on irregular time series data.
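The following toy sketch conveys the core idea of attending in continuous time: each observation's representation is evolved by an ODE from its own timestamp to a query time before attention weights are computed. It is a deliberately simplified stand-in (explicit Euler integration, a hypothetical dynamics function `f`, and a mean-pooled query), not ContiFormer's actual operator.

```python
import numpy as np

def euler_ode(z0, t0, t1, f, steps=20):
    """Evolve a latent state from t0 to t1 with explicit Euler steps,
    standing in for the black-box solver a Neural ODE would use."""
    z, h = z0.copy(), (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(z)
    return z

def continuous_attention(x, t, f, t_query):
    """Evolve each observation's key/value to the query time, then apply
    ordinary scaled dot-product attention. Purely illustrative; not
    ContiFormer's actual attention operator."""
    keys = np.stack([euler_ode(xi, ti, t_query, f) for xi, ti in zip(x, t)])
    query = x.mean(axis=0)                       # hypothetical query vector
    logits = keys @ query / np.sqrt(x.shape[1])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ keys                              # attention-weighted summary

x = np.random.rand(7, 16)                        # 7 irregular observations
t = np.sort(np.random.rand(7))                   # their timestamps
out = continuous_attention(x, t, lambda z: -z, t_query=1.0)
```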
Time Series Supplier Allocation via Deep Black-Litterman Model
Luo, Jiayuan, Zhang, Wentao, Fang, Yuchen, Gao, Xiaowei, Zhuang, Dingyi, Chen, Hao, Jiang, Xinke
Time Series Supplier Allocation (TSSA) poses a complex NP-hard challenge, aimed at refining future order-dispatching strategies to fully satisfy order demands with maximum supply efficiency. Originating in financial portfolio management, the Black-Litterman (BL) model offers a new perspective on the TSSA scenario by balancing expected returns against insufficient-supply risks. However, its application within TSSA is constrained by its reliance on manually constructed perspective matrices and by spatio-temporal market dynamics, coupled with the absence of supervisory signals and the unreliability inherent to supplier information. To address these limitations, we introduce the Deep Black-Litterman Model (DBLM), which adapts the BL model from its financial roots to the supply chain context. Leveraging Spatio-Temporal Graph Neural Networks (STGNNs), DBLM automatically generates future perspective matrices for TSSA by integrating spatio-temporal dependencies. Moreover, a novel Spearman rank correlation objective supervises our approach to address the lack of supervisory signals and is specifically designed to navigate the complexities of supplier risks and interactions. This is further enhanced by a masking mechanism that counteracts biases from unreliable data, improving the model's precision and reliability. Extensive experiments on two datasets demonstrate DBLM's superior performance on TSSA, setting a new standard for the field. Our findings and methodology are made available for community access and further development.
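For background, the classical BL posterior mean that DBLM builds on combines equilibrium returns $\Pi$ with views $(P, Q, \Omega)$ as

$$\mu_{\mathrm{BL}} = \left[(\tau\Sigma)^{-1} + P^\top \Omega^{-1} P\right]^{-1} \left[(\tau\Sigma)^{-1}\Pi + P^\top \Omega^{-1} Q\right],$$

where $\Sigma$ is the return covariance and $\tau$ a scaling constant; in DBLM, the hand-crafted perspective (view) matrix $P$ is replaced by one generated automatically by the STGNN.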
Infinite-Horizon Graph Filters: Leveraging Power Series to Enhance Sparse Information Aggregation
Zhang, Ruizhe, Jiang, Xinke, Fang, Yuchen, Luo, Jiayuan, Xu, Yongxin, Zhu, Yichen, Chu, Xu, Zhao, Junfeng, Wang, Yasha
Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks in recent years, particularly those based on the message-passing paradigm. However, their performance is often constrained by a limited receptive field, a challenge that becomes more acute on sparse graphs. Motivated by power series, which possess infinite expansion capability, we propose a novel Graph Power Filter Neural Network (GPFN) that enhances node classification by employing a power-series graph filter to enlarge the receptive field. Concretely, GPFN builds a graph filter with an infinite receptive field from a convergent power series, which can be analyzed in both the spectral and spatial domains. Moreover, we theoretically prove that GPFN is a general framework that can integrate any convergent power series and capture long-range dependencies. Finally, experimental results on three datasets demonstrate the superiority of GPFN over state-of-the-art baselines.
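As a concrete convergent instance of such a filter (a classical example, not necessarily the paper's chosen coefficients), geometric coefficients $\theta_k = \alpha(1-\alpha)^k$ over a normalized adjacency $\hat{A}$ yield the Personalized-PageRank-style filter

$$g(\hat{A}) = \sum_{k=0}^{\infty} \alpha (1-\alpha)^k \hat{A}^k = \alpha \left(I - (1-\alpha)\hat{A}\right)^{-1}, \qquad 0 < \alpha < 1,$$

whose closed form exists because the spectral radius of $(1-\alpha)\hat{A}$ is below one, giving an infinite receptive field at the cost of a single matrix inverse (or a truncated iteration in practice).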
Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey
Jin, Guangyin, Liang, Yuxuan, Fang, Yuchen, Shao, Zezhi, Huang, Jincai, Zhang, Junbo, Zheng, Yu
With recent advances in sensing technologies, a myriad of spatio-temporal data has been generated and recorded in smart cities. Forecasting the evolution patterns of spatio-temporal data is an important yet demanding aspect of urban computing, which can enhance intelligent management decisions in various fields, including transportation, environment, climate, public safety, healthcare, and others. Traditional statistical and deep learning methods struggle to capture the complex correlations in urban spatio-temporal data. To this end, Spatio-Temporal Graph Neural Networks (STGNNs) have been proposed and have shown great promise in recent years. STGNNs enable the extraction of complex spatio-temporal dependencies by integrating graph neural networks (GNNs) with various temporal learning methods. In this manuscript, we provide a comprehensive survey of recent progress in STGNN technologies for predictive learning in urban computing. First, we briefly introduce the construction methods for spatio-temporal graph data and the prevalent deep learning architectures used in STGNNs. We then sort out the primary application domains and specific predictive learning tasks based on the existing literature. Afterward, we scrutinize the design of STGNNs and their combination with advanced technologies in recent years. Finally, we summarize the limitations of existing research and suggest potential directions for future work.
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Fu, Lingyue, Chai, Huacan, Luo, Shuang, Du, Kounianhua, Zhang, Weiming, Fan, Longteng, Lei, Jiayi, Rui, Renting, Lin, Jianghao, Fang, Yuchen, Liu, Yifan, Wang, Jingkuan, Qi, Siyuan, Zhang, Kangning, Zhang, Weinan, Yu, Yong
With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions, namely conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex uses algorithmic questions and corresponding test cases to assess the quality of code generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively, which leaves significant room for improvement on programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth. The datasets are released at https://github.com/APEXLAB/CodeApex.git, and the CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.
HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting
Shao, Zezhi, Wang, Fei, Zhang, Zhao, Fang, Yuchen, Jin, Guangyin, Xu, Yongjun
Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to their high complexity, STGNNs focus only on short-term traffic forecasting, e.g., 1-hour forecasting, while ignoring the more practical long-term setting. In this paper, we make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting. To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder that jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that HUTFormer significantly outperforms state-of-the-art traffic forecasting and long-time-series forecasting baselines.
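As a minimal sketch of the segment-merging idea (shapes and the projection are our assumptions; HUTFormer learns this mapping end to end), halving the temporal resolution by fusing adjacent segments could look like:

```python
import numpy as np

def segment_merge(x, merge=2):
    """Coarsen the time axis by concatenating the features of `merge`
    adjacent segments and projecting back to the model width; here the
    projection is a fixed random matrix, whereas a hierarchical encoder
    would learn it. Shapes: (T, D) -> (T // merge, D). Illustrative only."""
    T, D = x.shape
    x = x[: T - T % merge].reshape(T // merge, merge * D)
    W = np.random.default_rng(0).standard_normal((merge * D, D)) / np.sqrt(merge * D)
    return x @ W

series = np.random.rand(96, 32)     # 96 time steps, 32-dim features
coarse = segment_merge(series)      # (48, 32): one level of the hierarchy
```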