Chen, Zhijie
Optimal Control Operator Perspective and a Neural Adaptive Spectral Method
Feng, Mingquan, Chen, Zhijie, Huang, Yixin, Liu, Yizhou, Yan, Junchi
Optimal control problems (OCPs) involve finding a control function for a dynamical system such that a cost functional is optimized. Such problems are central to physical systems in both academia and industry. In this paper, we propose a novel instance-solution control operator perspective, which solves OCPs in a one-shot manner, without direct dependence on the explicit expression of the dynamics or on iterative optimization processes. The control operator is implemented by a new neural operator architecture named Neural Adaptive Spectral Method (NASM), a generalization of classical spectral methods. We theoretically validate the perspective and architecture by presenting approximation error bounds of NASM for the control operator. Experiments on synthetic environments and a real-world dataset verify the effectiveness and efficiency of our approach, including a substantial speedup in running time and high-quality in- and out-of-distribution generalization.
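The abstract gives no implementation details, but the spectral idea it names can be illustrated with a toy instance-to-control operator: a network predicts the coefficients of a truncated basis expansion of the control, so the whole OCP is answered in one forward pass. This is a minimal sketch, not the authors' NASM architecture; the class name, the Chebyshev basis, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class SpectralControlOperator(nn.Module):
    """Toy instance-to-control operator (NOT the paper's NASM).

    Maps a discretized problem instance (e.g. a sampled target trajectory)
    to n_modes coefficients of a Chebyshev expansion of the control u(t).
    """

    def __init__(self, instance_dim: int, n_modes: int = 16, hidden: int = 128):
        super().__init__()
        self.n_modes = n_modes
        self.net = nn.Sequential(
            nn.Linear(instance_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, n_modes),          # spectral coefficients
        )

    def forward(self, instance: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # instance: (batch, instance_dim); t: (n_points,) in [-1, 1]
        coeffs = self.net(instance)              # (batch, n_modes)
        k = torch.arange(self.n_modes, dtype=t.dtype)
        basis = torch.cos(k[None, :] * torch.acos(t[:, None]))  # Chebyshev T_k(t)
        return coeffs @ basis.T                  # control evaluated at t, in one shot


# Usage: a single forward pass returns u(t) for a batch of problem instances.
op = SpectralControlOperator(instance_dim=32)
u = op(torch.randn(4, 32), torch.linspace(-1, 1, 101))
print(u.shape)  # torch.Size([4, 101])
```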
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Chen, Zhijie, Li, Qiaobo, Banerjee, Arindam
Despite the recent success of federated learning (FL), the cost of communication arguably remains the main challenge. Wang et al. (2023) showed that a 20 Gbps network bandwidth is necessary to bring the communication overhead to a suitable scale for finetuning GPT-J-6B, which is unrealistic in distributed settings. Even under good network conditions, reducing the communication complexity means one can train much larger models on the same communication budget. The communication cost of FL can be represented as O(dT), where d is the ambient dimension of the parameter space and T is the number of rounds to convergence. Various methods have been proposed to minimize T, e.g., local training (Stich, 2018) and large-batch training (Xu et al., 2023). Folklore from centralized training suggests that T depends heavily on the choice of optimizer: adaptive methods usually demonstrate faster convergence and better generalization, especially for transformer-based models (Reddi et al., 2019). In decentralized settings, adaptive methods are also favorable due to their robustness to data heterogeneity, e.g., adaptive methods are guaranteed to converge under heavy-tailed noise while SGD is not (Zhang et al., 2020). These merits, in principle, should be preserved in communication-efficient FL algorithms. The alternative approach to reducing communication cost is to be more thrifty with the bits communicated in a single round, i.e., to reduce the O(d) factor, which dominates the communication complexity for modern neural networks where d ≫ T. Considerable effort has been devoted to designing efficient gradient compression methods.
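As a hedged illustration of what sketching the client updates means for the O(d) factor, the following count-sketch compressor sends sketch_dim numbers per round instead of d and reconstructs an unbiased estimate on the server. It is a generic sketch, not the algorithm analyzed in the paper; all function names and sizes are assumptions, and in the sketched adaptive setting the reconstructed update would then feed an Adam/AMSGrad-style server step, omitted here.

```python
import numpy as np


def count_sketch(grad: np.ndarray, sketch_dim: int, seed: int = 0) -> np.ndarray:
    """Compress a d-dimensional update into sketch_dim buckets (sketch_dim << d)."""
    rng = np.random.default_rng(seed)          # shared seed => same hashing on the server
    buckets = rng.integers(0, sketch_dim, size=grad.shape[0])
    signs = rng.choice([-1.0, 1.0], size=grad.shape[0])
    sketch = np.zeros(sketch_dim)
    np.add.at(sketch, buckets, signs * grad)   # hash each coordinate into a signed bucket
    return sketch


def unsketch(sketch: np.ndarray, d: int, seed: int = 0) -> np.ndarray:
    """Unbiased estimate of the original d-dimensional update from its sketch."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, sketch.shape[0], size=d)
    signs = rng.choice([-1.0, 1.0], size=d)
    return signs * sketch[buckets]


# Clients send sketch_dim numbers per round instead of d.
d, sketch_dim = 10_000, 500
g = np.random.randn(d)
g_hat = unsketch(count_sketch(g, sketch_dim), d)
print(np.dot(g, g_hat) / np.dot(g, g))         # roughly 1 in expectation
```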
LArctan-SKAN: Simple and Efficient Single-Parameterized Kolmogorov-Arnold Networks using Learnable Trigonometric Function
Chen, Zhijie, Zhang, Xinglin
This paper proposes a novel approach to designing Single-Parameterized Kolmogorov-Arnold Networks (SKAN) by utilizing a Single-Parameterized Function (SFunc) constructed from trigonometric functions. Experimental validation on the MNIST dataset demonstrates that LArctan-SKAN excels in both accuracy and computational efficiency. Specifically, LArctan-SKAN significantly improves test-set accuracy over existing models, outperforming all compared pure KAN variants, including FourierKAN, LSS-SKAN, and Spl-KAN. Furthermore, LArctan-SKAN exhibits remarkable computational efficiency, with a training speed increase of 535.01%. These results confirm the effectiveness and potential of SKANs constructed with trigonometric functions.
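The abstract does not spell out the basis function, so the following layer is only a guess at the "learnable arctan" idea: each edge carries a single parameter w and applies an arctan scaled by it. The class name and the exact form phi_w(x) = w * arctan(w * x) are assumptions and may differ from the paper's formulation.

```python
import torch
import torch.nn as nn


class LArctanSKANLayer(nn.Module):
    """Sketch of a single-parameterized KAN layer with an arctan basis.

    Each input->output edge has exactly ONE learnable parameter w, and the
    edge function is assumed to be phi_w(x) = w * arctan(w * x).
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, out_features)
        edge = self.w * torch.atan(self.w * x.unsqueeze(1))   # (batch, out, in)
        return edge.sum(dim=-1)                                # sum over incoming edges


# Usage on MNIST-sized inputs (784 -> 64 -> 10).
model = nn.Sequential(LArctanSKANLayer(784, 64), LArctanSKANLayer(64, 10))
print(model(torch.randn(2, 784)).shape)  # torch.Size([2, 10])
```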
LSS-SKAN: Efficient Kolmogorov-Arnold Networks based on Single-Parameterized Function
Chen, Zhijie, Zhang, Xinglin
The recently proposed Kolmogorov-Arnold Network (KAN) has attracted increasing attention due to its advantage of high visualizability compared to MLPs. In this paper, based on a series of small-scale experiments, we propose the Efficient KAN Expansion Principle (EKE Principle): allocating parameters to expand network scale, rather than employing more complex basis functions, leads to more efficient performance improvements in KANs. Based on this principle, we propose a superior KAN variant termed SKAN, whose basis function utilizes only a single learnable parameter. We then evaluate various single-parameterized functions for constructing SKANs, with LShifted Softplus-based SKANs (LSS-SKANs) demonstrating superior accuracy. Subsequently, extensive experiments compare LSS-SKAN with other KAN variants on the MNIST dataset. In the final accuracy tests, LSS-SKAN exhibits superior performance on MNIST compared to all tested pure KAN variants. In terms of execution speed, LSS-SKAN outperforms all compared popular KAN variants.
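For comparison with the arctan sketch above, here is an equally hypothetical take on an "LShifted Softplus" single-parameterized basis, shifted so it vanishes at the origin. The exact functional form is an assumption; a SKAN layer would simply sum this basis over incoming edges, as in the previous sketch.

```python
import math

import torch
import torch.nn.functional as F


def lshifted_softplus(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Assumed form of the single-parameter basis:
    phi_w(x) = w * (softplus(w * x) - ln 2), so that phi_w(0) = 0."""
    return w * (F.softplus(w * x) - math.log(2.0))


# A SKAN layer would sum phi_w over incoming edges, exactly as in the
# LArctanSKANLayer sketch above, with this basis swapped in.
print(lshifted_softplus(torch.zeros(3), torch.tensor(1.5)))  # ~0 at the origin
```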
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Lian, Yijiang, Chen, Zhijie, Pei, Xin, Li, Shuang, Wang, Yifei, Qiu, Yuefeng, Zhang, Zhiheng, Tao, Zhipeng, Yuan, Liang, Guan, Hanju, Zhang, Kefeng, Li, Zhigang, Liu, Xiaochun
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value might retrieve more ad candidates than the ranking module can afford to process. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is: how to pick out the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction is that the selection scheme should adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach treats the downstream as a black-box environment; the agent sequentially selects items and finally feeds them into the downstream, where the revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time system optimization has been considered from a downstream-adaptation view, and the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully deployed in Baidu's sponsored search system, and long-term online A/B tests show remarkable improvements in revenue.
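As a rough sketch of the select-K-of-N formulation with a black-box downstream reward (a minimal REINFORCE-style loop, not Baidu's production system; the features, the scorer, and the downstream_revenue placeholder are all assumptions):

```python
import torch
import torch.nn as nn


class PruningPolicy(nn.Module):
    """Scores candidate ads; the agent samples which items to keep."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, ads: torch.Tensor) -> torch.Tensor:
        # ads: (n, feat_dim) -> selection probabilities over the n candidates
        return torch.softmax(self.scorer(ads).squeeze(-1), dim=0)


def downstream_revenue(selected: torch.Tensor) -> float:
    """Placeholder for the black-box downstream (ranking, auction, etc.)."""
    return float(selected[:, 0].sum())          # pretend feature 0 correlates with revenue


policy = PruningPolicy(feat_dim=8)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
N, K = 100, 10

for step in range(200):
    ads = torch.randn(N, 8)
    remaining, chosen, log_probs = list(range(N)), [], []
    for _ in range(K):                                      # sequentially pick K of N
        probs = policy(ads[remaining])
        i = torch.multinomial(probs, 1).item()
        log_probs.append(torch.log(probs[i]))
        chosen.append(remaining.pop(i))
    reward = downstream_revenue(ads[chosen])                # black-box feedback only
    loss = -reward * torch.stack(log_probs).sum()           # REINFORCE update
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```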