Chen, Zhijie
Optimal Control Operator Perspective and a Neural Adaptive Spectral Method
Feng, Mingquan, Chen, Zhijie, Huang, Yixin, Liu, Yizhou, Yan, Junchi
Optimal control problems (OCPs) involve finding a control function for a dynamical system such that a cost functional is optimized. Such problems are central to physical systems in both academia and industry. In this paper, we propose a novel instance-solution control operator perspective, which solves OCPs in a one-shot manner, without direct dependence on the explicit expression of the dynamics or on iterative optimization processes. The control operator is implemented by a new neural operator architecture named Neural Adaptive Spectral Method (NASM), a generalization of classical spectral methods. We theoretically validate the perspective and architecture by presenting approximation error bounds of NASM for the control operator. Experiments on synthetic environments and a real-world dataset verify the effectiveness and efficiency of our approach, including a substantial speedup in running time and high-quality in- and out-of-distribution generalization.
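The abstract gives no implementation details, but the spectral idea it names can be illustrated with a toy instance-to-control operator: a network predicts the coefficients of a truncated basis expansion of the control, so the whole OCP is answered in one forward pass. This is a minimal sketch, not the authors' NASM architecture; the class name, the Chebyshev basis, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class SpectralControlOperator(nn.Module):
    """Toy instance-to-control operator (NOT the paper's NASM).

    Maps a discretized problem instance (e.g. a sampled target trajectory)
    to n_modes coefficients of a Chebyshev expansion of the control u(t).
    """

    def __init__(self, instance_dim: int, n_modes: int = 16, hidden: int = 128):
        super().__init__()
        self.n_modes = n_modes
        self.net = nn.Sequential(
            nn.Linear(instance_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, n_modes),          # spectral coefficients
        )

    def forward(self, instance: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # instance: (batch, instance_dim); t: (n_points,) in [-1, 1]
        coeffs = self.net(instance)              # (batch, n_modes)
        k = torch.arange(self.n_modes, dtype=t.dtype)
        basis = torch.cos(k[None, :] * torch.acos(t[:, None]))  # Chebyshev T_k(t)
        return coeffs @ basis.T                  # control evaluated at t, in one shot


# Usage: a single forward pass returns u(t) for a batch of problem instances.
op = SpectralControlOperator(instance_dim=32)
u = op(torch.randn(4, 32), torch.linspace(-1, 1, 101))
print(u.shape)  # torch.Size([4, 101])
```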
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Chen, Zhijie, Li, Qiaobo, Banerjee, Arindam
Despite the recent success of federated learning (FL), the cost of communication arguably remains the main challenge. Wang et al. (2023) showed that a 20 Gbps network bandwidth is necessary to bring the communication overhead to a suitable scale for finetuning GPT-J-6B, which is unrealistic in distributed settings. Even under good network conditions, reducing the communication complexity means one can train much larger models on the same communication budget. The communication cost of FL can be represented as O(dT), where d is the ambient dimension of the parameter space and T is the number of rounds to convergence. Various methods have been proposed to minimize T, e.g., local training (Stich, 2018) and large-batch training (Xu et al., 2023). Folklore from centralized training suggests that T depends heavily on the choice of optimizer: adaptive methods usually demonstrate faster convergence and better generalization, especially for transformer-based models (Reddi et al., 2019). In decentralized settings, adaptive methods are also favorable due to their robustness to data heterogeneity, e.g., adaptive methods are guaranteed to converge under heavy-tailed noise while SGD is not (Zhang et al., 2020). These merits, in principle, should be preserved in communication-efficient FL algorithms. The alternative approach to reducing communication cost is to be more thrifty with the bits communicated in a single round, i.e., to reduce the O(d) factor, which dominates the communication complexity for modern neural networks where d ≫ T. Considerable effort has been devoted to designing efficient gradient compression methods.
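As a hedged illustration of what sketching the client updates means for the O(d) factor, the following count-sketch compressor sends sketch_dim numbers per round instead of d and reconstructs an unbiased estimate on the server. It is a generic sketch, not the algorithm analyzed in the paper; all function names and sizes are assumptions, and in the sketched adaptive setting the reconstructed update would then feed an Adam/AMSGrad-style server step, omitted here.

```python
import numpy as np


def count_sketch(grad: np.ndarray, sketch_dim: int, seed: int = 0) -> np.ndarray:
    """Compress a d-dimensional update into sketch_dim buckets (sketch_dim << d)."""
    rng = np.random.default_rng(seed)          # shared seed => same hashing on the server
    buckets = rng.integers(0, sketch_dim, size=grad.shape[0])
    signs = rng.choice([-1.0, 1.0], size=grad.shape[0])
    sketch = np.zeros(sketch_dim)
    np.add.at(sketch, buckets, signs * grad)   # hash each coordinate into a signed bucket
    return sketch


def unsketch(sketch: np.ndarray, d: int, seed: int = 0) -> np.ndarray:
    """Unbiased estimate of the original d-dimensional update from its sketch."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, sketch.shape[0], size=d)
    signs = rng.choice([-1.0, 1.0], size=d)
    return signs * sketch[buckets]


# Clients send sketch_dim numbers per round instead of d.
d, sketch_dim = 10_000, 500
g = np.random.randn(d)
g_hat = unsketch(count_sketch(g, sketch_dim), d)
print(np.dot(g, g_hat) / np.dot(g, g))         # roughly 1 in expectation
```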
LArctan-SKAN: Simple and Efficient Single-Parameterized Kolmogorov-Arnold Networks using Learnable Trigonometric Function
Chen, Zhijie, Zhang, Xinglin
This paper proposes a novel approach to designing Single-Parameterized Kolmogorov-Arnold Networks (SKAN) by utilizing a Single-Parameterized Function (SFunc) constructed from trigonometric functions. Experimental validation on the MNIST dataset demonstrates that LArctan-SKAN excels in both accuracy and computational efficiency. Specifically, LArctan-SKAN significantly improves test-set accuracy over existing models, outperforming all compared pure KAN variants, including FourierKAN, LSS-SKAN, and Spl-KAN. Furthermore, LArctan-SKAN exhibits remarkable computational efficiency, with a training speed increase of 535.01%. These results confirm the effectiveness and potential of SKANs constructed with trigonometric functions.
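The abstract does not spell out the basis function, so the following layer is only a guess at the "learnable arctan" idea: each edge carries a single parameter w and applies an arctan scaled by it. The class name and the exact form phi_w(x) = w * arctan(w * x) are assumptions and may differ from the paper's formulation.

```python
import torch
import torch.nn as nn


class LArctanSKANLayer(nn.Module):
    """Sketch of a single-parameterized KAN layer with an arctan basis.

    Each input->output edge has exactly ONE learnable parameter w, and the
    edge function is assumed to be phi_w(x) = w * arctan(w * x).
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, out_features)
        edge = self.w * torch.atan(self.w * x.unsqueeze(1))   # (batch, out, in)
        return edge.sum(dim=-1)                                # sum over incoming edges


# Usage on MNIST-sized inputs (784 -> 64 -> 10).
model = nn.Sequential(LArctanSKANLayer(784, 64), LArctanSKANLayer(64, 10))
print(model(torch.randn(2, 784)).shape)  # torch.Size([2, 10])
```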
LSS-SKAN: Efficient Kolmogorov-Arnold Networks based on Single-Parameterized Function
Chen, Zhijie, Zhang, Xinglin
The recently proposed Kolmogorov-Arnold Network (KAN) has attracted increasing attention due to its advantage of high visualizability compared to MLPs. In this paper, based on a series of small-scale experiments, we propose the Efficient KAN Expansion Principle (EKE Principle): allocating parameters to expand network scale, rather than employing more complex basis functions, leads to more efficient performance improvements in KANs. Based on this principle, we propose a superior KAN variant termed SKAN, whose basis function utilizes only a single learnable parameter. We then evaluate various single-parameterized functions for constructing SKANs, with LShifted Softplus-based SKANs (LSS-SKANs) demonstrating superior accuracy. Subsequently, extensive experiments compare LSS-SKAN with other KAN variants on the MNIST dataset. In the final accuracy tests, LSS-SKAN exhibits superior performance on MNIST compared to all tested pure KAN variants. In terms of execution speed, LSS-SKAN outperforms all compared popular KAN variants.
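For comparison with the arctan sketch above, here is an equally hypothetical take on an "LShifted Softplus" single-parameterized basis, shifted so it vanishes at the origin. The exact functional form is an assumption; a SKAN layer would simply sum this basis over incoming edges, as in the previous sketch.

```python
import math

import torch
import torch.nn.functional as F


def lshifted_softplus(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Assumed form of the single-parameter basis:
    phi_w(x) = w * (softplus(w * x) - ln 2), so that phi_w(0) = 0."""
    return w * (F.softplus(w * x) - math.log(2.0))


# A SKAN layer would sum phi_w over incoming edges, exactly as in the
# LArctanSKANLayer sketch above, with this basis swapped in.
print(lshifted_softplus(torch.zeros(3), torch.tensor(1.5)))  # ~0 at the origin
```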
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Lian, Yijiang, Chen, Zhijie, Pei, Xin, Li, Shuang, Wang, Yifei, Qiu, Yuefeng, Zhang, Zhiheng, Tao, Zhipeng, Yuan, Liang, Guan, Hanju, Zhang, Kefeng, Li, Zhigang, Liu, Xiaochun
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value might retrieve more ad candidates than the ranking module can afford to process. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is: how to pick out the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction is that the selection scheme should adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach treats the downstream as a black-box environment; the agent sequentially selects items and finally feeds them into the downstream, where the revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time system optimization has been considered from a downstream-adaptation view, and the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully deployed in Baidu's sponsored search system, and long-term online A/B tests show remarkable improvements in revenue.
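As a rough sketch of the select-K-of-N formulation with a black-box downstream reward (a minimal REINFORCE-style loop, not Baidu's production system; the features, the scorer, and the downstream_revenue placeholder are all assumptions):

```python
import torch
import torch.nn as nn


class PruningPolicy(nn.Module):
    """Scores candidate ads; the agent samples which items to keep."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, ads: torch.Tensor) -> torch.Tensor:
        # ads: (n, feat_dim) -> selection probabilities over the n candidates
        return torch.softmax(self.scorer(ads).squeeze(-1), dim=0)


def downstream_revenue(selected: torch.Tensor) -> float:
    """Placeholder for the black-box downstream (ranking, auction, etc.)."""
    return float(selected[:, 0].sum())          # pretend feature 0 correlates with revenue


policy = PruningPolicy(feat_dim=8)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
N, K = 100, 10

for step in range(200):
    ads = torch.randn(N, 8)
    remaining, chosen, log_probs = list(range(N)), [], []
    for _ in range(K):                                      # sequentially pick K of N
        probs = policy(ads[remaining])
        i = torch.multinomial(probs, 1).item()
        log_probs.append(torch.log(probs[i]))
        chosen.append(remaining.pop(i))
    reward = downstream_revenue(ads[chosen])                # black-box feedback only
    loss = -reward * torch.stack(log_probs).sum()           # REINFORCE update
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```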