Pan, Zhisong
A Neural Tangent Kernel View on Federated Averaging for Deep Linear Neural Network
Liu, Xin, Zhan, Dazhi, Tao, Wei, Ma, Xin, Pan, Yu, Ding, Yu, Pan, Zhisong
Federated averaging (FedAvg) is a widely employed paradigm for collaboratively training models from distributed clients without sharing data. Neural networks have achieved remarkable success due to their extraordinary performance, which makes them a preferred choice as the model in FedAvg. However, the optimization problem of the neural network is often non-convex and even non-smooth. Furthermore, FedAvg involves multiple clients and local updates, which results in an inaccurate updating direction. These properties make it difficult to analyze the convergence of FedAvg in training neural networks. Recently, neural tangent kernel (NTK) theory has been proposed to understand the convergence of first-order methods on the non-convex problems of neural networks. The deep linear neural network is a classical model in theoretical studies due to its simple formulation. Nevertheless, there exists no theoretical result on the convergence of FedAvg in training deep linear neural networks. By applying NTK theory, we take a further step and provide the first theoretical guarantee for the global convergence of FedAvg in training deep linear neural networks. Specifically, we prove that FedAvg converges to the global minimum at a linear rate $\mathcal{O}\big((1-\eta K /N)^t\big)$, where $t$ is the number of iterations, $\eta$ is the learning rate, $N$ is the number of clients and $K$ is the number of local updates. Finally, experimental evaluations on two benchmark datasets are conducted to empirically validate the correctness of our theoretical findings.
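To illustrate the scheme analyzed above, here is a minimal NumPy sketch of FedAvg for a deep linear network trained with squared loss: each of the N clients runs K local gradient steps from the current global weights, and the server averages the resulting weights layer by layer. The depth, width, step size, initialization scale, and names such as fedavg_linear and local_steps are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def fedavg_linear(client_data, depth=3, width=32, eta=0.05, K=5, rounds=100, seed=0):
        """FedAvg sketch for a deep linear network f(x) = x W_1 ... W_L with squared loss."""
        rng = np.random.default_rng(seed)
        d_in = client_data[0][0].shape[1]
        dims = [d_in] + [width] * (depth - 1) + [1]
        # one shared initialization, broadcast to every client at each round
        global_W = [rng.normal(scale=1.0 / np.sqrt(dims[l]), size=(dims[l], dims[l + 1]))
                    for l in range(depth)]

        def local_steps(W, X, y):
            """K local gradient-descent steps on one client's data."""
            W = [Wl.copy() for Wl in W]
            n = X.shape[0]
            for _ in range(K):
                acts = [X]
                for Wl in W:                       # forward pass, keep activations
                    acts.append(acts[-1] @ Wl)
                delta = (acts[-1] - y) / n         # gradient of 0.5/n * ||f(X) - y||^2
                grads = [None] * len(W)
                for l in reversed(range(len(W))):  # backpropagate through the linear layers
                    grads[l] = acts[l].T @ delta
                    delta = delta @ W[l].T
                W = [Wl - eta * g for Wl, g in zip(W, grads)]
            return W

        for _ in range(rounds):
            local_W = [local_steps(global_W, X, y) for X, y in client_data]
            # server step: average each layer across the N clients
            global_W = [np.mean([Wc[l] for Wc in local_W], axis=0) for l in range(depth)]
        return global_W

    # toy usage: N = 4 clients, each holding 50 examples of a shared linear target
    rng = np.random.default_rng(1)
    w_star = rng.normal(size=(10, 1)) / np.sqrt(10)
    clients = []
    for _ in range(4):
        X = rng.normal(size=(50, 10))
        clients.append((X, X @ w_star))
    weights = fedavg_linear(clients)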
A high-resolution dynamical view on momentum methods for over-parameterized neural networks
Liu, Xin, Tao, Wei, Wang, Jun, Pan, Zhisong
Due to their simplicity and efficiency, first-order gradient methods are widely used in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that first-order methods can attain a global minimum when training over-parameterized neural networks, where the number of parameters is significantly larger than the number of training instances. Momentum methods, including the heavy ball method (HB) and Nesterov's accelerated method (NAG), are the workhorse first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits better performance than HB. However, current research fails to distinguish their convergence difference in training neural networks. Motivated by this, we provide a convergence analysis of HB and NAG in training an over-parameterized two-layer neural network with ReLU activation, through the lens of high-resolution dynamical systems and neural tangent kernel (NTK) theory. Compared to existing works, our analysis not only establishes tighter upper bounds on the convergence rate for both HB and NAG, but also characterizes the effect of the gradient correction term, which leads to the acceleration of NAG over HB. Finally, we validate our theoretical results on three benchmark datasets.
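For reference, writing both methods as a single-sequence recursion (a standard rewriting, with a generic step size $s$ and momentum parameter $\beta$ rather than the paper's specific choices) makes the gradient correction term explicit:

    \[
    \begin{aligned}
    \text{HB:}\quad  & w_{t+1} = w_t + \beta\,(w_t - w_{t-1}) - s\,\nabla L(w_t),\\
    \text{NAG:}\quad & w_{t+1} = w_t + \beta\,(w_t - w_{t-1}) - s\,\nabla L(w_t)
                       - \underbrace{\beta s\,\big(\nabla L(w_t) - \nabla L(w_{t-1})\big)}_{\text{gradient correction}}.
    \end{aligned}
    \]

The extra term is of order $s$ and vanishes in the low-resolution limit $s \to 0$, which is why a high-resolution dynamical view is needed to separate the behavior of the two methods.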
Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks
Liu, Xin, Pan, Zhisong
Despite the empirical success of deep learning, we still lack a theoretical understanding of why a randomly initialized neural network trained by first-order optimization methods is able to achieve zero training loss, even though its landscape is non-convex and non-smooth. Recently, several works have demystified this phenomenon in the over-parameterized regime. In this work, we make further progress in this area by considering a commonly used momentum optimization algorithm: the Nesterov accelerated method (NAG). We analyze the convergence of NAG for a two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to zero at a linear convergence rate $1-\Theta(1/\sqrt{\kappa})$, where $\kappa > 1$ is determined by the initialization and the architecture of the neural network. Compared to the rate $1-\Theta(1/\kappa)$ of gradient descent, NAG achieves an acceleration. In addition, our result also indicates that NAG and the heavy-ball method can achieve a similar convergence rate.
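To make the claimed acceleration concrete, a standard calculation (not taken from the paper) converts each linear rate into the number of iterations needed to reach a target error $\epsilon$:

    \[
    \big(1 - \Theta(1/\kappa)\big)^t \le \epsilon \;\Longleftarrow\; t = \Theta\!\big(\kappa \log(1/\epsilon)\big)
    \quad\text{(gradient descent)},
    \qquad
    \big(1 - \Theta(1/\sqrt{\kappa})\big)^t \le \epsilon \;\Longleftarrow\; t = \Theta\!\big(\sqrt{\kappa}\,\log(1/\epsilon)\big)
    \quad\text{(NAG)},
    \]

so for ill-conditioned problems (large $\kappa$) NAG needs roughly a factor of $\sqrt{\kappa}$ fewer iterations.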
Weakness Analysis of Cyberspace Configuration Based on Reinforcement Learning
Zhang, Lei, Bai, Wei, Guo, Shize, Xia, Shiming, Li, Hongmei, Pan, Zhisong
In this work, we present a learning-based approach to analyzing cyberspace configuration. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of agents as attackers, our method becomes better at rapidly finding attack paths that were previously hidden, especially in multiple-domain cyberspace. To achieve these results, we pose finding attack paths as a Reinforcement Learning (RL) problem and train an agent to find multiple-domain attack paths. To enable our RL policy to find more hidden attack paths, we introduce a multiple-domain action selection module into RL. We design a simulated cyberspace experimental environment to verify our method. Our objective is to find more hidden attack paths in order to analyze the weaknesses of the cyberspace configuration. The experimental results show that our method can find more hidden multiple-domain attack paths than existing baseline methods.
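A minimal tabular Q-learning sketch of this problem formulation: states are compromised (host, domain) nodes in a toy multiple-domain graph, actions are exploitable moves, and the reward marks reaching a target asset. The graph, reward values, and names such as q_learning are purely illustrative assumptions; the paper's multiple-domain action selection module and simulated cyberspace environment are not reproduced here.

    import random
    from collections import defaultdict

    # Toy multiple-domain cyberspace: (host, domain) nodes, edges = exploitable moves.
    EDGES = {
        ("web", "net"):     [("app", "net"), ("web", "account")],
        ("web", "account"): [("db", "account")],
        ("app", "net"):     [("db", "net")],
        ("db", "net"):      [],
        ("db", "account"):  [],
    }
    START, TARGETS = ("web", "net"), {("db", "net"), ("db", "account")}

    def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
        """Tabular Q-learning: learn Q(state, action) for finding attack paths."""
        random.seed(seed)
        Q = defaultdict(float)
        for _ in range(episodes):
            s = START
            for _ in range(10):                        # bounded episode length
                actions = EDGES[s]
                if not actions:
                    break
                # epsilon-greedy choice among the exploitable moves from s
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])
                r = 1.0 if a in TARGETS else -0.01     # reward only for reaching a target asset
                nxt = max((Q[(a, act)] for act in EDGES[a]), default=0.0)
                Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
                s = a
                if s in TARGETS:
                    break
        return Q

    Q = q_learning()
    s, path = START, [START]                           # greedy rollout = learned attack path
    while EDGES[s] and s not in TARGETS:
        s = max(EDGES[s], key=lambda act: Q[(s, act)])
        path.append(s)
    print("attack path:", " -> ".join(f"{host}/{dom}" for host, dom in path))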
Multi-task Feature Selection based Anomaly Detection
Yang, Longqi, Wang, Yibing, Pan, Zhisong, Hu, Guyu
Network anomaly detection is still a vibrant research area. With the fast growth of network bandwidth and the tremendous traffic on the network, an extremely challenging question arises: how can we efficiently and accurately detect anomalies across multiple traffic streams? In multi-task learning, the traffic consisting of flows in a given time period is treated as a task, and multiple tasks at different time periods are performed simultaneously to detect anomalies. In this paper, we apply multi-task feature selection to network anomaly detection, which provides a powerful method to gather information from multiple traffic streams and detect anomalies on them simultaneously. In particular, multi-task feature selection includes the well-known $\ell_1$-norm based feature selection as a special case when only one task is given. Moreover, we show that multi-task feature selection is more accurate than the $\ell_1$-norm based method because it utilizes more information simultaneously. At the evaluation stage, we preprocess the raw data trace from the trans-Pacific backbone link between Japan and the United States, label it with anomaly communities, and generate a 248-feature dataset. We show empirically that multi-task feature selection outperforms independent $\ell_1$-norm based feature selection on the real traffic dataset.
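A minimal sketch of one common form of multi-task feature selection consistent with the description above: least-squares regression for all tasks jointly with an $\ell_{2,1}$ (row-wise group) penalty, solved by proximal gradient descent, so that a feature is kept or discarded for all traffic periods at once; with a single task the penalty reduces to the ordinary $\ell_1$ norm. The solver, data, and names such as multitask_feature_selection are illustrative assumptions, not the paper's exact model or dataset.

    import numpy as np

    def l21_prox(W, t):
        """Row-wise soft-thresholding: proximal operator of t * sum_j ||W[j, :]||_2."""
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
        return W * scale

    def multitask_feature_selection(Xs, ys, lam=0.05, iters=500):
        """Proximal gradient descent on
            sum_k ||X_k w_k - y_k||^2 / (2 n_k)  +  lam * sum_j ||W[j, :]||_2,
        where column k of W is the weight vector of task k (traffic at period k).
        Rows of W driven to zero are features discarded for all tasks at once."""
        d, T = Xs[0].shape[1], len(Xs)
        W = np.zeros((d, T))
        # step size from a crude Lipschitz estimate of the smooth part
        lr = 1.0 / max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
        for _ in range(iters):
            G = np.zeros_like(W)
            for k, (X, y) in enumerate(zip(Xs, ys)):
                G[:, k] = X.T @ (X @ W[:, k] - y) / X.shape[0]
            W = l21_prox(W - lr * G, lr * lam)
        return W

    # toy usage: 3 traffic windows (tasks) sharing the same 5 informative features
    rng = np.random.default_rng(0)
    Xs = [rng.normal(size=(200, 30)) for _ in range(3)]
    w_true = np.zeros(30)
    w_true[:5] = 1.0
    ys = [X @ w_true + 0.1 * rng.normal(size=200) for X in Xs]
    W = multitask_feature_selection(Xs, ys)
    print("selected features:", np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-6))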