Deng, Yangtao
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
Deng, Yangtao, Shi, Xiang, Jiang, Zhuo, Zhang, Xingjian, Zhang, Lei, Zhang, Zhang, Li, Bo, Song, Zuquan, Zhu, Hang, Liu, Gaohong, Li, Fuliang, Wang, Shuguang, Lin, Haibin, Ye, Jianxi, Yu, Minlan
Large-scale distributed model training requires simultaneous training on up to thousands of machines. Detecting the faulty machine is critical when an unexpected fault occurs. From our experience, a training task encounters two faults per day on average, possibly halting training for hours. To address the drawbacks of time-consuming and labor-intensive manual scrutiny, we propose Minder, an automatic faulty machine detector for distributed training tasks. The key idea of Minder is to automatically and efficiently detect the distinctive monitoring-metric patterns of faulty machines, which can persist for a period before the entire training task comes to a halt. Minder has been deployed in our production environment for over one year, monitoring daily distributed training tasks that each involve up to thousands of machines. In our real-world fault detection scenarios, Minder accurately and efficiently reacts to faults within 3.6 seconds on average, with a precision of 0.904 and an F1-score of 0.893.
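The abstract describes Minder only at the level of detecting a faulty machine's distinctive metric patterns that persist before a halt. The sketch below is not Minder's algorithm; it is a simplified, distance-based stand-in under stated assumptions: every machine reports a same-length metric vector per time window, a machine is suspicious when its mean distance to its peers exceeds a hypothetical `ratio` times the median score, and it is flagged only after `persistence` consecutive suspicious windows. The function names and thresholds are illustrative.

```python
import math
import statistics


def outlier_scores(window):
    """Score each machine by its mean Euclidean distance to every other machine.

    window: dict mapping a machine id to a list of metric values
    (hypothetical, e.g. CPU, GPU, network throughput) for one time window.
    All lists must have the same length.
    """
    ids = list(window)
    if len(ids) < 2:
        return {m: 0.0 for m in ids}
    scores = {}
    for a in ids:
        dists = [math.dist(window[a], window[b]) for b in ids if b != a]
        scores[a] = sum(dists) / len(dists)
    return scores


def detect_faulty(windows, ratio=3.0, persistence=3):
    """Return the first machine whose score exceeds `ratio` times the median
    score for `persistence` consecutive windows, or None if none does."""
    streak = {}
    for window in windows:
        scores = outlier_scores(window)
        median = statistics.median(list(scores.values()))
        for m, s in scores.items():
            # Extend the streak while the machine stays an outlier; reset otherwise.
            streak[m] = streak.get(m, 0) + 1 if s > ratio * median else 0
            if streak[m] >= persistence:
                return m
    return None
```

Requiring several consecutive windows mirrors the abstract's point that faulty patterns can last for a period before the task halts, trading a little detection delay for fewer false alarms from transient spikes.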
Moving Sampling Physics-informed Neural Networks induced by Moving Mesh PDE
Yang, Yu, Yang, Qihong, Deng, Yangtao, He, Qiaolin
Researchers have proposed a number of widely used deep learning solvers based on deep neural networks, such as the Deep Ritz method [29], which solves the variational problems arising from PDEs; the Deep BSDE model [4], which is derived from stochastic differential equations and performs well on high-dimensional problems; and the DeepONet framework [12], which learns operators accurately and efficiently from a relatively small dataset. In this article, we use physics-informed neural networks (PINN) [17]. In PINN, the governing equations of the PDE, the boundary conditions, and related physical constraints are incorporated into the loss function, and an optimization algorithm searches for the network parameters that minimize this loss, so that the approximate solution output by the neural network satisfies the governing equations and constraints.
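To make the loss construction concrete, here is a minimal sketch under assumptions of our own: the model problem -u'' = f on (0, 1) with Dirichlet data, a one-hidden-layer tanh network with hypothetical parameters `(a, b, w, c)`, and an analytic second derivative standing in for automatic differentiation. It only assembles the PINN loss (mean squared PDE residual at collocation points plus a boundary penalty weighted by a hypothetical `lam`); the optimization step, normally done with a gradient-based optimizer in an autodiff framework, is omitted.

```python
import math


def network(params, x):
    """One-hidden-layer tanh network u(x) = sum_k w_k * tanh(a_k * x + b_k) + c.

    Returns (u, u_xx). u_xx is computed analytically here, standing in for
    the automatic differentiation a real PINN framework would use.
    """
    a, b, w, c = params
    u, u_xx = c, 0.0
    for ak, bk, wk in zip(a, b, w):
        t = math.tanh(ak * x + bk)
        u += wk * t
        # d^2/dx^2 tanh(a x + b) = -2 a^2 t (1 - t^2)
        u_xx += wk * (-2.0 * ak * ak * t * (1.0 - t * t))
    return u, u_xx


def pinn_loss(params, f, collocation, boundary, lam=1.0):
    """PINN-style loss for -u'' = f on (0, 1) with u = g on the boundary.

    collocation: interior points where the PDE residual is enforced.
    boundary: list of (x, g(x)) pairs for the Dirichlet condition.
    """
    residual = 0.0
    for x in collocation:
        _, u_xx = network(params, x)
        residual += (-u_xx - f(x)) ** 2
    residual /= len(collocation)
    bc = sum((network(params, xb)[0] - gb) ** 2 for xb, gb in boundary)
    return residual + lam * bc
```

For f(x) = pi^2 sin(pi x) with u(0) = u(1) = 0, the exact solution is u(x) = sin(pi x), so a trained network should drive both loss terms toward zero.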
Neural Networks Based on Power Method and Inverse Power Method for Solving Linear Eigenvalue Problems
Yang, Qihong, Deng, Yangtao, Yang, Yu, He, Qiaolin, Zhang, Shiquan
In this article, we propose two kinds of neural networks, inspired by the power method and the inverse power method, to solve linear eigenvalue problems. These neural networks share similar ideas with the traditional methods, with the differential operator realized by automatic differentiation. The eigenfunction is learned by the neural network, and the iterative algorithms are implemented by optimizing a specially defined loss function. The largest positive eigenvalue, the smallest eigenvalue, and interior eigenvalues (given prior knowledge) can be computed efficiently. We examine the applicability and accuracy of our methods through numerical experiments in one, two, and higher dimensions. Numerical results show that our methods yield accurate eigenvalue and eigenfunction approximations.
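For reference, the classical iterations these networks imitate can be sketched on an assumed model problem that is not taken from the paper: -u'' on (0, 1) with zero Dirichlet conditions, discretized by second-order finite differences on `n` interior points. The power method repeatedly applies the matrix and normalizes, converging to the largest eigenvalue; the inverse power method instead solves a linear system at each step (here with the Thomas algorithm for the tridiagonal matrix), converging to the smallest. The Rayleigh quotient gives the eigenvalue estimate. All function names are illustrative.

```python
import math


def apply_laplacian(u, h):
    """y = A u for A = (1/h^2) tridiag(-1, 2, -1), i.e. -u'' with zero Dirichlet BCs."""
    n = len(u)
    y = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        y.append((2.0 * u[i] - left - right) / (h * h))
    return y


def solve_laplacian(r, h):
    """Solve A u = r with the Thomas algorithm (tridiagonal Gaussian elimination)."""
    n = len(r)
    diag, off = 2.0 / (h * h), -1.0 / (h * h)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0], d[0] = off / diag, r[0] / diag
    for i in range(1, n):
        m = diag - off * c[i - 1]
        c[i] = off / m
        d[i] = (r[i] - off * d[i - 1]) / m
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rayleigh(u, h):
    """Rayleigh quotient u^T A u / u^T u as the eigenvalue estimate."""
    au = apply_laplacian(u, h)
    return sum(p * q for p, q in zip(u, au)) / sum(x * x for x in u)


def power_method(n, iters=500, inverse=False):
    """Power iteration (largest eigenvalue) or inverse power iteration (smallest)."""
    h = 1.0 / (n + 1)
    # Arbitrary starting vector with components on both extreme eigenvectors.
    u = normalize([1.0 + 0.1 * i for i in range(n)])
    for _ in range(iters):
        u = normalize(solve_laplacian(u, h) if inverse else apply_laplacian(u, h))
    return rayleigh(u, h), u
```

The discrete eigenvalues are known in closed form, lambda_k = (4/h^2) sin^2(k pi h / 2) with h = 1/(n+1), which makes the sketch easy to check; lambda_1 approaches pi^2 as n grows.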
On the uncertainty analysis of the data-enabled physics-informed neural network for solving neutron diffusion eigenvalue problem
Yang, Yu, Gong, Helin, Yang, Qihong, Deng, Yangtao, He, Qiaolin, Zhang, Shiquan
In practical engineering experiments, data obtained through detectors are inevitably noisy. For the previously proposed data-enabled physics-informed neural network (DEPINN) \citep{DEPINN}, we investigate from several perspectives how DEPINN performs on the neutron diffusion eigenvalue problem when the prior data contain noise at different scales. Further, to reduce the effect of noise and make better use of noisy prior data, we propose novel interval loss functions and give rigorous mathematical proofs. The robustness of DEPINN is examined on two typical benchmark problems through extensive numerical results, and the effectiveness of the proposed interval loss functions is demonstrated by comparison. This paper confirms the feasibility of the improved DEPINN for practical engineering applications in nuclear reactor physics.
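The abstract does not define the interval loss, so the following is only one natural form, assumed for illustration and not necessarily the paper's definition: each noisy observation y is treated as an interval [y - delta, y + delta] for a hypothetical noise half-width `delta`, and a prediction is penalized only by how far it falls outside that interval. With `delta = 0` it reduces to the ordinary mean squared error.

```python
def interval_loss(pred, observed, delta):
    """Hypothetical interval loss: zero while |pred - obs| <= delta,
    squared excess beyond the interval otherwise, averaged over points."""
    total = 0.0
    for p, y in zip(pred, observed):
        excess = max(abs(p - y) - delta, 0.0)
        total += excess ** 2
    return total / len(pred)


def mse_loss(pred, observed):
    """Ordinary mean squared error, for comparison."""
    return sum((p - y) ** 2 for p, y in zip(pred, observed)) / len(pred)
```

Under this assumed form, predictions already within the noise level of the data receive no gradient from the data term, so the network is not pushed to fit the noise itself.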