 analytic solution


Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size?

Chen, Xuanyu, Yang, Nan, Wang, Shuai, Yuan, Dong

arXiv.org Artificial Intelligence

The recent success of large language models (LLMs) has sparked a growing interest in training large-scale models. As model size continues to scale, concerns are growing about the depletion of high-quality, well-curated training data. This has led practitioners to explore training approaches like Federated Learning (FL), which can leverage the abundant data on edge devices while maintaining privacy. However, the decentralization of training datasets in FL introduces challenges to scaling large models, a topic that remains under-explored. This paper fills this gap and provides qualitative insights on generalizing the previous model scaling experience to federated learning scenarios. Specifically, we derive a PAC-Bayes (Probably Approximately Correct Bayesian) upper bound for the generalization error of models trained with stochastic algorithms in federated settings, and quantify the impact of distributed training data on the optimal model size by finding the analytic solution for the model size that minimizes this bound. Our theoretical results demonstrate that the optimal model size has a negative power law relationship with the number of clients if the total training compute is unchanged. We also find that switching to FL with the same training compute will inevitably reduce the upper bound of generalization performance that the model can achieve through training, and that estimating the optimal model size in federated scenarios should depend on the average training compute across clients. Finally, we empirically validate our results with extensive training runs on different models, network settings, and datasets.




HergNet: a Fast Neural Surrogate Model for Sound Field Predictions via Superposition of Plane Waves

Calafà, Matteo, Xia, Yuanxin, Jeong, Cheol-Ho

arXiv.org Artificial Intelligence

We present a novel neural network architecture for the efficient prediction of sound fields in two and three dimensions. The network is designed to automatically satisfy the Helmholtz equation, ensuring that the outputs are physically valid. Therefore, the method can effectively learn solutions to boundary-value problems in various wave phenomena, such as acoustics, optics, and electromagnetism. Numerical experiments show that the proposed strategy can potentially outperform state-of-the-art methods in room acoustics simulation, in particular in the range of mid to high frequencies.

Index Terms: Helmholtz equation, wave fields, room acoustics, physics-informed neural networks

1. INTRODUCTION

Several physical phenomena are represented by the propagation of waves, especially in fields like acoustics, optics, quantum mechanics, electromagnetism and surface fluid mechanics [1, 2, 3, 4, 5]. Fast and accurate simulation of wave dynamics is therefore of great relevance to the scientific community, in particular in complex scenarios where high frequencies, broad domains or long time intervals are considered.
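The idea of outputs that satisfy the Helmholtz equation by construction can be illustrated with a small numerical check. The sketch below is not the HergNet architecture; it only shows the underlying fact that any superposition of plane waves with unit direction vectors and a common wavenumber k solves the homogeneous Helmholtz equation. The function names, the random directions and amplitudes, and the finite-difference check are all illustrative assumptions.

```python
import numpy as np

def plane_wave_field(points, directions, amplitudes, k):
    """Evaluate u(x) = sum_j a_j * exp(i k d_j . x) for unit directions d_j."""
    phases = 1j * k * (points @ directions.T)   # shape (n_points, n_waves)
    return np.exp(phases) @ amplitudes          # shape (n_points,)

def helmholtz_residual(x, directions, amplitudes, k, h=1e-3):
    """Central finite-difference estimate of (Laplacian u + k^2 u) at point x."""
    x = np.asarray(x, dtype=float)
    u0 = plane_wave_field(x[None, :], directions, amplitudes, k)[0]
    laplacian = 0.0
    for axis in range(x.size):
        step = np.zeros_like(x)
        step[axis] = h
        u_plus = plane_wave_field((x + step)[None, :], directions, amplitudes, k)[0]
        u_minus = plane_wave_field((x - step)[None, :], directions, amplitudes, k)[0]
        laplacian += (u_plus - 2.0 * u0 + u_minus) / h**2
    return laplacian + k**2 * u0

# Hypothetical 2D example: 8 random plane waves with wavenumber k = 5.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=8)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors
amplitudes = rng.normal(size=8) + 1j * rng.normal(size=8)
k = 5.0

residual = helmholtz_residual([0.3, -0.2], directions, amplitudes, k)
print(abs(residual))  # small; limited only by finite-difference error
```

In a learned surrogate of this kind, the amplitudes (and possibly the directions) would be the trainable quantities, so that only the boundary conditions need to be fitted.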




Solving Sparse Finite Element Problems on Neuromorphic Hardware

Theilman, Bradley H., Aimone, James B.

arXiv.org Artificial Intelligence

We demonstrate that scalable neuromorphic hardware can implement the finite element method, which is a critical numerical method for engineering and scientific discovery. Our approach maps the sparse interactions between neighboring finite elements to small populations of neurons that dynamically update according to the governing physics of a desired problem description. We show that for the Poisson equation, which describes many physical systems such as gravitational and electrostatic fields, this cortical-inspired neural circuit can achieve comparable levels of numerical accuracy and scaling while enabling the use of inherently parallel and energy-efficient neuromorphic hardware. We demonstrate that this approach can be used on the Intel Loihi 2 platform and illustrate how this approach can be extended to nontrivial mesh geometries and dynamics. Despite this tremendous potential, the widespread impact of neuromorphic computing has been limited by the difficulty in identifying ...
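For orientation, the sparse neighbor-to-neighbor structure that the finite element method produces for the Poisson equation can be seen in a conventional 1D solve. The sketch below runs on ordinary CPU hardware with NumPy and is not the neuromorphic circuit described above; the 1D domain, the boundary conditions, the linear elements, and the function name `solve_poisson_1d` are assumptions chosen for illustration.

```python
import numpy as np

def solve_poisson_1d(n_elements, f=lambda x: 1.0):
    """Linear finite elements for -u'' = f on [0, 1] with u(0) = u(1) = 0."""
    nodes = np.linspace(0.0, 1.0, n_elements + 1)
    h = nodes[1] - nodes[0]
    K = np.zeros((n_elements + 1, n_elements + 1))  # global stiffness matrix
    b = np.zeros(n_elements + 1)                    # global load vector
    k_local = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    for e in range(n_elements):
        idx = [e, e + 1]                 # each element couples only two neighbors
        K[np.ix_(idx, idx)] += k_local
        midpoint = 0.5 * (nodes[e] + nodes[e + 1])
        b[idx] += 0.5 * h * f(midpoint)  # midpoint-rule load, split between nodes
    u = np.zeros(n_elements + 1)
    # Dirichlet conditions: solve for interior nodes only.
    u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], b[1:-1])
    return nodes, u

nodes, u = solve_poisson_1d(16)
exact = nodes * (1.0 - nodes) / 2.0      # exact solution for f = 1
print(np.max(np.abs(u - exact)))
```

The assembled matrix is tridiagonal: each node interacts only with its immediate neighbors, which is the kind of sparse local coupling the abstract describes mapping onto small neuron populations.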


Reviews: Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net

Neural Information Processing Systems

Summary: An approximation to the posterior distribution from a Bayesian lasso or Bayesian elastic net prior is developed. The method uses a saddle-point approximation to the partition function. This is developed by writing the posterior distribution in terms of tau = n / sigma^2 and uses an approximation for large tau. The results are illustrated on three data sets: diabetes (n = 442, p = 10), leukaemia (n = 72, p = 3571) and Cancer Cell Line Encyclopedia (n = 474, p = 1000). These demonstrate some of the performance characteristics of the approximation.


Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations

Shahab, Muhammad Luthfi, Susanto, Hadi

arXiv.org Artificial Intelligence

This research introduces an extended application of neural networks for solving nonlinear partial differential equations (PDEs). A neural network, combined with pseudo-arclength continuation, is proposed to construct bifurcation diagrams from parameterized nonlinear PDEs. A neural network approach is also presented for solving eigenvalue problems to analyze the linear stability of solutions, focusing on identifying the largest eigenvalue. The effectiveness of the proposed neural network is examined through experiments on the Bratu equation and the Burgers equation. Results from a finite difference method are also presented for comparison. Varying numbers of grid points are employed in each case to assess the behavior and accuracy of both the neural network and the finite difference method. The experimental results demonstrate that the proposed neural network produces better solutions, generates more accurate bifurcation diagrams, has reasonable computational times, and proves effective for linear stability analysis.
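As a point of reference for the finite difference baseline mentioned above, here is a minimal sketch of a Newton solve for one steady state of the Bratu equation. It assumes the common 1D form u'' + lambda * exp(u) = 0 on [0, 1] with u(0) = u(1) = 0, which the abstract does not specify, and it uses plain Newton iteration at a fixed lambda rather than pseudo-arclength continuation. The function name and default parameters are hypothetical.

```python
import numpy as np

def solve_bratu_fd(lam, n_interior=199, tol=1e-9, max_iter=50):
    """Newton iteration on the finite-difference Bratu problem (lower branch)."""
    h = 1.0 / (n_interior + 1)
    x = np.linspace(0.0, 1.0, n_interior + 2)
    # Second-difference matrix for the interior nodes (Dirichlet boundaries).
    D2 = (np.diag(-2.0 * np.ones(n_interior))
          + np.diag(np.ones(n_interior - 1), 1)
          + np.diag(np.ones(n_interior - 1), -1)) / h**2
    u = np.zeros(n_interior)  # zero initial guess leads to the lower branch
    for _ in range(max_iter):
        residual = D2 @ u + lam * np.exp(u)
        if np.max(np.abs(residual)) < tol:
            break
        jacobian = D2 + np.diag(lam * np.exp(u))
        u = u - np.linalg.solve(jacobian, residual)
    return x, np.concatenate(([0.0], u, [0.0]))

x, u = solve_bratu_fd(1.0)
print(u.max())  # peak of the lower-branch solution at x = 0.5
```

The same Jacobian that drives the Newton step is the linearization whose eigenvalues determine linear stability, so a stability check could reuse it; tracing the full bifurcation diagram past the fold would additionally require a continuation scheme such as the pseudo-arclength method named in the abstract.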


An Analytic Solution to the 3D CSC Dubins Path Problem

Baez, Victor M., Navkar, Nikhil, Becker, Aaron T.

arXiv.org Artificial Intelligence

We present an analytic solution to the 3D Dubins path problem for paths composed of an initial circular arc, a straight component, and a final circular arc. These are commonly called CSC paths. By modeling the start and goal configurations of the path as the base frame and final frame of an RRPRR manipulator, we treat this as an inverse kinematics problem. The kinematic features of the 3D Dubins path are built into the constraints of our manipulator model. Furthermore, we show that the number of solutions is not constant, with up to seven valid CSC path solutions even in non-singular regions.