AITopics | rate scheduler

Appendix: On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them

Neural Information Processing Systems, Apr-24-2026, 06:50:14 GMT

Suppose we have a non-zero solution θ that is a stationary point of f(θ, t) at the t-th step, and SGD finds θ_t = θ at the t-th step. Theorem 2.2 of Shapiro and Wardi [9] tells us that the learning rate should be small enough for convergence. Obviously, we have η < in practice. As η_t = η_{t+1} does not hold, SGD cannot converge to any non-zero stationary point. The proof is now complete.
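To make the last step of this argument concrete, here is a minimal Python sketch under assumptions that are not stated in the excerpt: a one-dimensional quadratic loss L(θ) = (θ - a)²/2 and the update θ ← (1 - λ)θ - η(θ - a), where the decay factor λ is not scaled by the learning rate. The names `step`, `fixed_point`, `lam`, and `a` are hypothetical. Under this assumed rule the non-zero fixed point depends on η, so a point that is stationary for η_t stops being stationary as soon as η_{t+1} differs.

```python
def step(theta, eta, lam, a):
    """One update of the assumed rule: decay by (1 - lam), then a gradient step.

    The loss is the toy quadratic L(theta) = 0.5 * (theta - a) ** 2,
    whose gradient is (theta - a). This form is an illustration only.
    """
    return (1.0 - lam) * theta - eta * (theta - a)


def fixed_point(eta, lam, a):
    """Solve theta = step(theta, eta, lam, a) for theta.

    Rearranging gives theta * (lam + eta) = eta * a.
    """
    return eta * a / (lam + eta)


lam, a = 0.01, 1.0          # hypothetical decay strength and loss minimizer
eta_t, eta_next = 0.1, 0.05  # two different learning rates from a schedule

theta = fixed_point(eta_t, lam, a)

# With the same learning rate the iterate does not move.
drift_same = abs(step(theta, eta_t, lam, a) - theta)

# With a different learning rate the same point is no longer stationary.
drift_changed = abs(step(theta, eta_next, lam, a) - theta)

print(f"drift with eta_t:      {drift_same:.3e}")
print(f"drift with eta_(t+1):  {drift_changed:.3e}")
```

The sketch only illustrates why a changing learning rate moves the stationary point in this toy setting; it does not reproduce the paper's theorem or its conditions.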

artificial intelligence, deep learning, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

040d3b6af368bf71f952c18da5713b48-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 06:50:11 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Representation Meets Optimization: Training PINNs and PIKANs for Gray-Box Discovery in Systems Pharmacology

Nazanin Ahmadi Daryakenari, Khemraj Shukla, George Em Karniadakis

arXiv.org Artificial Intelligence, Nov-17-2025

Physics-Informed Kolmogorov-Arnold Networks (PIKANs) are gaining attention as an effective counterpart to the original multilayer perceptron-based Physics-Informed Neural Networks (PINNs). Both representation models can address inverse problems and facilitate gray-box system identification. However, their relative performance in terms of accuracy and speed remains underexplored. We introduce a modified PIKAN architecture, tanh-cPIKAN, which uses Chebyshev polynomials to parametrize the univariate functions and adds an extra nonlinearity for enhanced performance. We then present a systematic investigation of how the choice of optimizer, representation, and training configuration influences the performance of PINNs and PIKANs in the context of systems pharmacology modeling. We benchmark a wide range of first-order, second-order, and hybrid optimizers, including various learning rate schedulers. We use the new Optax library to identify the most effective combinations for learning gray-box models under ill-posed, non-unique, and data-sparse conditions. We examine the influence of model architecture (MLP vs. KAN), numerical precision (single vs. double), the need for warm-up phases for second-order methods, and sensitivity to the initial learning rate. We also assess optimizer scalability for larger models and analyze the trade-offs introduced by JAX in terms of computational efficiency and numerical accuracy. Using two representative systems pharmacology case studies (a pharmacokinetics model and a chemotherapy drug-response model), we offer practical guidance on selecting optimizers and representation models/architectures for robust and efficient gray-box discovery. Our findings provide actionable insights for improving the training of physics-informed networks in biomedical applications and beyond.
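As an illustration of the kind of learning rate scheduler the abstract refers to, the following is a minimal Python sketch of a linear warm-up followed by cosine decay. It is a generic example, not the configuration used in the paper: the function name `warmup_cosine` and all default values (`peak_lr`, `warmup_steps`, `total_steps`, `end_lr`) are assumptions chosen for illustration, and the abstract does not say which schedules were benchmarked.

```python
import math


def warmup_cosine(step, peak_lr=1e-3, warmup_steps=100, total_steps=1000, end_lr=0.0):
    """Learning rate at a given step (hypothetical defaults, for illustration).

    Phase 1: linear warm-up from 0 to peak_lr over warmup_steps.
    Phase 2: cosine decay from peak_lr to end_lr until total_steps,
             after which the rate stays at end_lr.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end_lr + (peak_lr - end_lr) * cosine


# Sample the schedule at a few steps to see its shape.
for s in (0, 50, 100, 550, 1000):
    print(s, warmup_cosine(s))
```

Varying `peak_lr` in a sketch like this is one simple way to probe the sensitivity to the initial learning rate that the abstract mentions.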

artificial intelligence, machine learning, optimizer, (16 more...)

2504.07379

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

97d596ca21d0751ba2c633bad696cf7f-Supplemental-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 02:08:47 GMT

artificial intelligence, machine learning, molecule, (19 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

32fcc8cfe1fa4c77b5c58dafd36d1a98-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 15:13:28 GMT

We thank the reviewers for their detailed comments. Please see our response below.
"... common implementation of weight decay [1] will usually multiply the amount of weight decay by the learning " The same holds in our setup: We have an
"How do different learning rate schedules affect the conclusion?": We address LR schedule questions below.
"It would be great if the authors can provide more experiments on ... AUTOL2" We ran additional experiments
"((1)) If I could have access to the test set..." We reject the claim that our submission "violates the ethics of
"((2)) I have concerns on comparing AutoL2..." Experiments with lr decay and AutoL2 are presented in the SM.
"((3)) The practically of the proposed work... "... more insights on the relation between learning rate scheduler and AutoL2..." We address this point in the
"... the lambda update refractory period is not detailed ..." The refractory period lasts for
"It would be interesting to see on the same graph, training with learning rate scheduler ..." In the SM we have the
"In Figure 1a and 1b, how is the best test accuracy determined?..." In Figs.

artificial intelligence, machine learning, rate schedule, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
