- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > China (0.04)
A PID Controller Approach for Adaptive Probability-dependent Gradient Decay in Model Calibration
During model optimization, the expected calibration error tends to overfit earlier than classification accuracy, indicating that classification error and calibration error follow distinct optimization objectives. To optimize model accuracy and model calibration consistently, we propose a novel method that incorporates a probability-dependent gradient decay coefficient into the loss function. This coefficient correlates strongly with the overall confidence level.
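The abstract does not specify the coefficient's functional form. As a minimal sketch, a focal-loss-style modulation (1 − p)^γ illustrates how a probability-dependent factor can decay the gradient as predicted confidence grows; the function name and the choice of (1 − p)^γ are hypothetical stand-ins, not the paper's method:

```python
import numpy as np

def prob_dependent_decay_loss(p, gamma=2.0):
    """Cross-entropy on the true-class probability p, modulated by a
    probability-dependent coefficient (1 - p)**gamma.

    As p -> 1 (high confidence) the coefficient -> 0, so both the loss
    and its gradient decay; gamma controls how aggressively confident
    predictions are down-weighted. Illustrative only: the paper's actual
    coefficient is defined differently and tied to overall confidence.
    """
    p = np.asarray(p, dtype=float)
    return -((1.0 - p) ** gamma) * np.log(p)
```

For example, a confident prediction (p = 0.9) contributes far less loss than an uncertain one (p = 0.5), which is the qualitative behavior a probability-dependent decay coefficient is meant to produce.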
Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels
We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of the memory kernel and are quantitatively bounded by the estimation error of the kernel in a weighted norm. Our analysis integrates synchronized noise coupling with a Volterra comparison theorem, encompassing both subexponential and exponential kernel classes. For first-order models, we derive moment and perturbation bounds using resolvent estimates in weighted spaces. For second-order models with confining potentials, we prove contraction and stability under kernel perturbations using a hypocoercive Lyapunov-type distance. This framework accommodates non-translation-invariant kernels and white-noise forcing, explicitly linking improved kernel estimation to enhanced trajectory prediction. Numerical examples validate these theoretical findings.
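As an illustration of the synchronized-noise-coupling idea from the abstract, the toy discretization below drives two GLE trajectories with the same noise increments but different memory kernels, so any trajectory discrepancy is attributable to the kernel perturbation alone. All names, the quadrature scheme, and the parameter values are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def simulate_gle(kernel, grad_V, x0, v0, dt, dWs, noise=0.1):
    """Semi-implicit Euler scheme for a scalar GLE
        dv = (-grad_V(x) - \int_0^t K(t-s) v(s) ds) dt + noise dW,
    with the memory integral approximated by a left-rectangle sum.
    Passing the same dWs array to two runs realizes synchronized noise
    coupling. O(n^2) and purely illustrative."""
    x, v = x0, v0
    hist = []  # past velocities for the memory integral
    for n, dW in enumerate(dWs):
        mem = dt * sum(kernel((n - j) * dt) * hist[j] for j in range(len(hist)))
        v = v + (-grad_V(x) - mem) * dt + noise * dW
        x = x + v * dt
        hist.append(v)
    return x, v
```

Running this with a true kernel and a slightly perturbed one under the same noise path gives trajectories whose discrepancy shrinks with the kernel estimation error, which is the qualitative content of the paper's bounds.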
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Zhu, Meng, Xiao, Quan, Min, Weidong
Since the beginning of the 21st century, artificial intelligence has been driving a new round of industrial revolution. Within the training framework, the optimization algorithm aims to stably converge high-dimensional optimization problems to local, or even global, minima. In the era of large language models, although the scale of model parameters and data has increased, Adam remains the mainstream optimization algorithm. However, compared with stochastic gradient descent (SGD)-based optimization algorithms, Adam is more likely to converge to non-flat minima. To address this issue, we propose the AdamNX algorithm. Its core innovation is a novel exponential decay rate for the second-order moment estimate, which gradually weakens the learning-step correction strength as training progresses and degrades to momentum SGD in the stable training period, thereby improving stability in that period and potentially enhancing generalization. Experimental results show that our exponential decay rate for the second-order moment estimate outperforms the one in current use, and that AdamNX consistently outperforms Adam and its variants. Our code is open-sourced at https://github.com/mengzhu0308/AdamNX.
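A minimal sketch of the stated idea: drive the effective second-moment decay rate toward 1 as training progresses, so the adaptive preconditioner flattens and the update approaches momentum SGD. The schedule and function below are hypothetical stand-ins chosen to illustrate the mechanism, not AdamNX's published rule (see the repository for that):

```python
import numpy as np

def adamnx_like_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9,
                     beta2_min=0.999, decay=1e-3, eps=1e-8):
    """One step of an Adam variant with a time-varying second-moment
    decay rate beta2_t -> 1 as t grows (illustrative schedule).

    As beta2_t -> 1, v freezes and the per-coordinate correction
    1/sqrt(v) becomes a constant preconditioner, so the update tends
    toward momentum SGD in the stable training period.
    """
    beta2_t = 1.0 - (1.0 - beta2_min) * np.exp(-decay * t)  # -> 1 as t grows
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2_t * v + (1.0 - beta2_t) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)  # standard first-moment bias correction
    theta = theta - lr * m_hat / (np.sqrt(v) + eps)
    return theta, m, v
```

On a simple strongly convex objective such as f(θ) = θ²/2, iterating this step drives θ toward the minimum, with the step size increasingly governed by the momentum term alone.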
- North America > Canada > Ontario > Toronto (0.14)
- Asia > China > Jiangxi Province > Nanchang (0.05)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
A Principles
The basic coherent optical component used in this work is the Mach–Zehnder interferometer (MZI). By arranging MZIs into a triangular mesh (Reck-style) or a rectangular mesh (Clements-style), we can construct an arbitrary N × N unitary U(N). As a simple demonstration, we illustrate the principle of a Reck-style MZI array. We give a detailed description of our parallel mapping algorithm. We implement the ONN simulation, all models, and the training logic in PyTorch 1.8.1.
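The mesh construction above can be sketched numerically: embed 2×2 MZI transfer matrices into an N×N identity on adjacent modes and cascade them in a triangular (Reck-style) order. Phase conventions for the MZI matrix vary across papers; the parametrization below is one common unitary choice and not necessarily this work's, and the helper names are hypothetical:

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of an ideal MZI (one common convention)."""
    return np.array([
        [np.exp(1j * phi) * np.sin(theta), np.cos(theta)],
        [np.exp(1j * phi) * np.cos(theta), -np.sin(theta)],
    ], dtype=complex)

def embed(N, i, T):
    """Embed a 2x2 block T on adjacent modes (i, i+1) of an N-mode identity."""
    U = np.eye(N, dtype=complex)
    U[i:i + 2, i:i + 2] = T
    return U

def reck_mesh(N, thetas, phis):
    """Cascade N(N-1)/2 embedded MZIs in a triangular (Reck-style) order.
    Any such product is unitary; with suitably chosen phases it can
    realize an arbitrary U(N)."""
    U = np.eye(N, dtype=complex)
    k = 0
    for col in range(N - 1):
        for i in range(N - 1 - col):
            U = embed(N, i, mzi(thetas[k], phis[k])) @ U
            k += 1
    return U
```

A quick sanity check is that the product of embedded MZIs is unitary for any phase settings, which is what makes the mesh a valid parametrization of U(N).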
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > California (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
15212f24321aa2c3dc8e9acf820f3c15-AuthorFeedback.pdf
We would like to thank all the reviewers for their insightful comments. The changes mentioned in our responses below have been incorporated into the revised version of the paper. Regarding the contribution of the paper, our Level-1 theory of mind (Section 2.2) is similar to Ref. [23]. That is not true for the opposite case. The POMDP model always generates a deterministic policy. It only changes the likelihood function of the model. Therefore, we do not need any new parameters to measure the accuracy of our model.
V Experiments
The hyperparameters for these experiments are discussed in Appendix C. In all experiments, we use the same VAE architecture as [58] and reshape the images to 32 × 32. The size of the latent dimension is set to 50; therefore, the maximum number of Active Units is 50. The metrics considered to measure the performance of VAEs include the Negative Log-Likelihood (NLL), the number of Active Units in the latent space, and the Mutual Information between the input x and the latent space z, I(z; x). We use the same formulation as [7] to compute the NLL and the number of Active Units, and the same formulation as [9, 21] to approximate the Mutual Information. Bolded values correspond to the proposed model, which uses the kernel layer after each convolutional layer.
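One widely used Active Units criterion counts latent dimensions whose posterior mean E[z|x] varies across the dataset above a small threshold; whether this matches [7]'s exact formulation is an assumption, and the function name and threshold are illustrative:

```python
import numpy as np

def count_active_units(mu, threshold=0.01):
    """mu: array of shape (num_examples, latent_dim) holding the
    posterior means E[z|x] for each input. A latent unit is counted as
    'active' if the variance of its posterior mean across the dataset
    exceeds the threshold, i.e. the unit actually responds to the input
    rather than collapsing to the prior."""
    mu = np.asarray(mu, dtype=float)
    return int(np.sum(np.var(mu, axis=0) > threshold))
```

Under this criterion, a collapsed dimension (constant posterior mean for all inputs) is never counted, so the count is at most the latent dimensionality, here 50.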
Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
In this work, we unify and extend this line of work, providing a characterization of all regimes and excess-error decay rates that can be observed as a function of the interplay between noise and regularization. In particular, we show the existence of a transition in the noisy setting from the noiseless exponents to their noisy values as the sample complexity increases.
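Schematically, such a crossover can be written with placeholder exponents; here $a_0$, $a_\sigma$, $C_0$, $C_1$ are generic symbols standing in for the paper's derived rates, not its actual values:

```latex
\epsilon(n) \;\asymp\;
\underbrace{C_0\, n^{-a_0}}_{\text{noiseless term}}
\;+\;
\underbrace{\sigma^2 C_1\, n^{-a_\sigma}}_{\text{noise term}},
\qquad a_0 > a_\sigma > 0,
```

so the transition occurs near $n_\star \asymp \bigl(C_0 / (\sigma^2 C_1)\bigr)^{1/(a_0 - a_\sigma)}$: for $n \ll n_\star$ the noiseless exponent governs the observed decay, while beyond $n_\star$ the slower noisy exponent takes over.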
- Europe > Switzerland > Vaud > Lausanne (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- North America > United States > Texas > Brazos County > College Station (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada (0.04)