mean square error
- North America (0.04)
- Europe (0.04)
Supplementary Material
To study the accuracy of the rotation angles predicted by TARGET-VAE, we calculate the mean standard deviation of the predicted rotations, introduced in [1]. This metric measures the mean square error between the rotation of the object in the input image and the predicted rotation for that object. We find that the model correctly identifies and reconstructs the objects (Figure 3). Each shape is rotated by one of 40 values linearly spaced in [0, 2π], translated across both the x and y dimensions, and scaled by one of six values linearly spaced in [0.5, 1]. We observed that, as expected, eliminating inference on the discretized rotation dimension has a significant negative effect on identifying transformation-invariant representations, and the clustering accuracy on MNIST(U) is only 33.8% (Table 2).
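As an illustration of the rotation-error metric described above, the angular difference between predicted and ground-truth rotations should be wrapped to the shortest signed angle before squaring and averaging; a minimal sketch (function name is ours, not from [1]):

```python
import numpy as np

def rotation_mse(theta_true, theta_pred):
    """Mean square angular error, with differences wrapped to [-pi, pi)."""
    diff = np.asarray(theta_pred) - np.asarray(theta_true)
    diff = (diff + np.pi) % (2 * np.pi) - np.pi  # shortest signed angle
    return np.mean(diff ** 2)
```

Without the wraparound, predicting an angle of 2π − δ for a true angle of δ would be penalized as a near-full rotation rather than a small error of 2δ.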
Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log probability or negative mean square error, which we call Bayesian Estimate of Mean Proper Scores (BEMPS). We also prove convergence results borrowing techniques used with MOCU. In order to allow better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm, which encourages diversity in the vector of expected changes in scores for unlabelled data. To allow high performance text classifiers, we combine ensembling and dynamic validation set construction on pretrained language models. Extensive experimental evaluation then explores how these different acquisition functions perform. The results show that the use of mean square error and log probability with BEMPS yields robust acquisition functions, which consistently outperform the others tested.
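For reference, the two strictly proper scoring rules named in the abstract (log probability and negative mean square error, i.e. a negative Brier-style score) can be sketched as follows; the helper names are ours, and this is an illustration of the scores themselves, not of the full BEMPS acquisition function:

```python
import numpy as np

def log_score(probs, label):
    """Log probability assigned to the true label (higher is better)."""
    return float(np.log(probs[label]))

def neg_brier_score(probs, label):
    """Negative mean square error against the one-hot label (higher is better)."""
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return float(-np.mean((probs - onehot) ** 2))
```

Both scores are strictly proper: their expectation is uniquely maximized by reporting the true label distribution, which is the property the BEMPS framework relies on when estimating the expected increase in score.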
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
A Licenses and Terms of Use

ClimateLearn is a software package that can be installed from the Python Package Index as follows.
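The install command itself was lost in extraction; assuming the standard PyPI distribution name for the package (our assumption), it would be:

```shell
pip install climate-learn
```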
UCLA is the maintainer of ClimateLearn. Table 4 shows the hyperparameters for ResNet in all of our experiments.

Table 6: Default hyperparameters of ViT

| Hyperparameter | Meaning | Value |
| --- | --- | --- |
| p | Patch size | 2 |
| D | Embedding dimension | 128 |
| Depth | Number of ViT blocks | 8 |
| # heads | Number of attention heads | 4 |
| MLP ratio | Determines the hidden dimension of the MLP layer in a ViT block | 4 |
| Prediction depth | Number of layers of the prediction head | 2 |
| Hidden dimension | Hidden dimension of the prediction head | 128 |
| Drop path | For stochastic depth [30] | 0.1 |
| Dropout | Dropout rate | 0.1 |

Table 7 summarizes the variables we use for our experiments. Constant represents constant variables, Single represents surface variables, and Atmospheric represents atmospheric properties at the chosen altitudes. Finally, at evaluation time, we use these masks to select a subset of the data.
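The defaults in Table 6 can be collected into a plain config mapping; a sketch (the dictionary key names are ours), with a consistency check that the MLP hidden width follows from the embedding dimension and MLP ratio:

```python
# Default ViT hyperparameters from Table 6 (key names are ours, not ClimateLearn's API).
vit_config = {
    "patch_size": 2,
    "embed_dim": 128,
    "depth": 8,              # number of ViT blocks
    "num_heads": 4,
    "mlp_ratio": 4,
    "pred_depth": 2,         # layers in the prediction head
    "pred_hidden_dim": 128,  # hidden dimension of the prediction head
    "drop_path": 0.1,        # stochastic depth rate
    "dropout": 0.1,
}

# The MLP ratio determines the hidden width of each block's MLP.
mlp_hidden_dim = vit_config["embed_dim"] * vit_config["mlp_ratio"]
```

Note that the embedding dimension (128) divides evenly by the number of attention heads (4), as multi-head attention requires.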
- North America (0.04)
- Europe (0.04)
sponse addressing one common point raised by Reviewer 1 and Reviewer 3 regarding how to handle the case where 2 null
We thank all the reviewers for their careful feedback and will revise our paper accordingly. Such a fact is presented in the classic paper "An analysis of temporal-difference learning with function Similar facts can be found for other TD algorithms (e.g. Reviewer 1 is correct that a discount factor is needed. Now we address specific reviewer comments below. A reference for this is the classic paper "An Finally, the "-" sign in Line 213 is due to the Hurwitz assumption.
Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies
Cohen, Samuel N., Hebner, Jackson, Jiang, Deqing, Sirignano, Justin
We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.
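One common way to make a value-function approximator satisfy a Dirichlet boundary condition exactly by construction (we do not claim this is the paper's precise ansatz) is to write it as the boundary data plus a term that vanishes on the boundary; a 1-D sketch on the domain [0, 1]:

```python
def critic(x, g, net):
    """Value estimate on [0, 1] that matches g at x = 0 and x = 1 exactly.

    g   : function supplying the boundary values (and their extension inward)
    net : arbitrary trainable network, here any callable of x
    """
    d = x * (1.0 - x)          # distance-like factor, zero on the boundary
    return g(x) + d * net(x)   # boundary condition holds for ANY net
```

Because the boundary condition holds identically for any choice of `net`, no boundary penalty term is needed in the training loss, which is the structural property described in the abstract.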
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > New York (0.04)
Reconstruction and Prediction of Volterra Integral Equations Driven by Gaussian Noise
Xu, Zhihao, Ding, Saisai, Zhang, Zhikun, Wang, Xiangjun
Integral equations are widely used in fields such as applied modeling, medical imaging, and system identification, providing a powerful framework for solving deterministic problems. While parameter identification for differential equations has been extensively studied, the focus on integral equations, particularly stochastic Volterra integral equations, remains limited. This research addresses the parameter identification problem, also known as the equation reconstruction problem, in Volterra integral equations driven by Gaussian noise. We propose an improved deep neural network framework for estimating unknown parameters in the drift term of these equations. The network represents the primary variables and their integrals, enhancing parameter estimation accuracy by incorporating inter-output relationships into the loss function. Additionally, the framework extends beyond parameter identification to predict the system's behavior outside the integration interval. Prediction accuracy is validated by comparing predicted and true trajectories using a 95% confidence interval. Numerical experiments demonstrate the effectiveness of the proposed framework in both parameter identification and prediction tasks, showing robust performance under varying noise levels and providing accurate solutions for modeling stochastic systems.
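The idea of coupling a network's outputs for a primary variable and its integral through the loss can be sketched with a trapezoidal consistency penalty (the numerical setup is ours, not the paper's exact architecture):

```python
import numpy as np

def consistency_loss(t, x_pred, ix_pred):
    """Penalize mismatch between the predicted integral ix_pred and the
    trapezoidal cumulative integral of the predicted primary variable x_pred."""
    dt = np.diff(t)
    # cumulative trapezoid of x_pred over the grid t, starting from 0
    cum = np.concatenate([[0.0], np.cumsum(dt * (x_pred[1:] + x_pred[:-1]) / 2)])
    return np.mean((ix_pred - cum) ** 2)
```

For mutually consistent predictions (e.g. x(t) = t together with its integral t²/2) the penalty is near zero; added to the data-fitting loss, it enforces the inter-output relationship mentioned in the abstract.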
$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation
Two-time-scale stochastic approximation is an algorithm with coupled iterations which has found broad applications in reinforcement learning, optimization and game control. While several prior works have obtained a mean square error bound of $O(1/k)$ for linear two-time-scale iterations, the best known bound in the non-linear contractive setting has been $O(1/k^{2/3})$. In this work, we obtain an improved bound of $O(1/k)$ for non-linear two-time-scale stochastic approximation. Our result applies to algorithms such as gradient descent-ascent and two-time-scale Lagrangian optimization. The key step in our analysis involves rewriting the original iteration in terms of an averaged noise sequence which decays sufficiently fast. Additionally, we use an induction-based approach to show that the iterates are bounded in expectation.
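A minimal illustration of a two-time-scale gradient descent-ascent iteration of the kind the result covers (the toy saddle problem is our choosing, not from the paper): the two iterates use different step sizes, the descent variable moving on the slower scale, and converge jointly to the saddle point.

```python
def gda(x0, y0, alpha=0.05, beta=0.1, steps=500):
    """Two-time-scale gradient descent-ascent on f(x, y) = x^2 - y^2 + x*y.

    x descends with the smaller step alpha; y ascends with the larger step
    beta. The unique saddle point of f is (0, 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        gx = 2 * x + y      # df/dx
        gy = -2 * y + x     # df/dy
        x, y = x - alpha * gx, y + beta * gy
    return x, y
```

With these constant step sizes the coupled linear iteration is a contraction, so both coordinates decay geometrically toward (0, 0); the paper's $O(1/k)$ bound concerns the harder setting of decaying step sizes and noisy gradients.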
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)