mean square error
- North America (0.04)
- Europe (0.04)
Supplementary Material
To study the accuracy of the rotation angles predicted by TARGET-VAE, we calculate the mean standard deviation of the predicted rotations, introduced in [1]. This metric measures the mean square error between the rotation of the object in the input image and the predicted rotation for that object. We find that the model correctly identifies and reconstructs the objects (Figure 3). Each shape is rotated by one of 40 values linearly spaced in [0, 2π], translated across both the x and y dimensions, and scaled by one of six values linearly spaced in [0.5, 1]. We observed that, as expected, eliminating inference on the discretized rotation dimension has a significant negative effect on identifying transformation-invariant representations, and the clustering accuracy on MNIST(U) is only 33.8% (Table 2).
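As an illustration of the rotation-error metric described above, the angular difference between predicted and ground-truth rotations should be wrapped to the shortest signed angle before squaring and averaging; a minimal sketch (function name is ours, not from [1]):

```python
import numpy as np

def rotation_mse(theta_true, theta_pred):
    """Mean square angular error, with differences wrapped to [-pi, pi)."""
    diff = np.asarray(theta_pred) - np.asarray(theta_true)
    diff = (diff + np.pi) % (2 * np.pi) - np.pi  # shortest signed angle
    return np.mean(diff ** 2)
```

Without the wraparound, predicting an angle of 2π − δ for a true angle of δ would be penalized as a near-full rotation rather than a small error of 2δ.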
Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log probability or negative mean square error, which we call Bayesian Estimate of Mean Proper Scores (BEMPS). We also prove convergence results borrowing techniques used with MOCU. In order to allow better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm, which encourages diversity in the vector of expected changes in scores for unlabelled data. To allow high performance text classifiers, we combine ensembling and dynamic validation set construction on pretrained language models. Extensive experimental evaluation then explores how these different acquisition functions perform. The results show that the use of mean square error and log probability with BEMPS yields robust acquisition functions, which consistently outperform the others tested.
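For reference, the two strictly proper scoring rules named in the abstract (log probability and negative mean square error, i.e. a negative Brier-style score) can be sketched as follows; the helper names are ours, and this is an illustration of the scores themselves, not of the full BEMPS acquisition function:

```python
import numpy as np

def log_score(probs, label):
    """Log probability assigned to the true label (higher is better)."""
    return float(np.log(probs[label]))

def neg_brier_score(probs, label):
    """Negative mean square error against the one-hot label (higher is better)."""
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return float(-np.mean((probs - onehot) ** 2))
```

Both scores are strictly proper: their expectation is uniquely maximized by reporting the true label distribution, which is the property the BEMPS framework relies on when estimating the expected increase in score.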
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
A Licenses and Terms of Use

ClimateLearn is a software package that can be installed from the Python Package Index as follows.
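The install command itself was lost in extraction; assuming the standard PyPI distribution name for the package (our assumption), it would be:

```shell
pip install climate-learn
```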
UCLA is the maintainer of ClimateLearn. Table 4 shows the hyperparameters for ResNet in all of our experiments.

Table 6: Default hyperparameters of ViT

| Hyperparameter | Meaning | Value |
| --- | --- | --- |
| p | Patch size | 2 |
| D | Embedding dimension | 128 |
| Depth | Number of ViT blocks | 8 |
| # heads | Number of attention heads | 4 |
| MLP ratio | Determines the hidden dimension of the MLP layer in a ViT block | 4 |
| Prediction depth | Number of layers of the prediction head | 2 |
| Hidden dimension | Hidden dimension of the prediction head | 128 |
| Drop path | For stochastic depth [30] | 0.1 |
| Dropout | Dropout rate | 0.1 |

Table 7 summarizes the variables we use for our experiments. Constant represents constant variables, Single represents surface variables, and Atmospheric represents atmospheric properties at the chosen altitudes. Finally, at evaluation time, we use these masks to select a subset of the data.
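The defaults in Table 6 can be collected into a plain config mapping; a sketch (the dictionary key names are ours), with a consistency check that the MLP hidden width follows from the embedding dimension and MLP ratio:

```python
# Default ViT hyperparameters from Table 6 (key names are ours, not ClimateLearn's API).
vit_config = {
    "patch_size": 2,
    "embed_dim": 128,
    "depth": 8,              # number of ViT blocks
    "num_heads": 4,
    "mlp_ratio": 4,
    "pred_depth": 2,         # layers in the prediction head
    "pred_hidden_dim": 128,  # hidden dimension of the prediction head
    "drop_path": 0.1,        # stochastic depth rate
    "dropout": 0.1,
}

# The MLP ratio determines the hidden width of each block's MLP.
mlp_hidden_dim = vit_config["embed_dim"] * vit_config["mlp_ratio"]
```

Note that the embedding dimension (128) divides evenly by the number of attention heads (4), as multi-head attention requires.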
- North America (0.04)
- Europe (0.04)
sponse addressing one common point raised by Reviewer 1 and Reviewer 3 regarding how to handle the case where 2 null
We thank all the reviewers for their careful feedback and will revise our paper accordingly. Such a fact is presented in the classic paper "An analysis of temporal-difference learning with function Similar facts can be found for other TD algorithms (e.g. Reviewer 1 is correct that a discount factor is needed. Now we address specific reviewer comments below. A reference for this is the classic paper "An Finally, the "-" sign in Line 213 is due to the Hurwitz assumption.
Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies
Cohen, Samuel N., Hebner, Jackson, Jiang, Deqing, Sirignano, Justin
We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.
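One common way to make a value-function approximator satisfy a Dirichlet boundary condition exactly by construction (we do not claim this is the paper's precise ansatz) is to write it as the boundary data plus a term that vanishes on the boundary; a 1-D sketch on the domain [0, 1]:

```python
def critic(x, g, net):
    """Value estimate on [0, 1] that matches g at x = 0 and x = 1 exactly.

    g   : function supplying the boundary values (and their extension inward)
    net : arbitrary trainable network, here any callable of x
    """
    d = x * (1.0 - x)          # distance-like factor, zero on the boundary
    return g(x) + d * net(x)   # boundary condition holds for ANY net
```

Because the boundary condition holds identically for any choice of `net`, no boundary penalty term is needed in the training loss, which is the structural property described in the abstract.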
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > New York (0.04)
Reconstruction and Prediction of Volterra Integral Equations Driven by Gaussian Noise
Xu, Zhihao, Ding, Saisai, Zhang, Zhikun, Wang, Xiangjun
Integral equations are widely used in fields such as applied modeling, medical imaging, and system identification, providing a powerful framework for solving deterministic problems. While parameter identification for differential equations has been extensively studied, the focus on integral equations, particularly stochastic Volterra integral equations, remains limited. This research addresses the parameter identification problem, also known as the equation reconstruction problem, in Volterra integral equations driven by Gaussian noise. We propose an improved deep neural network framework for estimating unknown parameters in the drift term of these equations. The network represents the primary variables and their integrals, enhancing parameter estimation accuracy by incorporating inter-output relationships into the loss function. Additionally, the framework extends beyond parameter identification to predict the system's behavior outside the integration interval. Prediction accuracy is validated by comparing predicted and true trajectories using a 95% confidence interval. Numerical experiments demonstrate the effectiveness of the proposed framework in both parameter identification and prediction tasks, showing robust performance under varying noise levels and providing accurate solutions for modeling stochastic systems.
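The idea of coupling a network's outputs for a primary variable and its integral through the loss can be sketched with a trapezoidal consistency penalty (the numerical setup is ours, not the paper's exact architecture):

```python
import numpy as np

def consistency_loss(t, x_pred, ix_pred):
    """Penalize mismatch between the predicted integral ix_pred and the
    trapezoidal cumulative integral of the predicted primary variable x_pred."""
    dt = np.diff(t)
    # cumulative trapezoid of x_pred over the grid t, starting from 0
    cum = np.concatenate([[0.0], np.cumsum(dt * (x_pred[1:] + x_pred[:-1]) / 2)])
    return np.mean((ix_pred - cum) ** 2)
```

For mutually consistent predictions (e.g. x(t) = t together with its integral t²/2) the penalty is near zero; added to the data-fitting loss, it enforces the inter-output relationship mentioned in the abstract.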
$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation
Two-time-scale stochastic approximation is an algorithm with coupled iterations which has found broad applications in reinforcement learning, optimization and game control. While several prior works have obtained a mean square error bound of $O(1/k)$ for linear two-time-scale iterations, the best known bound in the non-linear contractive setting has been $O(1/k^{2/3})$. In this work, we obtain an improved bound of $O(1/k)$ for non-linear two-time-scale stochastic approximation. Our result applies to algorithms such as gradient descent-ascent and two-time-scale Lagrangian optimization. The key step in our analysis involves rewriting the original iteration in terms of an averaged noise sequence which decays sufficiently fast. Additionally, we use an induction-based approach to show that the iterates are bounded in expectation.
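A minimal illustration of a two-time-scale gradient descent-ascent iteration of the kind the result covers (the toy saddle problem is our choosing, not from the paper): the two iterates use different step sizes, the descent variable moving on the slower scale, and converge jointly to the saddle point.

```python
def gda(x0, y0, alpha=0.05, beta=0.1, steps=500):
    """Two-time-scale gradient descent-ascent on f(x, y) = x^2 - y^2 + x*y.

    x descends with the smaller step alpha; y ascends with the larger step
    beta. The unique saddle point of f is (0, 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        gx = 2 * x + y      # df/dx
        gy = -2 * y + x     # df/dy
        x, y = x - alpha * gx, y + beta * gy
    return x, y
```

With these constant step sizes the coupled linear iteration is a contraction, so both coordinates decay geometrically toward (0, 0); the paper's $O(1/k)$ bound concerns the harder setting of decaying step sizes and noisy gradients.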
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)