Goto

Collaborating Authors

MuJoCo environment




60cb558c40e4f18479664069d9642d5a-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for the time and expertise invested in these reviews. A: We are sorry that some abuse of notation in the paper hinders the understanding of our method. A: Such an assumption comes from an empirical observation that, in robotics control problems, some key poses under different dynamics are still alike.





Directional-Clamp PPO

Gilad Karpel, Ruida Zhou, Shoham Sabach, Mohammad Ghavamzadeh

arXiv.org Artificial Intelligence

Proximal Policy Optimization (PPO) is widely regarded as one of the most successful deep reinforcement learning algorithms, known for its robustness and effectiveness across a range of problems. The PPO objective encourages the importance ratio between the current and behavior policies to move in the "right" direction -- starting from importance sampling ratios equal to 1, increasing the ratios for actions with positive advantages and decreasing those with negative advantages. A clipping function is introduced to prevent over-optimization when updating the importance ratio in these "right" direction regions. Many PPO variants have been proposed to extend its success, most of which modify the objective's behavior by altering the clipping in the "right" direction regions. However, due to randomness in the rollouts and stochasticity of the policy optimization, we observe that the ratios frequently move in the "wrong" direction during PPO optimization. This is a key factor hindering the improvement of PPO, yet it has been largely overlooked. To address this, we propose the Directional-Clamp PPO algorithm (DClamp-PPO), which further penalizes actions moving into the strict "wrong" direction regions, where the advantage is positive (negative) and the importance ratio falls below (above) $1-\beta$ ($1+\beta$), for a tunable parameter $\beta \in (0, 1)$. The penalty is enforced by imposing a steeper loss slope, i.e., a clamp, in those regions. We demonstrate that DClamp-PPO consistently outperforms PPO, as well as its variants that focus on modifying the objective's behavior in the "right" direction regions, across various MuJoCo environments and random seeds. The proposed method is shown, both theoretically and empirically, to better avoid "wrong" direction updates while keeping the importance ratio closer to 1.
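To make the strict "wrong" direction penalty concrete, the following is a minimal PyTorch sketch of a directional-clamp surrogate loss. The slope multiplier `kappa` and the exact anchoring of the clamped branches are illustrative assumptions, not values from the paper; the precise penalty form may differ.

```python
import torch

def dclamp_ppo_loss(ratio, advantage, eps=0.2, beta=0.2, kappa=2.0):
    """Sketch of a directional-clamp PPO surrogate loss (to be minimized).

    ratio:     importance ratio pi_theta(a|s) / pi_behavior(a|s)
    advantage: estimated advantage A(s, a)
    eps:       standard PPO clipping parameter
    beta:      width of the strict "wrong" direction region
    kappa:     slope multiplier inside that region (illustrative assumption)
    """
    # Standard PPO clipped surrogate (maximized, hence negated at the end).
    surrogate = torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1 - eps, 1 + eps) * advantage,
    )

    # Strict "wrong" direction regions: positive advantage with the ratio
    # below 1 - beta, or negative advantage with the ratio above 1 + beta.
    wrong_pos = (advantage > 0) & (ratio < 1 - beta)
    wrong_neg = (advantage < 0) & (ratio > 1 + beta)

    # Steeper loss slope (the "clamp") inside those regions: a linear branch
    # with slope scaled by kappa, anchored at 1 -/+ beta so the surrogate
    # stays continuous at the region boundaries.
    pos_branch = ((1 - beta) + kappa * (ratio - (1 - beta))) * advantage
    neg_branch = ((1 + beta) + kappa * (ratio - (1 + beta))) * advantage
    surrogate = torch.where(wrong_pos, pos_branch, surrogate)
    surrogate = torch.where(wrong_neg, neg_branch, surrogate)

    return -surrogate.mean()

# Usage: ratio = torch.exp(new_log_prob - old_log_prob)
#        loss = dclamp_ppo_loss(ratio, advantage_estimates)
```

Because each clamped branch equals the unclipped surrogate at the boundary $1 \mp \beta$, the only change inside the "wrong" direction regions is the steeper slope, which pushes the optimizer back toward ratios near 1.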


A Theoretical Derivations

Neural Information Processing Systems

A brief proof is provided as follows. Here, we describe certain implementation details of TEEN. For the recurrent optimization mentioned in Section 4.2, we set the period of … We provide the explicit parameters used in our algorithm in Table 1. To reproduce TD3, we use the official implementation (https://github.com/sfujim/TD3).

Table 1: Hyperparameters

Batch size                              256
Discount (γ)                            0.99
Number of hidden layers                 2
Number of hidden units per layer        256
Activation function                     ReLU
Iterations per time step                1
Target smoothing coefficient (η)        5 × 10⁻³
Variance of target policy smoothing     0.2
Noise clip range                        [−0.5, 0.5]
Target critic update interval           2

C Additional Experimental Results

The bolded line represents the average evaluation over 5 seeds.
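For concreteness, here is a minimal sketch collecting the Table 1 settings into a single configuration for a TD3-based run. The key names are illustrative assumptions, not the argument names of the official implementation.

```python
# Table 1 settings gathered into one place; key names are illustrative.
td3_config = {
    "batch_size": 256,
    "discount": 0.99,                 # gamma
    "num_hidden_layers": 2,
    "hidden_units_per_layer": 256,
    "activation": "ReLU",
    "iterations_per_time_step": 1,
    "target_smoothing_coef": 5e-3,    # eta
    "policy_noise": 0.2,              # variance of target policy smoothing
    "noise_clip": 0.5,                # target noise clipped to [-0.5, 0.5]
    "target_critic_update_interval": 2,
}
```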