AITopics | environment step

Collaborating Authors

environment step

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

Li, Yuanpeng, Lin, Gefei, Qu, Annie, Miao, Rui

arXiv.org Machine LearningMay-13-2026

Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: critic-side gradient ill-conditioning, which may cause tail tasks to stall while easy tasks dominate the value function's updates. To address this, we propose TOPPO (Tail-Optimized PPO), a reformulation of PPO via Critic Balancing -- a set of modules that improve gradient conditioning and balance learning dynamics across tasks. Unlike prior approaches that rely on modular architectures or large models, TOPPO targets the optimization bottleneck within PPO itself. Empirically, TOPPO achieves stronger mean and tail-task performance than published SAC-family and ARS-family baselines while using substantially fewer parameters and environment steps on Meta-World+ benchmark. Notably, TOPPO matches or surpasses strong SAC baselines early in training and maintains superior performance at full budget. Ablations confirm the effectiveness of each module in TOPPO and provide insights into their interactions. Our results demonstrate that, with proper optimization, on-policy methods can rival or exceed off-policy approaches in MTRL, challenging the prevailing reliance on SAC and highlighting critic-side gradient conditioning as the central bottleneck.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2605.11473

Genre: Research Report > New Finding (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

11afefdd848d1bc9ac9f1604d9f45817-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 15:50:45 GMT

artificial intelligence, robot, seditor, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Bhaskara, Vin, Wang, Haicheng

arXiv.org Machine LearningApr-22-2026

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.

artificial intelligence, machine learning, transition, (18 more...)

arXiv.org Machine Learning

2604.18701

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

_NeurIPS_2022__On_the_Effectiveness_of_Fine_tuning_Versus_Meta_reinforcement_Learning (1)

Mandi Zhao

Neural Information Processing SystemsFeb-19-2026, 09:45:44 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and If you ran experiments... (a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both main text and appendix for experiment details. Did you report error bars (e.g., with respect to the random seed after running experiments multiple All adaptation experiments in Procgen and RLBench are run for 3 seeds. Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal As stated in section 2, we use RTX A5000 GPUs each with 24GB memory. C2F-ARM algorithm and training framework are built based on the original author's implementation Did you mention the license of the assets?

artificial intelligence, experiment, machine learning, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

dc300c4d75b94b211b149ae4bcbbf2d2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 09:08:23 GMT

machine learning, natural language, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Energy (0.67)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

cd3b5d2ed967e906af24b33d6a356cac-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 05:01:10 GMT

large language model, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

c919a2b5ec1de69f2629f9119676e336-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 01:56:50 GMT

artificial intelligence, machine learning, representation, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies Oscar Li1, James Harrison, Jascha Sohl-Dickstein, Virginia Smith

Neural Information Processing SystemsFeb-15-2026, 20:09:30 GMT

First-order optimization methods are a foundational tool in machine learning.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.40)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Instructional Material (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.66)

Add feedback

SupplementaryMaterialforRethinkingValue FunctionLearningforGeneralizationin ReinforcementLearning

Neural Information Processing SystemsFeb-12-2026, 10:27:47 GMT

Then,wecalculatethe mean stiffness of the value network across all state pairs and report its average computed over all trainingepochs. Eachagentis trained on 200 training levels for 25M environment steps. The mean and standard deviation are computedover10differentruns. Morespecifically,wecollect100 training episodes throughout the training and evaluate the value network prediction for the initial stateofeachtrajectory. Each agent is trained on 200 training levels for 25M environment steps.

machine learning, optimizevalueobjectivejv, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Add feedback

Filters

Collaborating Authors

environment step

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

fbd8e65962da06f83f3f28b52774ffd0-Paper-Conference.pdf

11afefdd848d1bc9ac9f1604d9f45817-Supplemental-Conference.pdf

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

_NeurIPS_2022__On_the_Effectiveness_of_Fine_tuning_Versus_Meta_reinforcement_Learning (1)

dc300c4d75b94b211b149ae4bcbbf2d2-Paper-Conference.pdf

cd3b5d2ed967e906af24b33d6a356cac-Paper-Conference.pdf

c919a2b5ec1de69f2629f9119676e336-Supplemental-Conference.pdf

Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies Oscar Li1, James Harrison, Jascha Sohl-Dickstein, Virginia Smith

SupplementaryMaterialforRethinkingValue FunctionLearningforGeneralizationin ReinforcementLearning