Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.
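The per-step reward described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tabular critic, class name, and learning rate are all assumptions; the key idea is that the intrinsic reward is the current prediction error minus a baseline regressed online toward that transition's asymptotic error.

```python
import numpy as np

class CuriosityCritic:
    """Illustrative sketch: per-(state, action) baseline estimating the
    asymptotic (irreducible) world-model prediction error."""

    def __init__(self, n_states, n_actions, lr=0.1):
        # Tabular critic: one scalar baseline per (state, action) pair.
        self.baseline = np.zeros((n_states, n_actions))
        self.lr = lr

    def intrinsic_reward(self, state, action, prediction_error):
        # Reward = epistemic part of the error: current error minus the
        # critic's estimate of the aleatoric error floor.
        r = prediction_error - self.baseline[state, action]
        # Regress the scalar baseline toward the observed error online.
        self.baseline[state, action] += self.lr * (
            prediction_error - self.baseline[state, action]
        )
        return r
```

For a purely stochastic transition the prediction error never decreases, the baseline converges to it, and the reward collapses toward zero; for a learnable transition the error keeps dropping below the lagging baseline, sustaining a positive reward.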
Breaking the Activation Function Bottleneck through Adaptive Parameterization
Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot
Adaptive parameterization is a means of increasing this flexibility and thereby increasing the model's capacity to learn non-linear patterns. We focus on the feed-forward layer, f(x) := φ(Wx + b), for some activation function φ: ℝ → ℝ. Define the pre-activation layer as a = A(x) := Wx + b and denote by g(a) := φ(a)/a the activation effect of φ given a, where division is element-wise.
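The activation-effect decomposition g(a) := φ(a)/a can be checked with a small sketch. Taking ReLU as an example φ is my choice for illustration, not the paper's; the point is only that φ(a) = g(a) ⊙ a recovers the activation element-wise.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def activation_effect(phi, a):
    # g(a) := phi(a) / a, element-wise (a assumed nonzero here).
    return phi(a) / a

a = np.array([-2.0, 1.0, 3.0])
g = activation_effect(relu, a)
# For ReLU, g acts as a gate: 0 where a < 0, 1 where a > 0,
# and g(a) * a reproduces relu(a) exactly.
```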