Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning

Surdej, Rafał, Bortkiewicz, Michał, Lewandowski, Alex, Ostaszewski, Mateusz, Lyle, Clare

arXiv.org Artificial Intelligence 

Trainable activation functions, whose parameters are optimized alongside network weights, offer greater expressivity than fixed activation functions. In particular, trainable activations defined as ratios of polynomials (rational functions) have been proposed to enhance plasticity in reinforcement learning, but their impact on training stability remains unclear. In this work, we study trainable rational activations in both reinforcement and continual learning settings. We find that while their flexibility enhances adaptability, it can also introduce instability, leading to overestimation in RL and feature collapse in longer continual learning scenarios. Our main result demonstrates a trade-off between expressivity and plasticity in rational activations. To address it, we propose a constrained variant that structurally limits excessive output scaling while preserving adaptability. Experiments across Meta-World and DeepMind Control Suite (DMC) environments show that our approach improves training stability and performance. In continual learning benchmarks, including MNIST with reshuffled labels and Split CIFAR-100, we show how different constraints affect the balance between expressivity and long-term retention. Preliminary experiments in discrete action domains (e.g., Atari) did not show similar instability, suggesting that the trade-off is particularly relevant to continuous control. Together, our findings provide actionable design principles for robust and adaptable trainable activations in dynamic, non-stationary environments.

Figure 1: Interquartile Mean (IQM) performance after 1M environment steps, aggregated across 15 Meta-World and 15 DeepMind Control Suite (DMC) environments. For Meta-World we measure the score, while DMC returns are divided by 1000 to match the upper performance bound.
We compare Original Rationals (OR), our Constrained Rationals (CR), ReLU, and ReLU with Layer Normalization (LN), all trained with resets. Our results show that CR + Resets achieves the highest overall performance, highlighting the benefit of the proposed constraints in stabilizing RL training.

Neural network expressivity is a key factor in reinforcement learning (RL), particularly in dynamic environments where agents must continuously adapt. While most RL architectures rely on static activation functions, recent work suggests that making activations trainable could enhance adaptability by increasing the flexibility of individual neurons.
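To make the idea concrete, the following is a minimal sketch of a trainable rational activation in plain Python. It uses the common "safe" Padé-style parameterization in which the denominator is 1 + |Q(x)|, guaranteeing no poles; the `constrained_rational` variant illustrates one plausible way to limit output scaling (capping the L1 norm of the numerator coefficients) and is a hypothetical stand-in, not the paper's exact constraint. Function names and coefficient values here are illustrative assumptions.

```python
def rational_activation(x, num, den):
    """Safe rational activation R(x) = P(x) / (1 + |Q(x)|).

    num: coefficients (a0, a1, ..., am) of the numerator P(x).
    den: coefficients (b1, ..., bn) of Q(x), which has no constant
         term, so the denominator is always >= 1 (pole-free).
    In a trainable setting, num and den would be learned per layer
    alongside the network weights.
    """
    p = sum(a * x**i for i, a in enumerate(num))
    q = 1.0 + abs(sum(b * x**(i + 1) for i, b in enumerate(den)))
    return p / q


def constrained_rational(x, num, den, max_scale=2.0):
    # Hypothetical constraint (not the paper's exact formulation):
    # rescale the numerator so its L1 norm never exceeds max_scale,
    # structurally limiting how much the activation can amplify inputs.
    n1 = sum(abs(a) for a in num)
    if n1 > max_scale:
        num = tuple(a * max_scale / n1 for a in num)
    return rational_activation(x, num, den)
```

With `num=(0.0, 1.0)` and an empty `den`, the activation reduces to the identity; richer coefficient sets let it approximate and then adapt beyond fixed nonlinearities such as ReLU.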