Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments