Reinforcement Learning Agent Design and Optimization with Bandwidth Allocation Model