On The Sample Complexity Bounds In Bilevel Reinforcement Learning