B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners