Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning