The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
