Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
