Curiosity-Driven LLM-as-a-judge for Personalized Creative Judgment
Kumar, Vanya Bannihatti; Goyal, Divyanshu; Eppa, Akhil; Bhandari, Neel
arXiv.org Artificial Intelligence
... Creative Thinking (TTCW) benchmark introduced in Chakrabarty et al. (2024), ...

Rigorous, standardized evaluation has repeatedly catalyzed progress in machine learning, from ImageNet (Russakovsky et al., 2015) and GLUE (Wang et al., 2019), driving leaps in the fields of computer vision and natural language processing, respectively. The same effect is evident in objective math reasoning, where benchmarks like GSM8K (Cobbe et al., 2021), together with RL-trained reasoning models such as OpenAI's o1 (OpenAI et al., 2024) and DeepSeek-R1 (DeepSeek-AI ...

... Models (LLM) as a judge prefer their own generations, making them unreliable. As shown in Chakrabarty et al. (2024) and Table 12 and Table 2, even ...

Specifically, when the model is "surprised" by an expert's explanation, it signals a mismatch between the LLM's prior belief and the expert's ...

The intuition behind predicting the annotator is that the model can learn which annotator caused the belief shift, allowing it to calibrate the curiosity signal for each annotator individually, thereby improving personalization.

In our experiments, we establish a baseline using an SFT model that predicts annotators' binary ...

More details about the results can be found in Fig 4.

Figure 1: Overview of architecture during training for Curiosity-Driven LLM-as-a-judge

Figure 2: Overview of architecture during inference for Curiosity-Driven LLM-as-a-judge

(a) Baseline without using explanations (b) Baseline using explanations

... TTCW dataset (Chakrabarty et al., 2024), which is based on the Torrance Test of Creative Thinking (Torrance, 1966) but adapted for LLMs. All the distinct dimensions in the TTCW dataset are mentioned in Appendix A.1.
Oct-8-2025
- Country:
  - Asia > Middle East > Israel (0.04)
  - Europe > Monaco (0.04)
  - North America > United States > Georgia > Fulton County > Atlanta (0.04)
  - North America > United States > New Jersey > Mercer County > Princeton (0.04)
  - South America > Chile
- Genre:
  - Research Report > New Finding (0.48)