 Large Language Model


UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Neural Information Processing Systems

We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations. Surprisingly, we also discover that today's best VLMs struggle on simple digit recognition and counting tasks, e.g. MNIST, which much simpler networks can solve.









Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation

Neural Information Processing Systems

Reinforcement Learning from Human Feedback (RLHF) has been pivotal in aligning Large Language Models with human values but often suffers from overoptimization due to its reliance on a proxy reward model.