Safety through feedback in Constrained RL

Neural Information Processing Systems

This feedback can be system-generated or elicited from a human observing the training process. Previous approaches have not scaled to complex environments and are restricted to state-level feedback, which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus reducing the burden on the evaluator.

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Neural Information Processing Systems

We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations. Surprisingly, we also discover that today's best VLMs struggle on simple digit recognition and counting tasks, e.g., MNIST, which much simpler networks can solve.