Our findings on the NoRa dataset reveal that current LLMs are broadly vulnerable to such noise, and that existing robust methods such as self-correction and self-consistency show limited efficacy.
Recent evidence suggests that, in some problems, NeSy models can achieve high accuracy on the reasoning task by learning concepts with incorrect semantics.
Limitations in either capability can impede the overall performance of a VLM. A systematic evaluation of perception and reasoning capabilities is therefore crucial for providing insights that guide future model optimization.