RUST: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models

Neural Information Processing Systems 

To perform systematic evaluations, we set up 32 various tasks, including improvements to existing multimodal tasks, extension of text-only tasks to multimodal scenarios, and novel methods for risk assessment, which focus on models' basic performance with practical significance.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found