Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents