Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation