Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement