A Benchmark for Parsing Ambiguous Questions into Database Queries