Generating Structured Outputs from Language Models: Benchmark and Studies