BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation