UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation