Probability Distributions Computed by Hard-Attention Transformers