Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks