CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias