How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench