An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models