B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens