B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
