Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition