Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering