Optimizing Speech Multi-View Feature Fusion through Conditional Computation