Unified Text-Image-to-Video Generation: A Training-Free Approach to Flexible Visual Conditioning