FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline