Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration