VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
