Learning Transferable Spatiotemporal Representations from Natural Script Knowledge