Generating Videos with Scene Dynamics - MIT

Learning models that generate videos may also be a promising way to learn representations. For example, we can train generators on a large repository of unlabeled videos, then fine-tune the discriminator on a small labeled dataset in order to recognize actions with minimal supervision. We can also visualize what emerges in the representation for predicting the future. While not all units are semantic, we found a few hidden units that fire on objects that are sources of motion, such as people or train tracks. Since generating the future requires understanding moving objects, the network may learn to recognize these objects internally, even though it is not supervised to do so.
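The fine-tuning recipe above can be sketched in a minimal, self-contained form. This is not the paper's actual model: the frozen random projection below merely stands in for discriminator features learned adversarially on unlabeled video, and the tiny synthetic "action" dataset, the feature dimensions, and the logistic head are all illustrative assumptions. The sketch only shows the pattern of keeping pretrained features fixed while training a small supervised head on scarce labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained discriminator layers (assumption: in the real
# setup these weights would come from adversarial training on unlabeled
# videos; here they are a fixed random projection).
W_feat = rng.standard_normal((64, 16))

def features(x):
    # Frozen "pretrained" layers: project raw input to a feature vector.
    return np.tanh(x @ W_feat)

# Hypothetical small labeled dataset: two "action" classes, 40 clips
# represented as 64-dim vectors, with a mean shift separating the classes.
X = rng.standard_normal((40, 64))
y = np.array([0] * 20 + [1] * 20)
X[y == 1] += 0.5

# Fine-tune only a new linear head on top of the frozen features,
# using gradient descent on the logistic loss.
W_head = np.zeros(16)
b = 0.0
lr = 0.1
for _ in range(200):
    z = features(X) @ W_head + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid probabilities
    grad = p - y                   # dLoss/dz for logistic loss
    W_head -= lr * features(X).T @ grad / len(y)
    b -= lr * grad.mean()

preds = (features(X) @ W_head + b > 0.0).astype(int)
accuracy = (preds == y).mean()
print(accuracy)
```

Because the pretrained layers stay frozen, only the small head's parameters are fit, which is what makes learning from a handful of labels feasible in this regime.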