Unsupervised Learning of Disentangled Representations from Video

Feb-14-2020, 15:13:53 GMT–Neural Information Processing Systems

We present a new model, DRNET, that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the time-varying components enables prediction of future frames. For this prediction task, we demonstrate the ability to coherently generate frames up to several hundred steps into the future.
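
The data flow described in the abstract (split each frame into a stationary part and a time-varying part, predict only the time-varying part forward, then recombine) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: frames are 1-D pixel lists containing one cyclically shifted pattern, `encode` and `decode` are hand-written rather than learned, and `predict_poses` is a constant-velocity extrapolator standing in for the LSTM. It is not the DRNET architecture and involves no adversarial loss or training.

```python
# Toy illustration only: hand-written stand-ins, not the learned DRNET networks.
from typing import List, Tuple

Frame = List[int]


def encode(frame: Frame) -> Tuple[Frame, int]:
    """Split a frame into (content, pose).

    Assumption: every frame is one fixed pattern with a unique maximum,
    cyclically shifted. Pose is the index of that maximum; content is the
    pattern rotated so the maximum sits at index 0.
    """
    pose = frame.index(max(frame))
    content = frame[pose:] + frame[:pose]
    return content, pose


def decode(content: Frame, pose: int) -> Frame:
    """Recombine content and pose into a frame (inverse of encode)."""
    shift = pose % len(content)
    return content[-shift:] + content[:-shift] if shift else list(content)


def predict_poses(poses: List[int], steps: int, width: int) -> List[int]:
    """Stand-in for the LSTM: constant-velocity extrapolation of the pose."""
    velocity = (poses[-1] - poses[-2]) % width
    return [(poses[-1] + velocity * (k + 1)) % width for k in range(steps)]


def predict_frames(video: List[Frame], steps: int) -> List[Frame]:
    """Predict future frames by holding content fixed and rolling pose forward."""
    content, _ = encode(video[-1])          # stationary part, reused for every step
    poses = [encode(f)[1] for f in video]   # time-varying part of each observed frame
    future_poses = predict_poses(poses, steps, len(content))
    return [decode(content, p) for p in future_poses]


# A pattern moving right by one pixel per frame, observed for three frames.
pattern = [9, 3, 1, 0, 0, 0]
video = [decode(pattern, t) for t in range(3)]
future = predict_frames(video, steps=4)
```

The point of the sketch is the design choice rather than the stand-ins: because the predictor only ever sees the low-dimensional pose sequence, the content used for decoding stays fixed however many steps are generated.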

disentangled representation, unsupervised learning, video

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)