NeRV: Neural Representations for Videos

Neural Information Processing Systems 

We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks. Unlike conventional representations that treat videos as frame sequences, we represent videos as neural networks taking frame index as input. Given a frame index, NeRV outputs the corresponding RGB image. Video encoding in NeRV is simply fitting a neural network to video frames, and the decoding process is a simple feedforward operation. As an image-wise implicit representation, NeRV outputs the whole image and is far more efficient than pixel-wise implicit representations, improving the encoding speed by 25× to 70× and the decoding speed by 38× to 132×, while achieving better video quality.
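
The encode-by-fitting and decode-by-forward-pass idea can be illustrated with a minimal toy sketch. It is not the NeRV architecture: a single linear layer on a sinusoidal encoding of the frame index stands in for the network, and the names `positional_encoding` and `TinyFrameModel`, the encoding itself, the learning rate, and the synthetic 2x2 frames are all assumptions made for illustration.

```python
import math
import random


def positional_encoding(t, num_freqs=4):
    """Map a normalized frame index t in [0, 1] to sin/cos features (assumed encoding)."""
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin((2.0 ** k) * math.pi * t))
        feats.append(math.cos((2.0 ** k) * math.pi * t))
    return feats


class TinyFrameModel:
    """Toy stand-in for the network: encoded frame index -> whole flattened RGB frame."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def decode(self, t):
        """Decoding is one forward pass that emits every pixel of the frame at once."""
        x = positional_encoding(t)
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

    def fit(self, frames, epochs=500, lr=0.05):
        """Encoding is fitting: per-frame gradient descent on squared error."""
        n = len(frames)
        for _ in range(epochs):
            for i, target in enumerate(frames):
                t = i / max(n - 1, 1)
                x = positional_encoding(t)
                pred = self.decode(t)
                for j, (p, y) in enumerate(zip(pred, target)):
                    err = p - y
                    for k, xk in enumerate(x):
                        self.w[j][k] -= lr * err * xk
                    self.b[j] -= lr * err


# Synthetic "video": 4 frames of 2x2 RGB pixels, flattened to 12 values in [0, 1).
frames = [[(((i + 1) * (j + 1)) % 7) / 7.0 for j in range(12)] for i in range(4)]

model = TinyFrameModel(in_dim=8, out_dim=12)
model.fit(frames)                  # "encode" the video into the model weights
recon = model.decode(1 / 3)        # "decode" frame 1 from its normalized index
```

Here the fitted weights are the video representation, and each call to `decode` returns a full frame rather than a single pixel, which is the image-wise behaviour the abstract contrasts with pixel-wise implicit representations.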