Appendix: Scalable Neural Video Representations with Learnable Positional Features