Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving