ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation