NanoVLMs: How small can we go and still make coherent Vision Language Models?