Language-Image Models with 3D Understanding