CityLens: Benchmarking Large Language-Vision Models for Urban Socioeconomic Sensing