Chat with the Environment: Interactive Multimodal Perception Using Large Language Models