Can foundation models actively gather information in interactive environments to test hypotheses?