Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark