Integrating Visual Foundation Models for Enhanced Robot Manipulation and Motion Planning: A Layered Approach