ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
