Creative Problem Solving in Large Language and Vision Models -- What Would it Take?