Perception of Visual Content: Differences Between Humans and Foundation Models