This avocado armchair could be the future of AI
For all GPT-3's flair, its output can feel untethered from reality, as if it doesn't know what it's talking about. By grounding text in images, researchers at OpenAI and elsewhere are trying to give language models a better grasp of the everyday concepts that humans use to make sense of things. DALL·E and CLIP come at this problem from different directions. At first glance, CLIP (Contrastive Language-Image Pre-training) is yet another image recognition system. The difference is that it has learned to recognize images not from labeled examples in curated data sets, as most existing models do, but from images and their captions taken from the internet.
Jan-5-2021, 19:00:03 GMT