Multimodal models are fast becoming a reality -- consequences be damned
Roughly a year ago, VentureBeat wrote about progress in the AI and machine learning field toward developing multimodal models, or models that can understand the meaning of text, videos, audio, and images together in context. Back then, the work was in its infancy and faced formidable challenges, not least of which concerned biases amplified in training datasets. But breakthroughs have been made. This year, OpenAI released DALL-E and CLIP, two multimodal models that the research lab claims are "a step toward systems with [a] deeper understanding of the world." DALL-E, inspired by the surrealist artist Salvador Dalí, was trained to generate images from simple text descriptions.
Dec-21-2021, 22:35:05 GMT