Microsoft unveils AI model that understands image content, solves visual puzzles

Mar-5-2023, 15:05:19 GMT – #artificialintelligence

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. The researchers believe multimodal AI, which integrates different modes of input such as text, audio, images, and video, is a key step toward building artificial general intelligence (AGI) that can perform general tasks at the level of a human. "Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world," the researchers write in their academic paper, "Language Is Not All You Need: Aligning Perception with Language Models." While the media buzzes with news about large language models (LLMs), some AI experts point to multimodal AI as a potential path toward AGI, a hypothetical technology that would ostensibly be able to replace humans at any intellectual task (and any intellectual job). AGI is the stated goal of OpenAI, a key business partner of Microsoft in the AI space.

News Web Page

Genre:
- Research Report > New Finding (0.36)

Industry:
- Energy > Oil & Gas > Upstream (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.38)