Artificial intelligence system learns concepts shared across video, audio, and text

May-4-2022, 10:06:47 GMT–#artificialintelligence

Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data that algorithms can process. So, when a machine "sees" a photo, it must encode that photo into data it can use to perform a task like image classification. This process becomes more complicated when inputs come in multiple formats, like videos, audio clips, and images.

"The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward," says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of a paper tackling this problem.
