Dr. Tristan Behrens on LinkedIn: #artificialintelligence #music
Not only did Transformers successfully make their way into Computer Vision just a short while ago, they now also contribute to neural networks that work on several kinds of data. "PolyViT: Co-training Vision Transformers on Images, Videos and Audio" showcases a transformer that works on images, videos and audio. The idea behind transformers is to treat your input data as a sequence of tokens. In NLP those tokens are discrete and are usually mapped to a continuous vector space using embedding layers. Images, on the other hand, are typically cut into non-overlapping patches, which are then projected by some neural network layers to continuous vectors.
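To make the token idea concrete, here is a minimal NumPy sketch of both routes into a shared token space: an embedding lookup for discrete text tokens and a patch projection for images. It is an illustration only, not the PolyViT implementation. The function names `embed_tokens` and `patchify`, the sizes, and the random weights are all assumptions, and a single linear projection stands in for the "neural network layers" mentioned above.

```python
import numpy as np

def embed_tokens(token_ids, table):
    """Look up one continuous vector per discrete token id."""
    return table[np.asarray(token_ids)]

def patchify(image, patch):
    """Cut an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
vocab_size, dim = 100, 16  # hypothetical sizes, chosen only for the demo

# Text: discrete ids -> rows of an embedding table (random here, learned in practice).
table = rng.normal(size=(vocab_size, dim))
text_tokens = embed_tokens([5, 42, 7], table)

# Image: 32x32 RGB -> sixteen 8x8 patches -> linear projection to `dim`.
image = rng.random((32, 32, 3))
patches = patchify(image, patch=8)
projection = rng.normal(size=(8 * 8 * 3, dim))  # stand-in for learned weights
image_tokens = patches @ projection

print(text_tokens.shape, image_tokens.shape)
```

Both results are sequences of vectors with the same width, which is what lets one transformer consume either of them.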
Nov-30-2021, 07:00:08 GMT