Survey: Transformer-based Models in Data Modality Conversion
Rashno, Elyas; Eskandari, Amir; Anand, Aman; Zulkernine, Farhana
arXiv.org Artificial Intelligence
Typically, a modality is linked to a particular sensor that creates a distinct communication channel, such as sight, speech, or written language. Human sensory perception relies on a fundamental process of integrating data from several sensory modalities, which allows people to engage efficiently with the world in dynamic and unconstrained situations. Each modality is a separate source of information with its own statistical characteristics. A photograph depicting "elephants playing in the water" conveys the scene visually through a large number of pixels, whereas a verbal description conveys the same scene in words. Similarly, speech can communicate the same event, represented as spectrograms or other acoustic features. A data conversion AI system must receive input in one modality, process and understand it, and reproduce its content in a different modality, imitating human-like perception. Modality Conversion (MC) is a broad methodology for constructing artificial intelligence models that can extract information from one modality of representation and transform it into another [67].
Amir Eskandari and Aman Anand contributed equally to this research.
Aug-8-2024
- Country:
  - Asia > China (0.04)
  - South America > Chile
  - North America
    - United States > Minnesota
      - Hennepin County > Minneapolis (0.14)
    - Canada > Ontario
      - Kingston (0.04)
  - Europe
    - Switzerland > Zürich
      - Zürich (0.14)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
  - Africa > Middle East
    - Morocco > Casablanca-Settat Region > Casablanca (0.04)
- Genre:
  - Overview (1.00)
  - Research Report
    - Promising Solution (0.68)
    - New Finding (0.67)
- Industry:
  - Health & Medicine (0.67)
  - Education (0.46)
  - Information Technology > Security & Privacy (0.46)
- Technology:
  - Information Technology
    - Sensing and Signal Processing > Image Processing (1.00)
    - Data Science > Data Mining (1.00)
    - Artificial Intelligence
      - Vision (1.00)
      - Speech > Speech Recognition (1.00)
      - Representation & Reasoning (1.00)
      - Cognitive Science (0.92)
      - Natural Language
        - Text Processing (1.00)
        - Machine Translation (1.00)
        - Large Language Model (1.00)
      - Machine Learning > Neural Networks
        - Deep Learning (1.00)