AITopics | cross-modal transformer

Injecting Multimodal Information into Rigid Protein Docking via Bi-level Optimization

Neural Information Processing Systems | Feb-15-2026, 00:48:06 GMT

The structure of protein-protein complexes is critical for understanding binding dynamics, biological mechanisms, and intervention strategies.

bioinformatics, information, machine learning, (16 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

9e3b203e72c4e058de26d02a92a81844-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 01:13:40 GMT

dataset, qualitative result, trajectory, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

2e5c2cb8d13e8fba78d95211440ba326-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 01:59:01 GMT

Finally, Section E illustrates qualitative results. We present the encoder-decoder variant of HAMT in fine-tuning on the right of Figure 1. Compared to the original cross-modal transformer on the left, the variant removes text-to-vision cross-modal attention. The encoder encodes the texts to obtain textual embeddings. The original target location is viewed as a middle stop point.
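The snippet above says the variant drops text-to-vision attention, so only the visual side still attends to the encoded text. As a rough illustration of that single direction, here is a toy, single-head cross-attention in plain Python. It is a sketch under loud assumptions: there are no learned projections, and the names `cross_attention`, `vision_tokens`, and `text_embeddings` are hypothetical and not taken from HAMT.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from the other (assumed toy setup, single head)."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key of the other modality.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the other modality's value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Hypothetical toy inputs: two visual tokens attend over three text embeddings.
vision_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(vision_tokens, text_embeddings, text_embeddings)
```

Calling the function once with visual queries mirrors the one-directional setup; a bidirectional cross-modal layer would call it a second time with the roles swapped.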

artificial intelligence, instruction, predicted trajectory by HAMT, (13 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

77fa0e7d45c6687f1958de0b31e9fc05-Paper-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 22:40:47 GMT

bioinformatics, information, machine learning, (16 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

Supplementary Material

Neural Information Processing Systems | Aug-17-2025, 06:57:07 GMT

Section A provides additional details for the method. The scene encoder extracts the environment information. Following [19], we sample the frames at 2.5 Hz and predict future ... For the ETH and UCY datasets, we adopt the standard metrics (i. ... Due to the limitations discussed in Section 4.1, we introduce curve smoothing (CS) into current ... We conduct experiments on PAV using the traditional ADE/FDE metrics. In particular, our method improves the FDE by 13.6% on PETS.
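For readers unfamiliar with the ADE/FDE metrics named above: ADE (average displacement error) is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE (final displacement error) is that distance at the last step only. The following is a minimal sketch for a single trajectory; the function name `displacement_errors` and the sample coordinates are illustrative assumptions, not the paper's evaluation code.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between prediction and ground truth.
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    ade = sum(dists) / len(dists)  # mean over all predicted steps
    fde = dists[-1]                # error at the final step only
    return ade, fde


# Made-up example data: the prediction drifts one unit further away per step.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, observed)
print(ade, fde)  # 1.0 2.0
```

A relative FDE improvement such as the one reported for PETS would then be computed as `(baseline_fde - new_fde) / baseline_fde`, assuming both values are measured on the same test split.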

artificial intelligence, machine learning, trajectory, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

9e3b203e72c4e058de26d02a92a81844-Paper-Conference.pdf

Neural Information Processing Systems | Aug-17-2025, 06:57:04 GMT

artificial intelligence, machine learning, trajectory, (15 more...)

Country:

Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Towards Interpretable Sleep Stage Classification Using Cross-Modal Transformers

Pradeepkumar, Jathurshan, Anandakumar, Mithunjha, Kugathasan, Vinith, Suntharalingham, Dhinesh, Kappel, Simon L., De Silva, Anjula C., Edussooriya, Chamira U. S.

arXiv.org Artificial Intelligence | Nov-24-2023

Accurate sleep stage classification is significant for sleep health assessment. In recent years, several machine-learning based sleep staging algorithms have been developed, and in particular, deep-learning based algorithms have achieved performance on par with human annotation. Despite improved performance, a limitation of most deep-learning based algorithms is their black-box behavior, which has limited their use in clinical settings. Here, we propose a cross-modal transformer, which is a transformer-based method for sleep stage classification. The proposed cross-modal transformer consists of a novel cross-modal transformer encoder architecture along with a multi-scale one-dimensional convolutional neural network for automatic representation learning. Our method outperforms the state-of-the-art methods and eliminates the black-box behavior of deep-learning models by utilizing the interpretability aspect of the attention modules. Furthermore, our method provides considerable reductions in the number of parameters and training time compared to the state-of-the-art methods. Our code is available at https://github.com/Jathurshan0330/Cross-Modal-Transformer. A demo of our work can be found at https://bit.ly/Cross_modal_transformer_demo.

cross-modal transformer, sequence cross-modal transformer, transformer, (12 more...)

2208.06991

Country:

Asia > Sri Lanka (0.04)
North America > United States > Illinois > Cook County > Westchester (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Sleep (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

Li, Hang, Kang, Yu, Liu, Tianqiao, Ding, Wenbiao, Liu, Zitao

arXiv.org Artificial Intelligence | Sep-1-2021

Existing audio-language task-specific predictive approaches focus on building complicated late-fusion mechanisms. However, these models face the challenges of overfitting with limited labels and low generalization ability. In this paper, we present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language through two proxy tasks on a large amount of audio-and-language pairs: masked language modeling and masked cross-modal acoustic modeling. After fine-tuning our pre-trained model on multiple downstream audio-and-language tasks, we observe significant improvements across various tasks, such as emotion classification, sentiment analysis, and speaker verification. On this basis, we further propose a specially designed fusion mechanism that can be used in the fine-tuning phase, which allows our pre-trained model to achieve better performance. Lastly, we present detailed ablation studies to show that both our novel cross-modality fusion component and audio-language pre-training methods significantly contribute to the promising results.

arxiv preprint arxiv, dataset, representation, (15 more...)

2109.00181

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: