AITopics | hierarchical attention

Collaborating Authors

hierarchical attention

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions

Zuo, Heda, You, Weitao, Wu, Junxian, Ren, Shihong, Chen, Pei, Zhou, Mingxu, Lu, Yujia, Sun, Lingyun

arXiv.org Artificial IntelligenceJan-17-2025

Composing music for video is essential yet challenging, leading to a growing interest in automating music generation for video applications. Existing approaches often struggle to achieve robust music-video correspondence and generative diversity, primarily due to inadequate feature alignment methods and insufficient datasets. In this study, we present General Video-to-Music Generation model (GVMGen), designed for generating high-related music to the video input. Our model employs hierarchical attentions to extract and align video features with music in both spatial and temporal dimensions, ensuring the preservation of pertinent features while minimizing redundancy. Remarkably, our method is versatile, capable of generating multi-style music from different video inputs, even in zero-shot scenarios. We also propose an evaluation model along with two novel objective metrics for assessing video-music alignment. Additionally, we have compiled a large-scale dataset comprising diverse types of video-music pairs. Experimental results demonstrate that GVMGen surpasses previous models in terms of music-video correspondence, generative diversity, and application universality.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.09972

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

FasterViT: Fast Vision Transformers with Hierarchical Attention

Hatamizadeh, Ali, Heinrich, Greg, Yin, Hongxu, Tao, Andrew, Alvarez, Jose M., Kautz, Jan, Molchanov, Pavlo

arXiv.org Artificial IntelligenceJun-9-2023

We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-level attention with reduced computational costs. We benefit from efficient window-based self-attention. Each window has access to dedicated carrier tokens that participate in local and global representation learning. At a high level, global self-attentions enable the efficient cross-window communication at lower costs. FasterViT achieves a SOTA Pareto-front in terms of accuracy \vs image throughput. We have extensively validated its effectiveness on various CV tasks including classification, object detection and segmentation. We also show that HAT can be used as a plug-and-play module for existing networks and enhance them. We further demonstrate significantly faster and more accurate performance than competitive counterparts for images with high resolution. Code is available at https://github.com/NVlabs/FasterViT.

artificial intelligence, machine learning, transformer, (16 more...)

arXiv.org Artificial Intelligence

2306.06189

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

HAHE: Hierarchical Attention for Hyper-Relational Knowledge Graphs in Global and Local Level

Luo, Haoran, E, Haihong, Yang, Yuhao, Guo, Yikai, Sun, Mingzhi, Yao, Tianyu, Tang, Zichen, Wan, Kaiyang, Song, Meina, Lin, Wei

arXiv.org Artificial IntelligenceMay-15-2023

Link Prediction on Hyper-relational Knowledge Graphs (HKG) is a worthwhile endeavor. HKG consists of hyper-relational facts (H-Facts), composed of a main triple and several auxiliary attribute-value qualifiers, which can effectively represent factually comprehensive information. The internal structure of HKG can be represented as a hypergraph-based representation globally and a semantic sequence-based representation locally. However, existing research seldom simultaneously models the graphical and sequential structure of HKGs, limiting HKGs' representation. To overcome this limitation, we propose a novel Hierarchical Attention model for HKG Embedding (HAHE), including global-level and local-level attention. The global-level attention can model the graphical structure of HKG using hypergraph dual-attention layers, while the local-level attention can learn the sequential structure inside H-Facts via heterogeneous self-attention layers. Experiment results indicate that HAHE achieves state-of-the-art performance in link prediction tasks on HKG standard datasets. In addition, HAHE addresses the issue of HKG multi-position prediction for the first time, increasing the applicability of the HKG link prediction task. Our code is publicly available.

global and local level, hierarchical attention, hyper-relational knowledge graph, (1 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.acl-long.450

2305.06588

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)

Add feedback

Hierarchical Text Generation using an Outline

Drissi, Mehdi, Watkins, Olivia, Kalita, Jugal

arXiv.org Machine LearningOct-20-2018

Many challenges in natural language processing require generating text, including language translation, dialogue generation, and speech recognition. For all of these problems, text generation becomes more difficult as the text becomes longer. Current language models often struggle to keep track of coherence for long pieces of text. Here, we attempt to have the model construct and use an outline of the text it generates to keep it focused. We find that the usage of an outline improves perplexity. We do not find that using the outline improves human evaluation over a simpler baseline, revealing a discrepancy in perplexity and human perception. Similarly, hierarchical generation is not found to improve human evaluation scores.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1810.08802

Country:

North America > United States > Indiana (0.14)
North America > United States > Colorado (0.14)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Sports > Baseball (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Sentence-State LSTM for Text Representation

Zhang, Yue, Liu, Qi, Song, Linfeng

arXiv.org Machine LearningMay-7-2018

Bidirectional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incremental reading of a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performances compared to stacked BiLSTM models with similar parameter numbers. 1 Introduction Neural models have become the dominant approach in the NLP literature. Compared to handcrafted indicator features, neural sentence representations are less sparse, and more flexible in encoding intricate syntactic and semantic information. Among various neural networks for encoding sentences, bidirectional LSTMs (BiLSTM) (Hochreiter and Schmidhuber, 1997) have been a dominant method, giving state-of-the-art results in language modelling (Sundermeyer et al., 2012), machine translation (Bahdanau et al., 2015), syntactic parsing (Dozat and Manning, 2017) and question answering (Tan et al., 2015). Despite their success, BiLSTMs have been shown to suffer several limitations.

artificial intelligence, bilstm, machine learning, (19 more...)

arXiv.org Machine Learning

1805.02474

Country:

Europe (0.93)
Asia (0.93)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Media > Film (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Hierarchical Pointer Memory Network for Task Oriented Dialogue

Raghu, Dinesh, Gupta, Nikhil, Mausam, null

arXiv.org Machine LearningMay-3-2018

We observe that end-to-end memory networks (MN) trained for task-oriented dialogue, such as for recommending restaurants to a user, suffer from an out-of-vocabulary (OOV) problem -- the entities returned by the Knowledge Base (KB) may not be seen by the network at training time, making it impossible for it to use them in dialogue. We propose a Hierarchical Pointer Memory Network (HyP-MN), in which the next word may be generated from the decode vocabulary or copied from a hierarchical memory maintaining KB results and previous utterances. Evaluating over the dialog bAbI tasks, we find that HyP-MN drastically outperforms MN obtaining 12% overall accuracy gains. Further analysis reveals that MN fails completely in recommending any relevant restaurant, whereas HyP-MN recommends the best next restaurant 80% of the time.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1805.01216

Genre: Research Report (0.64)

Industry: Consumer Products & Services > Restaurants (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization

Jain, Parag, Laha, Anirban, Sankaranarayanan, Karthik, Nema, Preksha, Khapra, Mitesh M., Shetty, Shreyas

arXiv.org Artificial IntelligenceApr-20-2018

Structured data summarization involves generation of natural language summaries from structured input data. In this work, we consider summarizing structured data occurring in the form of tables as they are prevalent across a wide variety of domains. We formulate the standard table summarization problem, which deals with tables conforming to a single predefined schema. To this end, we propose a mixed hierarchical attention based encoder-decoder model which is able to leverage the structure in addition to the content of the tables. Our experiments on the publicly available WEATHERGOV dataset show around 18 BLEU (~ 30%) improvement over the current state-of-the-art.

artificial intelligence, natural language, precipitation, (15 more...)

arXiv.org Artificial Intelligence

1804.0779

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback