Kernel Deformed Exponential Families for Sparse Continuous Attention
Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg
Attention mechanisms take an expectation of a data representation with respect to probability weights, producing summary statistics that focus on important features. Recently, Martins et al. (2020; 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families; the latter has sparse support. Farinhas et al. (2021) extended this line of work to Gaussian mixture attention densities, a flexible class with dense support. In this paper, we extend continuous attention to two general flexible classes: kernel exponential families (Canu & Smola, 2006) and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has approximation capabilities similar to those of kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.

Attention mechanisms take weighted averages of data representations (Bahdanau et al., 2015), where the weights are a function of the input objects; these averages are then used as inputs for prediction. Discrete attention has two limitations: 1) it cannot easily handle data where observations are irregularly spaced, and 2) attention maps may be scattered and lack focus.
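To make the attention computation concrete, the sketch below approximates a continuous attention output c = E_{p(t)}[v(t)] on a grid, where the attention density is a 2-deformed (Tsallis) kernel density p(t) = [1 + f(t) - A]_+ with f in a Gaussian RKHS. This is a minimal sketch, not the paper's implementation: the grid, kernel centers, coefficients, bandwidth, toy value function, and the `normalize_offset` bisection helper are all illustrative choices.

```python
import numpy as np

# Uniform grid over the data domain; all integrals below are Riemann sums.
grid = np.linspace(-4.0, 4.0, 2001)
dt = grid[1] - grid[0]

# f lies in a Gaussian RKHS: f(t) = sum_i alpha_i k(t, t_i).
# Centers, coefficients, and bandwidth are illustrative choices.
centers = np.array([-2.0, 2.0])
alphas = np.array([3.0, 3.0])
scale = 0.5
K = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / scale) ** 2)
f = K @ alphas

def normalize_offset(f, dt, lo=-10.0, hi=10.0, iters=60):
    """Bisection for the offset A making [1 + f - A]_+ integrate to one;
    the total mass is monotone decreasing in A, so bisection converges."""
    for _ in range(iters):
        A = 0.5 * (lo + hi)
        mass = np.maximum(1.0 + f - A, 0.0).sum() * dt
        lo, hi = (A, hi) if mass > 1.0 else (lo, A)
    return 0.5 * (lo + hi)

# 2-deformed (Tsallis, beta = 2) density: p(t) = [1 + f(t) - A]_+.
# The truncation at zero yields compact, here bimodal, support.
A = normalize_offset(f, dt)
p = np.maximum(1.0 + f - A, 0.0)

# Continuous attention output c = E_{p(t)}[v(t)] for a toy value function.
v = np.stack([grid, np.sin(grid)], axis=1)  # v(t) = (t, sin t)
c = (p[:, None] * v).sum(axis=0) * dt
print("attention output:", c, " support fraction:", (p > 0).mean())
```

Replacing p with a Gaussian (kernel exponential family) density on the same grid would give dense support everywhere; here p is exactly zero outside two compact intervals around the kernel centers, which is the sparsity the deformed family buys.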
Nov-12-2021