AITopics | primal-attention

cd687a58a13b673eea3fc1b2e4944cf7-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 19:34:14 GMT

artificial intelligence, machine learning, primal-attention, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplementary Material Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation Yingyi Chen

Neural Information Processing Systems | Feb-17-2026, 04:22:05 GMT

Comments on Theorem 3.2: With the primal problem in (6) in the paper, Theorem 3.2 provides [...] Additionally, [27] presents the optimization w.r.t. a single projection direction in [...] Therefore, our KSVD is more general in the data setups. [...] Remark 3.3, we show that the values can be regarded as playing the role of the dual variables [...] Using data-dependent projection weights does not affect the derivation of the shifted eigenvalue problem in the dual. With the derivations of the primal-dual optimization problems above, the primal-dual model representation of our KSVD problem can be provided correspondingly. Lemma 4.2 evaluates the objective value [...] Moreover, as in the proof of Theorem 3.2, we note that the regularization coefficient [...] This section provides the implementation details of all experiments included in the paper. This will be illustrated in detail in the following. Algorithm 1: Learning with Primal-Attention. Require: X := [x [...] UEA Time Series: The UEA time series benchmark [31] consists of 30 datasets. Following the setup in [11], we select 10 datasets for evaluation.

artificial intelligence, machine learning, primal-attention, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

cd687a58a13b673eea3fc1b2e4944cf7-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 04:22:03 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Neural Information Processing Systems | Dec-26-2025, 19:26:51 GMT

Recently, a new line of works has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply the methods for symmetric kernels to the asymmetric self-attention, resulting in a nontrivial gap between the analytical understanding and numerical implementation. In this paper, we provide a new perspective to represent and optimize self-attention through asymmetric Kernel Singular Value Decomposition (KSVD), which is also motivated by the low-rank property of self-attention normally observed in deep layers. Through asymmetric KSVD, i) a primal-dual representation of self-attention is formulated, where the optimization objective is cast to maximize the projection variances in the attention outputs; ii) a novel attention mechanism, i.e., Primal-Attention, is proposed via the primal representation of KSVD, avoiding explicit computation of the kernel matrix in the dual; iii) with KKT conditions, we prove that the stationary solution to the KSVD optimization in Primal-Attention yields a zero-value objective. In this manner, KSVD optimization can be implemented by simply minimizing a regularization loss, so that low-rank property is promoted without extra decomposition. Numerical experiments show state-of-the-art performance of our Primal-Attention with improved efficiency. Moreover, we demonstrate that the deployed KSVD optimization regularizes Primal-Attention with a sharper singular value decay than that of the canonical self-attention, further verifying the great potential of our method. To the best of our knowledge, this is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modelling and optimization.
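Two properties the abstract relies on, the asymmetry of the self-attention matrix and the decay of its singular values, can be checked numerically. The following is a minimal sketch only, not the authors' Primal-Attention or their KSVD optimization: it builds a standard softmax attention matrix from random inputs and inspects it with an ordinary SVD. All names, sizes, and the random weights (`attention_matrix`, `n_tokens`, `d_model`, `d_head`, `w_q`, `w_k`) are assumptions made for illustration.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def attention_matrix(x, w_q, w_k):
    """Canonical scaled dot-product attention weights for one head."""
    q = x @ w_q  # queries, shape (n_tokens, d_head)
    k = x @ w_k  # keys, shape (n_tokens, d_head)
    return softmax_rows(q @ k.T / np.sqrt(q.shape[1]))


# Hypothetical toy sizes and random data, chosen only for illustration.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 16, 32, 8
x = rng.standard_normal((n_tokens, d_model))
w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

attn = attention_matrix(x, w_q, w_k)

# Queries and keys use different projections, so attn[i, j] != attn[j, i]
# in general: the induced kernel is asymmetric.
is_symmetric = np.allclose(attn, attn.T)

# Singular values (returned in descending order) show how quickly the
# spectrum of the attention matrix decays.
singular_values = np.linalg.svd(attn, compute_uv=False)

print("symmetric:", is_symmetric)
print("singular values:", np.round(singular_values, 4))
```

With trained weights rather than random ones, the same two checks could be run per layer to compare spectra. This sketch makes no claim about what those spectra look like.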

asymmetric kernel svd, primal-attention, self-attention, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

cd687a58a13b673eea3fc1b2e4944cf7-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 07:49:05 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Neural Information Processing Systems | Jan-19-2025, 22:24:39 GMT

asymmetric kernel svd, primal representation, primal-attention, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Chen, Yingyi, Tao, Qinghua, Tonin, Francesco, Suykens, Johan A. K.

arXiv.org Artificial Intelligence | Dec-5-2023

Recently, a new line of works has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply the methods for symmetric kernels to the asymmetric self-attention, resulting in a nontrivial gap between the analytical understanding and numerical implementation. In this paper, we provide a new perspective to represent and optimize self-attention through asymmetric Kernel Singular Value Decomposition (KSVD), which is also motivated by the low-rank property of self-attention normally observed in deep layers. Through asymmetric KSVD, i) a primal-dual representation of self-attention is formulated, where the optimization objective is cast to maximize the projection variances in the attention outputs; ii) a novel attention mechanism, i.e., Primal-Attention, is proposed via the primal representation of KSVD, avoiding explicit computation of the kernel matrix in the dual; iii) with KKT conditions, we prove that the stationary solution to the KSVD optimization in Primal-Attention yields a zero-value objective. In this manner, KSVD optimization can be implemented by simply minimizing a regularization loss, so that low-rank property is promoted without extra decomposition. Numerical experiments show state-of-the-art performance of our Primal-Attention with improved efficiency. Moreover, we demonstrate that the deployed KSVD optimization regularizes Primal-Attention with a sharper singular value decay than that of the canonical self-attention, further verifying the great potential of our method. To the best of our knowledge, this is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modeling and optimization.

matrix, primal-attention, transformer, (13 more...)

arXiv.org Artificial Intelligence

2305.19798

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
