MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention

Neural Information Processing Systems

Transformers have achieved state-of-the-art performance across various tasks, but suffer from quadratic complexity in sequence length due to the attention mechanism. In this work, we propose MonarchAttention, a novel approach to sub-quadratic attention approximation via Monarch matrices, an expressive class of structured matrices. Based on the variational form of softmax, we describe an efficient optimization-based algorithm to compute an approximate projection of softmax attention onto the class of Monarch matrices with Θ(N√N d) computational complexity and Θ(Nd) memory/IO complexity.
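The "variational form of softmax" mentioned above can be checked numerically. The sketch below is not the paper's algorithm; it only illustrates the standard identity that softmax(z) maximizes ⟨p, z⟩ + H(p) over the probability simplex, with the maximum equal to log-sum-exp of z. The function names (`softmax`, `entropy`, `objective`, `logsumexp`) are illustrative choices, not taken from the source.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy H(p) = -sum p_i log p_i, guarding against log(0).
    p = np.clip(p, 1e-300, 1.0)
    return float(-np.sum(p * np.log(p)))

def objective(p, z):
    # Variational objective <p, z> + H(p) for a point p on the simplex.
    return float(p @ z) + entropy(p)

def logsumexp(z):
    m = np.max(z)
    return float(m + np.log(np.sum(np.exp(z - m))))

rng = np.random.default_rng(0)
z = rng.normal(size=5)
p_star = softmax(z)

# At p = softmax(z) the objective equals logsumexp(z).
gap = abs(objective(p_star, z) - logsumexp(z))

# Any other simplex point (here: random Dirichlet draws) scores no higher.
others = [objective(q, z) for q in rng.dirichlet(np.ones(5), size=100)]
best_other = max(others)
```

Restricting the maximization to a structured family of matrices, rather than the whole simplex, is what turns this identity into an approximation problem; how MonarchAttention does that is described in the paper itself, not here.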


Appendix for "Episodic Multi-Task Learning with Heterogeneous Neural Processes"

Neural Information Processing Systems

In this section, we list frequently asked questions from researchers who helped proofread this manuscript. These questions might also be relevant for others and help in better understanding the paper, so we include more detailed discussions here. This work considers the multi-input multi-output setting of multi-task learning under the episodic training mechanism. As shown in Table 1, we use "Heterogeneous tasks" to distinguish the different branches of multi-task learning: (1) single-input multi-output (SIMO) considers different tasks that have the same input and different supervision information. All tasks are related since they share the target space. This setting encourages deep models to deal with the insufficient data of each task by aggregating the training data from related tasks in the spirit of data augmentation. Meanwhile, "Episodic training" is used to describe the data-feeding strategy. Multi-task meta-learning also benefits from episodic training, but it follows the SIMO setting in every single episode and cannot sufficiently handle heterogeneous tasks.



Appendix for "Episodic Multi-Task Learning with Heterogeneous Neural Processes"

Neural Information Processing Systems

In this section, we list frequently asked questions from researchers who helped proofread this manuscript. As shown in Table 1, we use "Heterogeneous tasks" to distinguish the different branches of multi-task learning. Meanwhile, "Episodic training" is used to describe the data-feeding strategy. Thus, "Heterogeneous tasks" is not available here (-). In episodic multi-task learning, we restrict the scope of the problem to the case where tasks in the same episode are related and share the same target space. This also implies that tasks with the same target space are related.




2 Method. Notations: We use X^⊤, X^{-1}, Tr(X), and vec(X) to denote the transpose, inverse, trace, and column-wise vectorization of a matrix X. We use X ⊗ Y to represent the Kronecker product

Neural Information Processing Systems

In contrast, artificial agents are prone to 'catastrophic forgetting', whereby performance on previous tasks deteriorates rapidly as new ones are acquired. This shortcoming has recently been addressed using methods that encourage parameters to stay close to those used for previous tasks.
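One common way to "encourage parameters to stay close" is a quadratic penalty added to the new task's loss. The sketch below shows that generic idea only. The excerpt does not specify the penalty used by the paper, so the function name `penalized_loss`, the per-parameter `importance` weights, and the `strength` coefficient are all assumptions made for illustration.

```python
import numpy as np

def penalized_loss(task_loss, params, anchor_params, importance, strength=1.0):
    """New-task loss plus a quadratic pull toward previous-task parameters.

    All argument names are hypothetical. `importance` weights each
    parameter's contribution; passing all ones gives a plain L2 pull.
    """
    diff = np.asarray(params, dtype=float) - np.asarray(anchor_params, dtype=float)
    penalty = 0.5 * strength * np.sum(np.asarray(importance, dtype=float) * diff ** 2)
    return float(task_loss + penalty)

# No penalty when the parameters have not moved from the anchor.
unchanged = penalized_loss(1.0, [0.3, -0.2], [0.3, -0.2], [1.0, 1.0])

# Penalty grows with the squared distance from the anchor:
# 0.5 * 2.0 * (1^2 + 2^2) = 5.0, so the total is 1.0 + 5.0.
moved = penalized_loss(1.0, [1.0, 2.0], [0.0, 0.0], [1.0, 1.0], strength=2.0)
```

Larger `importance` values make the corresponding parameters harder to change, which is how such penalties trade plasticity on the new task against retention of earlier ones.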




3ca6d336ddaa316a6ae953a20b9477cf-Supplemental-Conference.pdf

Neural Information Processing Systems

To handle a range of noise levels, the training images are corrupted by Gaussian noise with σ randomly chosen from [0, 50]. Swin transformer: Hierarchical vision transformer using shifted windows.
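The noise corruption described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes pixel values on a 0 to 255 scale, that σ is drawn uniformly from the stated range, and that no clipping is applied afterwards. The function name `add_gaussian_noise` and its parameters are hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, rng, sigma_range=(0.0, 50.0)):
    """Corrupt one image with zero-mean Gaussian noise of random strength.

    Assumptions (not stated in the source): sigma is sampled uniformly
    from sigma_range, and pixel values use a 0 to 255 scale.
    """
    sigma = rng.uniform(*sigma_range)
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return noisy, sigma

rng = np.random.default_rng(0)
clean = np.full((8, 8), 128, dtype=np.uint8)  # toy constant image
noisy, sigma = add_gaussian_noise(clean, rng)
```

Sampling a fresh σ per image means a single model sees the whole range of noise levels during training, rather than being specialized to one level.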