e52ad5c9f751f599492b4f087ed7ecfc-AuthorFeedback.pdf

Neural Information Processing Systems

Due to limited time, we evaluated SNM [Yin and Neubig, 2017] on the Python dataset. SNM explicitly introduces the constraints of grammar rules when generating ASTs. The BLEU score for SNM is 10.62, similar to that of our Basic model, indicating that the CG task on this dataset is very challenging. In particular, all predictions of SNM are valid, whereas the percentage of valid code generated by the dual model is low (Table 1). Since the CS and CG models are trained at the same time and the parameters of the two models are separate after the joint training, i.e., the two models solve their respective tasks separately after the joint training, the number of parameters of each dual model is the same as that of the basic model.
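The feedback above notes that the CS and CG models are trained jointly yet keep entirely separate parameter sets, so each dual model matches its basic counterpart in parameter count. A minimal numpy sketch of that arrangement, with toy linear stand-ins for the two seq2seq models (the duality term, shapes, and names here are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W_cs = rng.normal(size=(8, 4))   # code-summarization (CS) parameters
W_cg = rng.normal(size=(4, 8))   # code-generation (CG) parameters

def joint_step(W_cs, W_cg, x_code, x_summary, lr=0.01):
    """One joint-training step: each model gets its own task loss plus a
    cycle term tying the two tasks together (a stand-in for the paper's
    duality constraint), but W_cs and W_cg remain separate throughout."""
    s_pred = x_code @ W_cs               # CS prediction (toy linear map)
    c_pred = x_summary @ W_cg            # CG prediction (toy linear map)
    cycle = x_code @ W_cs @ W_cg         # CG should roughly invert CS
    n = len(x_code)
    g_cs = x_code.T @ ((s_pred - x_summary) + (cycle - x_code) @ W_cg.T) / n
    g_cg = (x_summary.T @ (c_pred - x_code)
            + (x_code @ W_cs).T @ (cycle - x_code)) / n
    return W_cs - lr * g_cs, W_cg - lr * g_cg

x_code = rng.normal(size=(16, 8))
x_summary = x_code @ rng.normal(size=(8, 4))  # toy paired data
W_cs, W_cg = joint_step(W_cs, W_cg, x_code, x_summary)

# After joint training the models are used separately; each keeps the
# same number of parameters as its independently trained basic twin.
assert W_cs.size == 8 * 4 and W_cg.size == 4 * 8
```

The point of the sketch is only the parameter bookkeeping: the joint objective couples the gradients, but never merges or shares the two weight matrices.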



Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Neural Information Processing Systems

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we attempt to explore the ICL process in Transformers through a lens of representation learning. Initially, leveraging kernel methods, we figure out a dual model for one softmax attention layer.
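The kernel-method view the abstract alludes to can be illustrated on a single softmax attention layer: the layer is a kernel smoother, a weighted average of values with weights given by the exponential kernel k(q, k_i) = exp(q · k_i / √d). A hedged toy sketch (this is the standard kernel rewriting of softmax attention, not necessarily the paper's exact dual-model construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6
Q = rng.normal(size=(3, d))   # queries
K = rng.normal(size=(n, d))   # keys from the in-context demonstrations
V = rng.normal(size=(n, d))   # values from the demonstrations

def softmax_attention(Q, K, V):
    """Standard softmax attention with the usual max-shift for stability."""
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def kernel_attention(Q, K, V):
    """Same layer as a kernel smoother: a normalized, kernel-weighted
    average of the values, with kernel k(q, k_i) = exp(q . k_i / sqrt(d))."""
    kern = np.exp(Q @ K.T / np.sqrt(d))
    return (kern @ V) / kern.sum(axis=1, keepdims=True)

# The two formulations agree numerically.
assert np.allclose(softmax_attention(Q, K, V), kernel_attention(Q, K, V))
```

Seen this way, the demonstration tokens act as the kernel regression's training set, which is the kind of representation-learning reading the paper pursues.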



Code Generation as a Dual Task of Code Summarization

Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin

Neural Information Processing Systems

On the other hand, CG is an indispensable process in which programmers write code to implement specific intents [Balzer, 1985]. Proper comments and correct code can massively improve programmers' productivity and enhance software quality.





FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

Xinyuan Ji, Zhaowei Zhu, Wei Xi, Olga Gadyatskaya, Zilong Song, Yong Cai, Yang Liu

arXiv.org Artificial Intelligence

Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning approaches. To tackle this issue, we propose FedFixer, where the personalized model is introduced to cooperate with the global model to effectively select clean client-specific samples. In the dual models, updating the personalized model solely at a local level can lead to overfitting on noisy data due to limited samples, consequently affecting both the local and global models' performance. To mitigate overfitting, we address this concern from two perspectives. Firstly, we employ a confidence regularizer to alleviate the impact of unconfident predictions caused by label noise. Secondly, a distance regularizer is implemented to constrain the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer can perform well in filtering noisy label samples on different clients, especially in highly heterogeneous label noise scenarios.
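The distance regularizer described in the abstract can be sketched as a proximal term pulling the personalized model toward the global one during local updates. A minimal illustration (the quadratic penalty form and the names w_p, w_g, mu are assumptions for exposition, not FedFixer's exact update rule):

```python
import numpy as np

rng = np.random.default_rng(0)
w_g = rng.normal(size=5)            # global model parameters
w_p = w_g + rng.normal(size=5)      # personalized model, locally drifted

def local_step(w_p, w_g, grad_loss, lr=0.1, mu=1.0):
    """One local gradient step on: task_loss + (mu/2) * ||w_p - w_g||^2.
    The mu term constrains the disparity between the personalized and
    global models, limiting overfitting to a client's small noisy sample."""
    return w_p - lr * (grad_loss + mu * (w_p - w_g))

# With the task gradient zeroed out, the penalty alone contracts the
# personalized model toward the global one by a factor (1 - lr*mu) per step.
for _ in range(100):
    w_p = local_step(w_p, w_g, grad_loss=np.zeros(5))
print(np.linalg.norm(w_p - w_g))  # distance shrinks toward 0
```

In practice the task gradient would come from the client's (noisy) local data, and the regularizer trades off personalization against agreement with the global model via mu.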