
Collaborating Authors: Xue, Zhikai


Interweaving Memories of a Siamese Large Language Model

arXiv.org Artificial Intelligence

Parameter-efficient fine-tuning (PEFT) methods optimize large language models (LLMs) by modifying or introducing a small number of parameters to enhance alignment with downstream tasks. However, they can result in catastrophic forgetting, where LLMs prioritize new knowledge at the expense of comprehensive world knowledge. A promising approach to mitigate this issue is to recall prior memories based on the original knowledge. To this end, we propose a model-agnostic PEFT framework, IMSM, which Interweaves Memories of a Siamese Large Language Model. Specifically, our siamese LLM is equipped with an existing PEFT method. Given an incoming query, it generates two distinct memories based on the pre-trained and fine-tuned parameters. IMSM then incorporates an interweaving mechanism that regulates the contributions of both original and enhanced memories when generating the next token. This framework is theoretically applicable to all open-source LLMs and existing PEFT methods. We conduct extensive experiments across various benchmark datasets, evaluating the performance of popular open-source LLMs using the proposed IMSM, in comparison to both classical and leading PEFT methods. Our findings indicate that IMSM maintains comparable time and space efficiency to backbone PEFT methods while significantly improving performance and effectively mitigating catastrophic forgetting.
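The abstract does not specify the exact form of the interweaving mechanism, but the idea of regulating the contributions of the original and enhanced memories at each decoding step can be sketched minimally. The sketch below assumes a scalar gate in [0, 1] and models interweaving as a convex combination of the two next-token distributions; the function names and the gating form are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interweave(logits_pretrained, logits_finetuned, gate):
    """Hypothetical interweaving step: convex combination of the next-token
    distributions from the frozen (original memory) and PEFT-tuned (enhanced
    memory) branches of the siamese LLM. gate=0 keeps only the original
    memory; gate=1 keeps only the fine-tuned one."""
    p_orig = softmax(logits_pretrained)
    p_tuned = softmax(logits_finetuned)
    return [(1 - g) * a + g * b
            for a, b, g in zip(p_orig, p_tuned, [gate] * len(p_orig))]

# Toy vocabulary of 3 tokens: the frozen branch prefers token 0, the
# fine-tuned branch prefers token 1; an even gate blends the two.
mixed = interweave([2.0, 0.5, 0.1], [0.2, 2.5, 0.1], gate=0.5)
```

In the paper the gate would presumably be learned and token-dependent rather than a fixed scalar; the point of the sketch is only that the combined distribution remains a valid distribution while letting the original parameters keep a voice at every step, which is what counters catastrophic forgetting.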


Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values

arXiv.org Artificial Intelligence

The potential impact of an academic paper is determined by various factors, including its popularity and contribution. Existing models usually estimate original citation counts based on static graphs and fail to differentiate values from nuanced perspectives. In this study, we propose a novel graph neural network to Disentangle the Potential impacts of Papers into Diffusion, Conformity, and Contribution values (called DPPDCC). Given a target paper, DPPDCC encodes temporal and structural features within the constructed dynamic heterogeneous graph. Particularly, to capture the knowledge flow, we emphasize the importance of comparative and co-cited/citing information between papers and aggregate snapshots evolutionarily. To unravel popularity, we contrast augmented graphs to extract the essence of diffusion and predict the accumulated citation binning to model conformity. We further apply orthogonal constraints to encourage distinct modeling of each perspective and preserve the inherent value of contribution. To evaluate models' generalization for papers published at various times, we reformulate the problem by partitioning data based on specific time points to mirror real-world conditions. Extensive experimental results on three datasets demonstrate that DPPDCC significantly outperforms baselines for previously, freshly, and immediately published papers. Further analyses confirm its robust capabilities. We will make our datasets and codes publicly available.
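The orthogonal constraint mentioned above — encouraging the diffusion, conformity, and contribution representations to stay distinct — can be illustrated with a minimal penalty term. The formulation below (squared cosine similarity between two perspective embeddings, driven toward zero) is an assumed sketch of one common way to impose such a constraint, not DPPDCC's actual loss.

```python
import math

def orthogonality_penalty(u, v):
    """Squared cosine similarity between two perspective embeddings.
    Adding this term to the training loss pushes the representations
    toward orthogonality, i.e. distinct, disentangled perspectives."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return (dot / (norm_u * norm_v)) ** 2

# Orthogonal embeddings incur zero penalty; parallel ones the maximum.
zero_pen = orthogonality_penalty([1.0, 0.0], [0.0, 1.0])
max_pen = orthogonality_penalty([1.0, 0.0], [2.0, 0.0])
```

In practice such a penalty would be summed over the three pairs of perspective embeddings and weighted against the citation-prediction loss; the weighting scheme is not described in the abstract.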


H2CGL: Modeling Dynamics of Citation Network for Impact Prediction

arXiv.org Artificial Intelligence

Assessing the potential impact of papers is of great significance to both academia and industry (Wang, Song and Barabási, 2013), especially given the exponential annual growth in the number of papers (Lo, Wang, Neumann, Kinney and Weld, 2020; Chu and Evans, 2021; Xue, He, Liu, Jiang, Zhao and Lu, 2023). As the numerical value of scientific impact is difficult to determine directly, citation count is frequently employed as a rough estimate (Evans and Reimer, 2009; Sinatra, Wang, Deville, Song and Barabási, 2016; Jiang, Koch and Sun, 2021). However, the dynamics of citation networks cannot be ignored. For example, the "sleeping beauties" phenomenon (Van Raan, 2004) shows that the citations of a paper can vary considerably across different time periods. Beyond content quality, the future citations of a paper are influenced by newly published papers (Funk and Owen-Smith, 2017; Park, Leahey and Funk, 2023). New papers may be successors to older ones, rediscovering the importance of previous works and thereby drawing more citations to them; or they may compete with older ones, correcting or improving previous works and thus costing them potential citations. It is therefore imperative to capture the dynamics of the citation network in order to accurately predict the future citations of a target paper. Previous studies within informetrics have primarily concentrated on either the content information or the citation networks of papers.