Commonsense Reasoning



Reasons and Solutions for the Decline in Model Performance after Editing Xiusheng Huang

Neural Information Processing Systems

Knowledge editing has attracted widespread attention as a low-cost way to update incorrect or outdated knowledge in large language models. However, recent research has found that edited models often exhibit varying degrees of performance degradation.


Neural Information Processing Systems

Interesting to see how well the proposed model would do under such a zero-shot setup (i.e., without fine-tuning the model on any particular supervised task). We compared with GPT-2 (345M) on the Winograd Schema Challenge; GPT-2 accuracy is taken from their paper. The BERT paper reports that BooksCorpus and Wikipedia contain 0.8B and 2.5B words, respectively; our processed BooksCorpus and Wikipedia contain 0.75B and 2B words, respectively. The segment embedding is implemented in the same way as the word embedding, i.e., a lookup table over segment labels ("Segment 1" and "Segment 2") whose output is added to the model input, indicating the segment of each input token.
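The segment-embedding mechanism described above can be sketched in a few lines. This is a minimal illustration, assuming a BERT-style setup; the table contents and dimension are made up for the demo.

```python
import random

# Sketch of a segment embedding: like the word embedding, it is a lookup
# table (one row per segment label), and its output is added element-wise
# to the token embedding to form the model input.

DIM = 4  # toy embedding dimension (assumption for illustration)
random.seed(0)

# Lookup tables: one row per vocabulary item / segment label.
word_table = {tok: [random.random() for _ in range(DIM)] for tok in ["the", "cat", "sat"]}
segment_table = [[random.random() for _ in range(DIM)] for _ in range(2)]  # "Segment 1", "Segment 2"

def embed(tokens, segment_ids):
    """Sum word and segment embeddings element-wise for each token."""
    return [
        [w + s for w, s in zip(word_table[tok], segment_table[seg])]
        for tok, seg in zip(tokens, segment_ids)
    ]

# First two tokens belong to segment 1, the last to segment 2.
inputs = embed(["the", "cat", "sat"], [0, 0, 1])
```

In a real model the two tables would be trainable parameters; the point is only that segment information enters the input as an additive lookup, exactly like word embeddings.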





Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models

Neural Information Processing Systems

There are two updating strategies: 1) a mimicking strategy, which generates similar samples based on the original data, preserving their stylistic and contextual essence, and 2) an extending strategy, which further expands existing samples at varying cognitive levels by adapting Bloom's taxonomy of educational objectives.
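The two strategies can be sketched as prompt templates over an LLM call. This is a hypothetical outline, not the paper's actual implementation: the function names, prompt wording, and the `call_llm` callable are all assumptions.

```python
# Hypothetical sketch of the two dataset-updating strategies.
# `call_llm` is an assumed stand-in for any text-generation backend.

MIMIC_PROMPT = (
    "Write a new question in the same style and context as this one, "
    "testing the same skill:\n{sample}"
)

# Bloom's taxonomy levels used by the extending strategy, low to high.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

EXTEND_PROMPT = (
    "Rewrite this question so that answering it requires the '{level}' "
    "cognitive level of Bloom's taxonomy:\n{sample}"
)

def mimic(sample: str, call_llm) -> str:
    """Mimicking strategy: generate a similar sample preserving style/context."""
    return call_llm(MIMIC_PROMPT.format(sample=sample))

def extend(sample: str, level: str, call_llm) -> str:
    """Extending strategy: expand the sample at a chosen cognitive level."""
    assert level in BLOOM_LEVELS
    return call_llm(EXTEND_PROMPT.format(level=level, sample=sample))

# Demo with a stub LLM that simply echoes its prompt.
echo = lambda prompt: prompt
new_sample = mimic("What is 2 + 2?", echo)
harder_sample = extend("What is 2 + 2?", "analyze", echo)
```

With a real backend, repeatedly applying these two functions to an existing benchmark would yield the fresh evaluation samples the abstract describes.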



Ensembling Graph Predictions for AMR Parsing Hoang Thanh Lam

Neural Information Processing Systems

AMR parsing is an important problem in natural language processing (NLP) research, with broad applications in downstream tasks such as question answering [Kapanipathi et al., 2020] and commonsense reasoning [Lim et al., 2020].

