Feng, Chong
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
Li, Jiawei, Liang, Xinyue, Yang, Yizhe, Feng, Chong, Gao, Yang
Process supervision enhances the performance of large language models in reasoning tasks by providing feedback at each step of chain-of-thought reasoning. However, due to the lack of effective process supervision methods, even advanced large language models are prone to logical errors and redundant reasoning. We claim that the effectiveness of process supervision significantly depends on both the accuracy and the length of reasoning chains. Moreover, we identify that these factors exhibit a nonlinear relationship with the overall reward score of the reasoning process. Inspired by these insights, we propose a novel process supervision paradigm, PSPO*, which systematically outlines the workflow from reward model training to policy optimization and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop PSPO-WRS, which considers the number of reasoning steps in determining reward scores and utilizes an adjusted Weibull distribution for nonlinear reward shaping. Experimental results on six mathematical reasoning datasets demonstrate that PSPO-WRS consistently outperforms current mainstream models.
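The idea of a length-aware nonlinear reward can be illustrated with a minimal sketch. This is not the formula used in PSPO-WRS, which the abstract does not give: the function names, the `shape` and `scale` values, and the choice to multiply mean step accuracy by a peak-normalized Weibull density are all assumptions made for illustration.

```python
import math


def weibull_pdf(x: float, shape: float, scale: float) -> float:
    """Standard two-parameter Weibull density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_mode(shape: float, scale: float) -> float:
    """Location of the density's peak (defined for shape > 1)."""
    return scale * ((shape - 1) / shape) ** (1 / shape)


def shaped_reward(step_scores, shape: float = 2.0, scale: float = 6.0) -> float:
    """Hypothetical shaping: mean step accuracy times a length weight in [0, 1].

    step_scores holds one score in [0, 1] per reasoning step. The length
    weight is the Weibull density at the step count divided by the density's
    peak value, so chains far longer than the mode are penalized.
    """
    if not step_scores:
        return 0.0
    accuracy = sum(step_scores) / len(step_scores)
    peak = weibull_pdf(weibull_mode(shape, scale), shape, scale)
    length_weight = weibull_pdf(len(step_scores), shape, scale) / peak
    return accuracy * length_weight
```

Under these assumed parameters, a four-step chain of correct steps scores close to 1, while a twenty-step chain of equally correct steps scores close to 0, which is the kind of nonlinear dependence on length the abstract describes.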
QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism
Wang, Bo, Huang, Heyan, Cao, Yixin, Ying, Jiahao, Tang, Wei, Feng, Chong
While large language models (LLMs) have made notable advancements in natural language processing, they continue to struggle with processing extensive text. Memory mechanisms offer a flexible solution for managing long contexts, utilizing techniques such as compression, summarization, and structuring to facilitate nuanced and efficient handling of large volumes of text. However, existing techniques face challenges with static knowledge integration, leading to insufficient adaptation to task-specific needs and missing multi-segmentation relationships, which hinders the dynamic reorganization and logical combination of relevant segments during the response process. To address these issues, we introduce a novel strategy, Question then Reflection Memory Mechanism (QRMeM), incorporating a dual-structured memory pool. This pool synergizes static textual content with structured graph guidance, fostering a reflective trial-and-error approach for navigating and identifying relevant segments. Our evaluation across multiple-choice questions (MCQ) and multi-document question answering (Multi-doc QA) benchmarks showcases QRMeM's enhanced performance compared to existing approaches.
RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts
Li, Hongzheng, Wang, Ruojin, Shi, Ge, Lv, Xing, Lei, Lei, Feng, Chong, Liu, Fang, Lin, Jinkun, Mei, Yangguang, Xu, Lingnan
Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate move analysis and automatic move identification. This paper provides a thorough discussion of the corpus construction process, including the scheme, data collection, annotation guidelines, and annotation procedures. The corpus is constructed in two stages: first, expert annotators manually annotate high-quality data; then, based on the human-annotated data, a BERT-based model is employed for automatic annotation, with experts correcting its output. The result is a large-scale and high-quality corpus comprising 33,988 annotated instances. We also conduct preliminary move identification experiments using the BERT-based model to verify the effectiveness of the proposed corpus and model. The annotated corpus is available for academic research purposes and can serve as an essential resource for move analysis, English language teaching and writing, as well as move/discourse-related tasks in Natural Language Processing (NLP).
Boosting Event Extraction with Denoised Structure-to-Text Augmentation
Wang, Bo, Huang, Heyan, Wei, Xiaochi, Shi, Ge, Liu, Xiao, Feng, Chong, Zhou, Tong, Wang, Shuaiqiang, Yin, Dawei
Event extraction aims to recognize pre-defined event triggers and arguments from text, a task that suffers from a lack of high-quality annotations. In most NLP applications, using large-scale synthetic training data is a practical and effective approach to alleviate the problem of data scarcity. However, when applied to the task of event extraction, recent data augmentation methods often neglect the problems of grammatical incorrectness, structure misalignment, and semantic drifting, leading to unsatisfactory performance. In order to solve these problems, we propose a denoised structure-to-text augmentation framework for event extraction, DAEE, which generates additional training data through a knowledge-based structure-to-text generation model and iteratively selects an effective subset of the generated data with a deep reinforcement learning agent. Experimental results on several datasets demonstrate that the proposed method generates more diverse text representations for event extraction and achieves results comparable to the state of the art.
Rethinking Adjacent Dependency in Session-based Recommendations
Zhang, Qian, Wang, Shoujin, Lu, Wenpeng, Feng, Chong, Peng, Xueping, Wang, Qingxiang
Session-based recommendations (SBRs) recommend the next item for an anonymous user by modeling the dependencies between items in a session. Benefiting from the superiority of graph neural networks (GNNs) in learning complex dependencies, GNN-based SBRs have become the mainstream of SBRs in recent years. Most GNN-based SBRs rest on a strong assumption of adjacent dependency, which means that any two adjacent items in a session are necessarily dependent. However, based on our observation, adjacency does not necessarily indicate dependency, due to the uncertainty and complexity of user behaviours. Therefore, the aforementioned assumption does not always hold in real-world cases and thus easily leads to two deficiencies: (1) the introduction of false dependencies between items which are adjacent in a session but are not really dependent, and (2) the omission of true dependencies between items which are not adjacent but are actually dependent. Such deficiencies significantly degrade dependency learning and thus reduce recommendation performance. Aiming to address these deficiencies, we propose a novel review-refined inter-item graph neural network (RI-GNN), which utilizes the topic information extracted from items' reviews to refine dependencies between items. Experiments on two public real-world datasets demonstrate that RI-GNN outperforms the state-of-the-art methods.
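The two deficiencies named in the abstract can be made concrete with a small sketch. It is not RI-GNN: the abstract does not say how topic information refines the graph, so the cosine-similarity threshold, the toy topic vectors, and every function name below are assumptions chosen only to contrast adjacency-based edges with topic-refined ones.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacent_edges(session):
    """Edges under the adjacent-dependency assumption: consecutive items only."""
    return {(a, b) for a, b in zip(session, session[1:]) if a != b}


def refined_edges(session, topics, threshold: float = 0.5):
    """Hypothetical refinement: link any earlier item to any later item in the
    session when their topic vectors are similar enough, adjacent or not."""
    edges = set()
    for i, a in enumerate(session):
        for b in session[i + 1:]:
            if a != b and cosine(topics[a], topics[b]) >= threshold:
                edges.add((a, b))
    return edges


# Toy session: the socks click sits between two related items.
session = ["phone", "socks", "phone_case"]
topics = {"phone": [1.0, 0.0], "socks": [0.0, 1.0], "phone_case": [0.9, 0.1]}
```

With this toy data, `adjacent_edges(session)` links phone to socks and socks to phone_case (two false dependencies) and misses phone to phone_case (a true one), whereas `refined_edges(session, topics)` keeps only the phone to phone_case edge.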