Zhu, Minjun
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Zhu, Minjun, Weng, Yixuan, Yang, Linyi, Zhang, Yue
Large Language Models (LLMs) are increasingly utilized in scientific research assessment, particularly in automated paper review. However, existing LLM-based review systems face significant challenges, including limited domain expertise, hallucinated reasoning, and a lack of structured evaluation. To address these limitations, we introduce DeepReview, a multi-stage framework designed to emulate expert reviewers by incorporating structured analysis, literature retrieval, and evidence-based argumentation. Using DeepReview-13K, a curated dataset with structured annotations, we train DeepReviewer-14B, which outperforms CycleReviewer-70B with fewer tokens. In its best mode, DeepReviewer-14B achieves win rates of 88.21% and 80.20% against GPT-o1 and DeepSeek-R1 in evaluations. Our work sets a new benchmark for LLM-based paper review, with all resources publicly available. The code, model, dataset, and demo have been released at http://ai-researcher.net.
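The abstract describes a multi-stage pipeline (structured analysis, literature retrieval, evidence-based argumentation, final verdict). Below is a minimal sketch of such an orchestration in Python; the function names, prompts, and the generate/retrieval interfaces are illustrative assumptions, not the released DeepReview implementation.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM such as DeepReviewer-14B."""
    raise NotImplementedError

def retrieve_related_work(paper_text: str, k: int = 5) -> list[str]:
    """Placeholder for a literature-retrieval step (e.g. a search API)."""
    return []

def deep_review(paper_text: str) -> dict:
    # Stage 1: structured analysis of the submission.
    analysis = generate(
        "Summarize the problem, method, and claimed contributions:\n" + paper_text
    )
    # Stage 2: retrieval-grounded novelty assessment.
    related = retrieve_related_work(paper_text)
    novelty = generate(
        "Given these related papers:\n" + "\n".join(related)
        + "\nAssess the novelty of the submission:\n" + analysis
    )
    # Stage 3: evidence-based argumentation for strengths and weaknesses.
    arguments = generate(
        "List strengths and weaknesses, each supported by quotes from the paper:\n"
        + paper_text
    )
    # Stage 4: final structured review.
    verdict = generate(
        "Write a review with soundness, presentation, and contribution scores "
        "and an overall rating, based on:\n" + analysis + novelty + arguments
    )
    return {"analysis": analysis, "novelty": novelty,
            "arguments": arguments, "review": verdict}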
Direct Preference Optimization Using Sparse Feature-Level Constraints
Yin, Qingyu, Leong, Chak Tou, Zhang, Hongbo, Zhu, Minjun, Yan, Hanqi, Zhang, Qiang, He, Yulan, Li, Wenjie, Wang, Jun, Zhang, Yue, Yang, Linyi
The alignment of large language models (LLMs) with human preferences remains a key challenge. While post-training techniques like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have achieved notable success, they often suffer from computational inefficiency and training instability. In this paper, we propose Feature-level constrained Preference Optimization (FPO), a novel method designed to simplify the alignment process while ensuring stability. FPO leverages pre-trained Sparse Autoencoders (SAEs) and introduces feature-level constraints, allowing for efficient, sparsity-enforced alignment. Our approach gains efficiency by using sparse features activated in a well-trained sparse autoencoder and preserves the quality of sequential KL divergence by using a feature-level offline reference. Experimental results on benchmark datasets demonstrate that FPO achieves an absolute improvement of over 5% in win rate with much lower computational cost compared to state-of-the-art baselines, making it a promising solution for efficient and controllable LLM alignment. Aligning large language models (LLMs) with human values and practical objectives is a critical challenge in AI development (Wang et al., 2023). Post-training methods, including fine-tuning (Wei et al., 2022; Chung et al., 2024) and alignment strategies (Tunstall et al., 2023), have played a significant role in refining LLM behavior. Among these, Reinforcement Learning from Human Feedback (RLHF) (Christiano et al., 2017; Ouyang et al., 2022) has emerged as a leading technique, integrating human feedback to guide models towards producing valuable and useful outputs.
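For orientation, here is a minimal sketch of a DPO-style objective augmented with a feature-level constraint computed from SAE activations. The exact constraint used in FPO, the coefficient names, and the use of an MSE term as a stand-in for the feature-level offline reference are assumptions for illustration only.

import torch
import torch.nn.functional as F

def fpo_style_loss(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps,
                   sae_feats_policy, sae_feats_ref,
                   beta: float = 0.1, lam: float = 0.01):
    # Standard DPO preference term on sequence log-probabilities.
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    pref_loss = -F.logsigmoid(logits).mean()
    # Feature-level constraint: keep the policy's sparse SAE activations
    # close to an offline reference, as a cheap proxy for sequential KL.
    feat_constraint = F.mse_loss(sae_feats_policy, sae_feats_ref)
    return pref_loss + lam * feat_constraint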
CycleResearcher: Improving Automated Research via Automated Review
Weng, Yixuan, Zhu, Minjun, Bao, Guangsheng, Zhang, Hongbo, Wang, Jindong, Zhang, Yue, Yang, Linyi
The automation of scientific discovery has been a long-standing goal within the research community, driven by the potential to accelerate knowledge creation. While significant progress has been made using commercial large language models (LLMs) as research assistants or idea generators, the possibility of automating the entire research process with open-source LLMs remains largely unexplored. This paper explores the feasibility of using open-source post-trained LLMs as autonomous agents capable of performing the full cycle of automated research and review, from literature review and manuscript preparation to peer review and paper revision. Our iterative preference training framework consists of CycleResearcher, which conducts research tasks, and CycleReviewer, which simulates the peer review process, providing iterative feedback via reinforcement learning. To train these models, we develop two new datasets, Review-5k and Research-14k, reflecting real-world machine learning research and peer review dynamics. Our results demonstrate that CycleReviewer achieves a 26.89% improvement in mean absolute error (MAE) over individual human reviewers in predicting paper scores, indicating that LLMs can surpass expert-level performance in research evaluation. On the research side, papers generated by the CycleResearcher model achieved a score of 5.36 in simulated peer reviews, surpassing the preprint level of 5.24 from human experts and approaching the accepted-paper level of 5.69. This work represents a significant step toward fully automated scientific inquiry, providing ethical safeguards and advancing AI-driven research capabilities. The code, dataset, and model weights are released at http://github/minjun-zhu/Researcher.
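A minimal sketch of the generate-then-review loop the abstract describes: candidate papers are drafted, scored by a reviewer model, and turned into preference pairs for iterative preference training. All interfaces and the pairing rule here are illustrative assumptions, not the released training code.

def generate_paper(researcher_model, topic: str) -> str: ...
def review_score(reviewer_model, paper: str) -> float: ...
def preference_update(researcher_model, pairs): ...

def one_training_round(researcher_model, reviewer_model, topics, n_samples=4):
    pairs = []
    for topic in topics:
        drafts = [generate_paper(researcher_model, topic) for _ in range(n_samples)]
        scored = sorted(drafts, key=lambda d: review_score(reviewer_model, d))
        # Best-scored draft is treated as "chosen", worst-scored as "rejected".
        pairs.append((scored[-1], scored[0]))
    preference_update(researcher_model, pairs)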
Locking Down the Finetuned LLMs Safety
Zhu, Minjun, Yang, Linyi, Wei, Yifan, Zhang, Ningyu, Zhang, Yue
Fine-tuning large language models (LLMs) on additional datasets is often necessary to optimize them for specific downstream tasks. However, existing safety alignment measures, which restrict harmful behavior during inference, are insufficient to mitigate safety risks during fine-tuning. Alarmingly, fine-tuning with just 10 toxic sentences can make models comply with harmful instructions. We introduce SafetyLock, a novel alignment intervention method that maintains robust safety post-fine-tuning through efficient and transferable mechanisms. SafetyLock leverages our discovery that fine-tuned models retain safety-related activation representations similar to those of their base models. This insight enables us to extract what we term the Meta-SafetyLock, a set of safety bias directions representing key activation patterns associated with safe responses in the original model. We can then apply these directions universally to fine-tuned models to enhance their safety. By searching for activation directions across multiple token dimensions, SafetyLock achieves enhanced robustness and transferability. Our experiments demonstrate that SafetyLock can reduce the harmful instruction response rate from 60% to below 1% in toxic fine-tuned models. It surpasses traditional methods in both performance and efficiency, offering a scalable, non-invasive solution for ensuring the safety of customized LLMs. Our analysis across various fine-tuning scenarios confirms SafetyLock's robustness, advocating its integration into safety protocols for aligned LLMs. Large language models (LLMs) have demonstrated increasing utility across various domains (Wei et al., 2022b;a; Weng et al., 2023; Hadar-Shoval et al., 2024), yet their potential to handle harmful queries has raised significant concerns (Carroll et al., 2023; Hendrycks et al., 2023). Safety alignment measures are expected to teach models to refuse harmful queries during inference (Huang et al., 2024b; Wang et al., 2024b; Raza et al., 2024; Zou et al., 2024).
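A minimal sketch of the activation-steering idea the abstract describes: derive safety bias directions from the base model's activations on safe versus harmful responses, then add them to a fine-tuned model's hidden states at inference. The layer choice, the mean-difference extraction, and the scaling factor are illustrative assumptions, not the SafetyLock implementation.

import torch

def safety_direction(base_acts_safe: torch.Tensor,
                     base_acts_harmful: torch.Tensor) -> torch.Tensor:
    # base_acts_*: activations of shape (num_examples, hidden_dim) for one layer.
    # Direction = normalized mean activation difference between safe and
    # harmful responses in the base model.
    diff = base_acts_safe.mean(dim=0) - base_acts_harmful.mean(dim=0)
    return diff / diff.norm()

def apply_safety_lock(hidden_states: torch.Tensor,
                      direction: torch.Tensor,
                      alpha: float = 4.0) -> torch.Tensor:
    # Shift each token's hidden state along the safety direction in the
    # fine-tuned model (e.g. inside a forward hook).
    return hidden_states + alpha * direction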
Large Language Models are Better Reasoners with Self-Verification
Weng, Yixuan, Zhu, Minjun, Xia, Fei, Li, Bin, He, Shizhu, Liu, Shengping, Sun, Bin, Liu, Kang, Zhao, Jun
Recently, with chain-of-thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, CoT reasoning requires multi-step prompting and multi-token prediction, making it highly sensitive to individual mistakes and vulnerable to error accumulation. These issues make it necessary for LLMs to be able to verify their own answers. Indeed, after inferring a conclusion, people often check it by re-verifying the steps to avoid mistakes. In this paper, we propose and demonstrate that LLMs have similar self-verification abilities. We take the conclusion obtained by CoT as one of the conditions for solving the original problem. By performing backward verification of the answers that the LLM deduced for itself, we obtain interpretable answer-validation scores and select the candidate answer with the highest score. Experimental results demonstrate that the proposed method can improve reasoning performance on various arithmetic, commonsense, and logical reasoning datasets. Our code is publicly available at: https://github.com/WENGSYX/Self-Verification.
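A minimal sketch of answer selection by backward verification: each candidate CoT answer is assumed true and the model is asked to re-derive the original conditions; the candidate with the highest verification score wins. The prompts and the crude consistency check below are simplified assumptions, not the released scoring procedure.

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder for sampling n completions from an LLM."""
    raise NotImplementedError

def self_verify(question: str, n_candidates: int = 5, n_checks: int = 3) -> str:
    candidates = generate(f"{question}\nLet's think step by step.", n=n_candidates)
    scores = []
    for answer in candidates:
        # Backward pass: take the candidate answer as a condition and ask the
        # model to re-derive a given condition of the original question.
        checks = generate(
            f"Assume the answer to the problem is {answer}. "
            f"Re-derive the given conditions of: {question}",
            n=n_checks,
        )
        # Score = number of verification samples judged consistent
        # (approximated here by a naive keyword check).
        scores.append(sum("consistent" in c.lower() for c in checks))
    return max(zip(scores, candidates))[1]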
Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database
Zhu, Minjun, Weng, Yixuan, He, Shizhu, Liu, Kang, Zhao, Jun
In textual question answering (TQA) systems, complex questions often require retrieving multiple textual fact chains with multiple reasoning steps, while existing benchmarks are limited to single-chain or single-hop retrieval scenarios. In this paper, we propose Graph-Hop, a novel multi-chain and multi-hop retrieval and reasoning paradigm for complex question answering. We construct a new benchmark called ReasonGraphQA, which provides explicit and fine-grained evidence graphs for complex questions to support interpretable, comprehensive, and detailed reasoning; ReasonGraphQA also shows an advantage in reasoning diversity and scale. Moreover, we propose a strong graph-hop baseline, the Bidirectional Graph Retrieval (BGR) method, for generating an explanation graph of textual evidence in knowledge reasoning and question answering. We thoroughly evaluate existing evidence retrieval and reasoning models on ReasonGraphQA. Experiments highlight that Graph-Hop is a promising direction for answering complex questions, but it still has certain limitations. We further study mitigation strategies to meet these challenges and discuss future directions.
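A minimal sketch of bidirectional retrieval over a textual evidence graph: expand forward from question nodes and backward from a candidate answer, and keep evidence lying on connecting paths. The graph construction and hop budget are illustrative assumptions and do not reproduce the BGR method itself.

import networkx as nx

def bidirectional_retrieve(graph: nx.DiGraph, question_nodes, answer_nodes,
                           max_hops: int = 3) -> set:
    forward = set(question_nodes)
    backward = set(answer_nodes)
    for _ in range(max_hops):
        # Expand one hop forward from the question side...
        forward |= {v for u in list(forward) for v in graph.successors(u)}
        # ...and one hop backward from the answer side.
        backward |= {u for v in list(backward) for u in graph.predecessors(v)}
    # Evidence = nodes reachable from the question that also reach the answer.
    return forward & backward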
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
Weng, Yixuan, Zhu, Minjun, Xia, Fei, Li, Bin, He, Shizhu, Liu, Kang, Zhao, Jun
Language models' (LMs) proficiency in handling deterministic symbolic reasoning and rule-based tasks remains limited due to their dependence on implicit learning from textual data. To endow LMs with genuine rule comprehension, we explore how to incorporate compiled neural networks (CoNNs), whose weights are specially designed, into the architecture of LMs to achieve high accuracy and robust performance. CoNNs are transformer-based neural networks that execute rules through artificially generated attention weights. Our method, called "Neural Comprehension", incorporates CoNN modules into the LM so that the framework effectively tackles rule-intensive challenges. Our experiments on symbolic reasoning tasks and real-world arithmetic reasoning tasks demonstrate the superior performance of our method compared to existing techniques. Furthermore, our LM achieves flawless execution on symbolic operation tasks, highlighting the potential of our method in enabling LMs to possess true symbolic comprehension capabilities. Our code is publicly available at: https://github.com/WENGSYX/Neural-Comprehension.
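To make the idea concrete, here is a heavily simplified sketch of routing a rule-intensive span (exact addition) to a "compiled" module while the LM handles everything else. The regex gating and the toy addition function stand in for a CoNN whose attention weights implement the rule; they are illustrative assumptions, not the released architecture.

import re

def compiled_addition(expr: str) -> str:
    # Stand-in for a CoNN whose designed weights execute exact addition.
    a, b = map(int, expr.split("+"))
    return str(a + b)

def neural_comprehension(lm_generate, prompt: str) -> str:
    match = re.search(r"(\d+)\s*\+\s*(\d+)", prompt)
    if match:
        # Rule-intensive span: defer to the compiled module for exactness.
        return compiled_addition(f"{match.group(1)}+{match.group(2)}")
    # Otherwise fall back to ordinary LM generation.
    return lm_generate(prompt)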
Large Language Models Need Holistically Thought in Medical Conversational QA
Weng, Yixuan, Li, Bin, Xia, Fei, Zhu, Minjun, Sun, Bin, He, Shizhu, Liu, Kang, Zhao, Jun
Medical conversational question answering (CQA) systems aim to provide a range of professional medical services to improve the efficiency of medical care. Despite the success of large language models (LLMs) in complex reasoning tasks in various fields, such as mathematics, logic, and commonsense QA, they still struggle with the increased complexity and specialization of the medical field. This is because medical CQA tasks require not only strong medical reasoning but also the ability to think broadly and deeply. In this paper, to address these challenges in medical CQA tasks that must be considered and understood from many aspects, we propose the Holistically Thought (HoT) method, which is designed to guide LLMs to perform both diffused and focused thinking for generating high-quality medical responses. The proposed HoT method has been evaluated through automated and manual assessments on three medical CQA datasets covering English and Chinese. Extensive experimental results show that our method produces more correct, professional, and considerate answers than several state-of-the-art (SOTA) methods, demonstrating its effectiveness. Our code is available at https://github.com/WENGSYX/HoT.
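A minimal sketch of HoT-style prompting as the abstract describes it: elicit diffused thinking (several broad perspectives) and focused thinking (a deep dive on the core issue), then merge them into one reply. The prompts and the two-stage split are simplified assumptions, not the released prompt templates.

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder for sampling n completions from a medical LLM."""
    raise NotImplementedError

def holistically_thought(dialogue: str) -> str:
    # Diffused thinking: several broad perspectives on the case.
    diffused = generate(
        "Consider the patient's situation from several angles "
        "(diagnosis, lifestyle, psychology):\n" + dialogue, n=3)
    # Focused thinking: detailed reasoning about the core issue.
    focused = generate(
        "Reason step by step about the single most likely cause and its "
        "recommended treatment:\n" + dialogue, n=1)[0]
    # Merge both modes into the final response.
    return generate(
        "Combine the broad considerations and the focused analysis into one "
        "professional, considerate reply:\n"
        + "\n".join(diffused) + "\n" + focused, n=1)[0]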
ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains
Zhu, Minjun, Weng, Yixuan, He, Shizhu, Liu, Kang, Zhao, Jun
The ability to reason over evidence has received increasing attention in question answering (QA). Recently, natural language databases (NLDB) have enabled complex QA over knowledge bases with textual evidence rather than structured representations; this task attracts considerable attention because of the flexibility and richness of textual evidence. However, existing text-based complex question answering datasets fail to provide an explicit reasoning process, which is important for retrieval effectiveness and reasoning interpretability. Therefore, we present ReasonChainQA, a benchmark with explanatory and explicit evidence chains. ReasonChainQA consists of two subtasks, answer generation and evidence chain extraction, and offers greater diversity in multi-hop questions with varying depths, 12 reasoning types, and 78 relations. To obtain high-quality textual evidence for answering complex questions, we conduct additional experiments on supervised and unsupervised retrieval, which fully indicate the significance of ReasonChainQA. The dataset and code will be made publicly available upon acceptance.
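A minimal sketch of how a ReasonChainQA-style instance and its evidence-chain subtask might be represented and scored; the field names and the set-overlap F1 metric are illustrative assumptions, not the released benchmark format.

from dataclasses import dataclass

@dataclass
class ReasonChainExample:
    question: str
    answer: str
    evidence_chain: list[str]   # ordered textual facts supporting the answer

def chain_f1(predicted: list[str], gold: list[str]) -> float:
    # F1 over the retrieved evidence sentences versus the gold chain.
    if not predicted or not gold:
        return 0.0
    overlap = len(set(predicted) & set(gold))
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)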