Education
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation
Zhou, Siying, Wu, Yiquan, Chen, Hui, Hu, Xavier, Kuang, Kun, Jatowt, Adam, Hu, Ming, Zheng, Chunyan, Wu, Fei
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, the research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of legal claim generation based on the given case's facts. First, we construct ClaimGen-CN, the first dataset for Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored for assessing the generated claims, which encompasses two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of the current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.
Reasoning as an Adaptive Defense for Safety
Kim, Taeyoun, Tajwar, Fahim, Raghunathan, Aditi, Kumar, Aviral
Reasoning methods that adaptively allocate test-time compute have advanced LLM performance on easy to verify domains such as math and code. In this work, we study how to utilize this approach to train models that exhibit a degree of robustness to safety vulnerabilities, and show that doing so can provide benefits. We build a recipe called $\textit{TARS}$ (Training Adaptive Reasoners for Safety), a reinforcement learning (RL) approach that trains models to reason about safety using chain-of-thought traces and a reward signal that balances safety with task completion. To build TARS, we identify three critical design choices: (1) a ``lightweight'' warmstart SFT stage, (2) a mix of harmful, harmless, and ambiguous prompts to prevent shortcut behaviors such as too many refusals, and (3) a reward function to prevent degeneration of reasoning capabilities during training. Models trained with TARS exhibit adaptive behaviors by spending more compute on ambiguous queries, leading to better safety-refusal trade-offs. They also internally learn to better distinguish between safe and unsafe prompts and attain greater robustness to both white-box (e.g., GCG) and black-box attacks (e.g., PAIR). Overall, our work provides an effective, open recipe for training LLMs against jailbreaks and harmful requests by reasoning per prompt.
Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems
Ensuring responsible use of artificial intelligence (AI) has become imperative as autonomous systems increasingly influence critical societal domains. However, the concept of trustworthy AI remains broad and multi-faceted. This thesis advances knowledge in the safety, fairness, transparency, and accountability of AI systems. In safety, we extend classical deterministic shielding techniques to become resilient against delayed observations, enabling practical deployment in real-world conditions. We also implement both deterministic and probabilistic safety shields into simulated autonomous vehicles to prevent collisions with road users, validating the use of these techniques in realistic driving simulators. We introduce fairness shields, a novel post-processing approach to enforce group fairness in sequential decision-making settings over finite and periodic time horizons. By optimizing intervention costs while strictly ensuring fairness constraints, this method efficiently balances fairness with minimal interference. For transparency and accountability, we propose a formal framework for assessing intentional behaviour in probabilistic decision-making agents, introducing quantitative metrics of agency and intention quotient. We use these metrics to propose a retrospective analysis of intention, useful for determining responsibility when autonomous systems cause unintended harm. Finally, we unify these contributions through the ``reactive decision-making'' framework, providing a general formalization that consolidates previous approaches. Collectively, the advancements presented contribute practically to the realization of safer, fairer, and more accountable AI systems, laying the foundations for future research in trustworthy AI.
Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring
Uchoa, Alexandre P, Oliveira, Carlo E T, Motta, Claudia L R, Schneider, Daniel
The integration of Large Language Models into Intelligent Tutoring Systems presents significant challenges in aligning with diverse and often conflicting values from students, parents, teachers, and institutions. Existing architectures lack formal mechanis ms for negotiating these multi - stakeholder tensions, creating risks in accountability and bias. This paper introduces the Advisory Governance Layer (AGL), a non - intrusive, multi - agent framework designed to enable distributed stakeholder participation in AI governance. The AGL employs specialized agents representing stakeholder groups to evaluate pedagogical actions against their specific policies in a privacy - preserving manner, anticipating future advances in personal assistant technology that will enhance stakeholder value expression. Through a novel policy taxonomy and conflict - resolution protocols, the framework provides structured, auditable governance advice to the ITS without altering its core pedagogical decision - making. This work contributes a refere nce architecture and technical specifications for aligning educational AI with multi - stakeholder values, bridging the gap between high - level ethical principles and practical implementation.
TLCD: A Deep Transfer Learning Framework for Cross-Disciplinary Cognitive Diagnosis
Wang, Zhifeng, Su, Meixin, Yang, Yang, Zeng, Chunyan, Ye, Lizhi
Driven by the dual principles of smart education and artificial intelligence technology, the online education model has rapidly emerged as an important component of the education industry. Cognitive diagnostic technology can utilize students' learning data and feedback information in educational evaluation to accurately assess their ability level at the knowledge level. However, while massive amounts of information provide abundant data resources, they also bring about complexity in feature extraction and scarcity of disciplinary data. In cross-disciplinary fields, traditional cognitive diagnostic methods still face many challenges. Given the differences in knowledge systems, cognitive structures, and data characteristics between different disciplines, this paper conducts in-depth research on neural network cognitive diagnosis and knowledge association neural network cognitive diagnosis, and proposes an innovative cross-disciplinary cognitive diagnosis method (TLCD). This method combines deep learning techniques and transfer learning strategies to enhance the performance of the model in the target discipline by utilizing the common features of the main discipline. The experimental results show that the cross-disciplinary cognitive diagnosis model based on deep learning performs better than the basic model in cross-disciplinary cognitive diagnosis tasks, and can more accurately evaluate students' learning situation.
Knocking-Heads Attention
Zhou, Zhanchao, Chen, Xiaodong, Chen, Haoxing, Lan, Zhenzhong, Li, Jianguo
Multi-head attention (MHA) has become the cornerstone of modern large language models, enhancing representational capacity through parallel attention heads. However, increasing the number of heads inherently weakens individual head capacity, and existing attention mechanisms - whether standard MHA or its variants like grouped-query attention (GQA) and grouped-tied attention (GTA) - simply concatenate outputs from isolated heads without strong interaction. To address this limitation, we propose knocking-heads attention (KHA), which enables attention heads to "knock" on each other - facilitating cross-head feature-level interactions before the scaled dot-product attention. This is achieved by applying a shared, diagonally-initialized projection matrix across all heads. The diagonal initialization preserves head-specific specialization at the start of training while allowing the model to progressively learn integrated cross-head representations. KHA adds only minimal parameters and FLOPs and can be seamlessly integrated into MHA, GQA, GTA, and other attention variants. We validate KHA by training a 6.1B parameter MoE model (1.01B activated) on 1T high-quality tokens. Compared to baseline attention mechanisms, KHA brings superior and more stable training dynamics, achieving better performance across downstream tasks.
Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks
Singh, Gurpreet, Sood, Keshav, Rajalakshmi, P., Xiang, Yong
Abstract--Federated learning (FL) offers a privacy-preserving paradigm for machine learning, but its application in intrusion detection systems (IDS) within IoT networks is challenged by severe class imbalance, non-IID data, and high communication overhead.These challenges severely degrade the performance of conventional FL methods in real-world network traffic classification. T o overcome these limitations, we propose Sentinel, a personalized federated IDS (pFed-IDS) framework that incorporates a dual-model architecture on each client, consisting of a personalized teacher and a lightweight shared student model. This design effectively balances deep local adaptation with efficient global model consensus while preserving client privacy by transmitting only the compact student model, thus reducing communication costs. Sentinel integrates three key mechanisms to ensure robust performance: bidirectional knowledge distillation with adaptive temperature scaling, multi-faceted feature alignment, and class-balanced loss functions. Furthermore, the server employs normalized gradient aggregation with equal client weighting to enhance fairness and mitigate client drift. Extensive experiments on the IoTID20 and 5GNIDD benchmark datasets demonstrate that Sentinel significantly outperforms state-of-the-art federated methods, establishing a new performance benchmark, especially under extreme data heterogeneity, while maintaining communication efficiency. HE rapid proliferation of billions of heterogeneous Internet of Things (IoT) devices has significantly expanded attack surfaces, presenting new challenges for network security. Insufficient security measures in many of these devices--such as inadequate authentication, weak encryption, and vulnerable communication protocols--facilitate a continuous influx of various cyberattacks, including novel (zero-day) threats, which pose significant risks to the availability, confidentiality, and integrity of data and systems. Traditional firewalls and encryption of the security system are not enough to prevent increasing cyber attacks [1].
LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models
Gupta, Sammriddh, Singh, Sonit, Joshi, Aditya, Kim, Mira
Language educators strive to create a rich experience for learners, while they may be restricted in the extend of feedback and practice they can provide. We present the design and development of LangLingual, a conversational agent built using the LangChain framework and powered by Large Language Models. The system is specifically designed to provide real-time, grammar-focused feedback, generate context-aware language exercises and track learner proficiency over time. The paper discusses the architecture, implementation and evaluation of LangLingual in detail. The results indicate strong usability, positive learning outcomes and encouraging learner engagement.
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Jiang, Liwei, Chai, Yuanjun, Li, Margaret, Liu, Mickel, Fok, Raymond, Dziri, Nouha, Tsvetkov, Yulia, Sap, Maarten, Albalak, Alon, Choi, Yejin
Language models (LMs) often struggle to generate diverse, human-like creative content, raising concerns about the long-term homogenization of human thought through repeated exposure to similar outputs. Yet scalable methods for evaluating LM output diversity remain limited, especially beyond narrow tasks such as random number or name generation, or beyond repeated sampling from a single model. We introduce Infinity-Chat, a large-scale dataset of 26K diverse, real-world, open-ended user queries that admit a wide range of plausible answers with no single ground truth. We introduce the first comprehensive taxonomy for characterizing the full spectrum of open-ended prompts posed to LMs, comprising 6 top-level categories (e.g., brainstorm & ideation) that further breaks down to 17 subcategories. Using Infinity-Chat, we present a large-scale study of mode collapse in LMs, revealing a pronounced Artificial Hivemind effect in open-ended generation of LMs, characterized by (1) intra-model repetition, where a single model consistently generates similar responses, and more so (2) inter-model homogeneity, where different models produce strikingly similar outputs. Infinity-Chat also includes 31,250 human annotations, across absolute ratings and pairwise preferences, with 25 independent human annotations per example. This enables studying collective and individual-specific human preferences in response to open-ended queries. Our findings show that LMs, reward models, and LM judges are less well calibrated to human ratings on model generations that elicit differing idiosyncratic annotator preferences, despite maintaining comparable overall quality. Overall, INFINITY-CHAT presents the first large-scale resource for systematically studying real-world open-ended queries to LMs, revealing critical insights to guide future research for mitigating long-term AI safety risks posed by the Artificial Hivemind.
Guardian: Decoupling Exploration from Safety in Reinforcement Learning
Cai, Kaitong, Zhang, Jusheng, Yang, Jing, Wang, Keze
Hybrid offline-online reinforcement learning (O2O RL) promises both sample efficiency and robust exploration, but suffers from instability due to distribution shift between offline and online data. We introduce RLPD-GX, a framework that decouples policy optimization from safety enforcement: a reward-seeking learner explores freely, while a projection-based guardian guarantees rule-consistent execution and safe value backups. This design preserves the exploratory value of online interactions without collapsing to conservative policies. To further stabilize training, we propose dynamic curricula that gradually extend temporal horizons and anneal offline-online data mixing. We prove convergence via a contraction property of the guarded Bellman operator, and empirically show state-of-the-art performance on Atari-100k, achieving a normalized mean score of 3.02 (+45% over prior hybrid methods) with stronger safety and stability. Beyond Atari, ablations demonstrate consistent gains across safety-critical and long-horizon tasks, underscoring the generality of our design. Extensive and comprehensive results highlight decoupled safety enforcement as a simple yet principled route to robust O2O RL, suggesting a broader paradigm for reconciling exploration and safety in reinforcement learning. Deep reinforcement learning (DRL) has demonstrated remarkable performance in complex decision-making tasks such as strategy games and robotic control (Zhang et al., 2025d; Mnih et al., 2015; Arulkumaran et al., 2017; Li, 2018; Dulac-Arnold et al., 2021; Zhang et al., 2025b). However, its mainstream paradigms, i.e., purely online learning and purely offline learning, are constrained by sample inefficiency and out-of-distribution (OOD) generalization challenges, respectively (Fujimoto et al., 2019; Kumar et al., 2020; Gu et al., 2024b; Zhang et al., 2025h).