Rule-Based Reasoning
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
Huang, Yuzhen, Zeng, Weihao, Zeng, Xingshan, Zhu, Qi, He, Junxian
Trustworthy verifiers are essential for the success of reinforcement learning with verifiable reward (RLVR), which is the core methodology behind various large reasoning models such as DeepSeek-R1. In complex domains like mathematical reasoning, rule-based verifiers have been widely adopted in previous works to train strong reasoning models. However, the reliability of these verifiers and their impact on the RL training process remain poorly understood. In this work, we take mathematical reasoning as a case study and conduct a comprehensive analysis of various verifiers in both static evaluation and RL training scenarios. First, we find that current open-source rule-based verifiers often fail to recognize equivalent answers presented in different formats across multiple commonly used mathematical datasets, resulting in non-negligible false negative rates. This limitation adversely affects RL training performance and becomes more pronounced as the policy model gets stronger. Subsequently, we investigate model-based verifiers as a potential solution to address these limitations. While the static evaluation shows that model-based verifiers achieve significantly higher verification accuracy, further analysis and RL results imply that they are highly susceptible to hacking, where they misclassify certain patterns in responses as correct, particularly after fine-tuning. This vulnerability is exploited during policy model optimization, leading to artificially inflated rewards. Our findings underscore the unique challenges inherent to both rule-based and model-based verifiers and provide insights toward developing more accurate and robust reward systems for reinforcement learning.
DYMO-Hair: Generalizable Volumetric Dynamics Modeling for Robot Hair Manipulation
Zhao, Chengyang, Yoo, Uksang, Chaudhury, Arkadeep Narayan, Nam, Giljoo, Francis, Jonathan, Ichnowski, Jeffrey, Oh, Jean
Abstract-- Hair care is an essential daily activity, yet it remains inaccessible to individuals with limited mobility and challenging for autonomous robot systems due to the fine-grained physical structure and complex dynamics of hair . We introduce a novel dynamics learning paradigm that is suited for volumetric quantities such as hair, relying on an action-conditioned latent state editing mechanism, coupled with a compact 3D latent space of diverse hairstyles to improve generalizability. This latent space is pre-trained at scale using a novel hair physics simulator, enabling generalization across previously unseen hairstyles. Experiments in simulation demonstrate that DYMO-Hair's dynamics model outperforms baselines on capturing local deformation for diverse, unseen hairstyles. DYMO-Hair further outperforms baselines in closed-loop hair styling tasks on unseen hairstyles, with an average of 22% lower final geometric error and 42% higher success rate than the state-of-the-art system. Real-world experiments exhibit zero-shot transferability of our system to wigs, achieving consistent success on challenging unseen hairstyles where the state-of-the-art system fails. T ogether, these results introduce a foundation for model-based robot hair care, advancing toward more generalizable, flexible, and accessible robot hair styling in unconstrained physical environments. Hair is central to personal identity and self-esteem [1], [2], yet routine care is difficult for individuals with limited mobility due to reduced coordination, strength, and flexibility [3]. To improve accessibility and autonomy, robot hair care systems have been explored [4]-[7], but existing approaches rely on either handcrafted trajectories or rule-based controllers, restricting generalization across diverse hairstyles and goals. To address these limitations, we propose DYMO-Hair, a model-based robot hair care system. Our system is capable of generalizable and flexible visual goal-conditioned hair manipulation, across diverse hairstyles and objectives in unconstrained physical environments. Chengyang Zhao, Uksang Y oo, Jonathan Francis (by courtesy), Jeffrey Ichnowski, and Jean Oh are with Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. Arkadeep Narayan Chaudhury is with Epic Games, Inc., Pittsburgh, Pennsylvania, USA. Giljoo Nam is with Meta Codec Avatars Lab, Pittsburgh, Pennsylvania, USA. Jonathan Francis is with Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania, USA. Figure 1. We introduce DYMO-Hair, a unified, model-based robot hair care system.
NASP-T: A Fuzzy Neuro-Symbolic Transformer for Logic-Constrained Aviation Safety Report Classification
Machot, Fadi Al, Machot, Fidaa Al
Deep transformer models excel at multi-label text classification but often violate domain logic that experts consider essential, an issue of particular concern in safety-critical applications. We propose a hybrid neuro-symbolic framework that integrates Answer Set Programming (ASP) with transformer-based learning on the Aviation Safety Reporting System (ASRS) corpus. Domain knowledge is formalized as weighted ASP rules and validated using the Clingo solver. These rules are incorporated in two complementary ways: (i) as rule-based data augmentation, generating logically consistent synthetic samples that improve label diversity and coverage; and (ii) as a fuzzy-logic regularizer, enforcing rule satisfaction in a differentiable form during fine-tuning. This design preserves the interpretability of symbolic reasoning while leveraging the scalability of deep neural architectures. We further tune per-class thresholds and report both standard classification metrics and logic-consistency rates. Compared to a strong Binary Cross-Entropy (BCE) baseline, our approach improves micro- and macro-F1 scores and achieves up to an 86% reduction in rule violations on the ASRS test set. To the best of our knowledge, this constitutes the first large-scale neuro-symbolic application to ASRS reports that unifies ASP-based reasoning, rule-driven augmentation, and differentiable transformer training for trustworthy, safety-critical NLP.
Making Logic a First-Class Citizen in Network Data Generation with ML
Hรจ, Hongyu, Jin, Minhao, Apostolaki, Maria
Generative ML models are increasingly popular in networking for tasks such as telemetry imputation, prediction, and synthetic trace generation. Despite their capabilities, they suffer from two shortcomings: (i) their output is often visibly violating well-known networking rules, which undermines their trustworthiness; and (ii) they are difficult to control, frequently requiring retraining even for minor changes. To address these limitations and unlock the benefits of generative models for networking, we propose a new paradigm for integrating explicit network knowledge in the form of first-order logic rules into ML models used for networking tasks. Rules capture well-known relationships among used signals, e.g., that increased latency precedes packet loss. While the idea is conceptually straightforward, its realization is challenging: networking knowledge is rarely formalized into rules, and naively injecting them into ML models often hampers ML's effectiveness. This paper introduces NetNomos a multi-stage framework that (1) learns rules directly from data (e.g., measurements); (2) filters them to distinguish semantically meaningful ones; and (3) enforces them through a collaborative generation between an ML model and an SMT solver.
Critical appraisal of artificial intelligence for rare-event recognition: principles and pharmacovigilance case studies
Noren, G. Niklas, Meldau, Eva-Lisa, Ellenius, Johan
Many high-stakes AI applications target low-prevalence events, where apparent accuracy can conceal limited real-world value. Relevant AI models range from expert-defined rules and traditional machine learning to generative LLMs constrained for classification. We outline key considerations for critical appraisal of AI in rare-event recognition, including problem framing and test set design, prevalence-aware statistical evaluation, robustness assessment, and integration into human workflows. In addition, we propose an approach to structured case-level examination (SCLE), to complement statistical performance evaluation, and a comprehensive checklist to guide procurement or development of AI models for rare-event recognition. We instantiate the framework in pharmacovigilance, drawing on three studies: rule-based retrieval of pregnancy-related reports; duplicate detection combining machine learning with probabilistic record linkage; and automated redaction of person names using an LLM. We highlight pitfalls specific to the rare-event setting including optimism from unrealistic class balance and lack of difficult positive controls in test sets - and show how cost-sensitive targets align model performance with operational value. While grounded in pharmacovigilance practice, the principles generalize to domains where positives are scarce and error costs may be asymmetric.
From Facts to Foils: Designing and Evaluating Counterfactual Explanations for Smart Environments
Trapp, Anna, Sadeghi, Mersedeh, Vogelsang, Andreas
Abstract--Explainability is increasingly seen as an essential feature of rule-based smart environments. While counterfactual explanations, which describe what could have been done differently to achieve a desired outcome, are a powerful tool in eXplainable AI (XAI), no established methods exist for generating them in these rule-based domains. In this paper, we present the first formalization and implementation of counterfactual explanations tailored to this domain. It is implemented as a plugin that extends an existing explanation engine for smart environments. We conducted a user study (N=17) to evaluate our generated counterfactuals against traditional causal explanations. The results show that user preference is highly contextual: causal explanations are favored for their linguistic simplicity and in time-pressured situations, while counterfactuals are preferred for their actionable content, particularly when a user wants to resolve a problem. Our work contributes a practical framework for a new type of explanation in smart environments and provides empirical evidence to guide the choice of when each explanation type is most effective. Smart environments, such as smart homes, offices, and buildings, integrate sensor-enabled devices to support users in decision-making, monitoring, and managing abnormal situations [1], [2]. The rapid adoption of these environments is fueled by advances in the Internet of Things (IoT) and Artificial Intelligence (AI), decreasing device costs, and improved system integration [3]-[5]. Rule-based systems are a prevalent approach for implementing automation in smart environments, by executing predefined rules when certain conditions are met [6], [7].
GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering
Deprez, Bruno, Baesens, Bart, Verdonck, Tim, Verbeke, Wouter
Purpose: This paper introduces a novel graph-based method, GARG-AML, for efficient and effective anti-money laundering (AML). It quantifies smurfing risk, a popular money laundering method, by providing each node in the network with a single interpretable score. The proposed method strikes a balance among computational efficiency, detection power and transparency. Different versions of GARG-AML are introduced for undirected and directed networks. Methodology: GARG-AML constructs the adjacency matrix of a node's second-order neighbourhood in a specific way. This allows us to use the density of different blocks in the adjacency matrix to express the neighbourhood's resemblance to a pure smurfing pattern. GARG-AML is extended using a decision tree and gradient-boosting classifier to increase its performance even more. The methods are tested on synthetic and on open-source data against the current state-of-the-art in AML. Findings: We find that GARG-AML obtains state-of-the-art performance on all datasets. We illustrate that GARG-AML scales well to massive transactions graphs encountered at financial institutions. By leveraging only the adjacency matrix of the second-order neighbourhood and basic network features, this work highlights the potential of fundamental network properties towards advancing fraud detection. Originality: This paper uses only basic network features and expert knowledge on smurfing to construct a performant AML system. The originality lies in the translation of smurfing detection to these features and network representation. Our proposed method is built around the real business needs of scalability and interpretability. It therefore provides a solution that can be easily implemented at financial institutions or incorporated in existing AML solutions.