decision path
Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification
The global uniform aggregation of random forests leaves conditional bias along the decision boundary uncorrected. To correct this locally, we propose exploiting the structural pattern of each tree's decision path. At inference, a random forest reaches its prediction through the root-to-leaf path the sample traverses in each tree, so path-level reliability offers a finer granularity than tree-level weighting can access. We show that reliability varies meaningfully across path patterns in the boundary region identified by the forest itself, and that using this signal yields a statistically significant accuracy improvement over RF on 36 binary classification benchmarks (Wilcoxon p < 0.0001). We further devise a way to measure the sufficiency of residual information in the fitted RF's decision boundary, providing an estimate of the expected gain before the method is applied; on the qualifying group identified this way, the method delivers a mean +0.99 pp accuracy improvement with strict wins on every dataset (7/0/0). Class-recall regression -- the typical failure mode of RF correction methods -- is measured: zero minority-recall regressions and a single majority-recall regression at the 0.2 pp threshold, indicating that the correction operates in the direction of bias reduction rather than class trade-off. Our work suggests that the structural information of decision paths, previously overlooked in random forest research, can contribute to RF performance improvement.
RGMDT: Return-Gap-MinimizingDecisionTree ExtractioninNon-EuclideanMetricSpace
In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem into a novel non-euclidean clustering problem over the local observation and action values space of each agent, with action values as cluster labels and the upper bound on the return gap as clustering loss.
Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection
Wang, Mengqi, Wang, Jianwei, Liu, Qing, Xu, Xiwei, Xing, Zhenchang, Zhu, Liming, Zhang, Wenjie
Error detection (ED), which aims to identify incorrect or inconsistent cell values in tabular data, is important for ensuring data quality. Recent state-of-the-art ED methods leverage the pre-trained knowledge and semantic capability embedded in large language models (LLMs) to directly label whether a cell is erroneous. However, this LLM-as-a-labeler pipeline (1) relies on the black box, implicit decision process, thus failing to provide explainability for the detection results, and (2) is highly sensitive to prompts, yielding inconsistent outputs due to inherent model stochasticity, therefore lacking robustness. To address these limitations, we propose an LLM-as-an-inducer framework that adopts LLM to induce the decision tree for ED (termed TreeED) and further ensembles multiple such trees for consensus detection (termed ForestED), thereby improving explainability and robustness. Specifically, based on prompts derived from data context, decision tree specifications and output requirements, TreeED queries the LLM to induce the decision tree skeleton, whose root-to-leaf decision paths specify the stepwise procedure for evaluating a given sample. Each tree contains three types of nodes: (1) rule nodes that perform simple validation checks (e.g., format or range), (2) Graph Neural Network (GNN) nodes that capture complex patterns (e.g., functional dependencies), and (3) leaf nodes that output the final decision types (error or clean). Furthermore, ForestED employs uncertainty-based sampling to obtain multiple row subsets, constructing a decision tree for each subset using TreeED. It then leverages an Expectation-Maximization-based algorithm that jointly estimates tree reliability and optimizes the consensus ED prediction. Extensive xperiments demonstrate that our methods are accurate, explainable and robust, achieving an average F1-score improvement of 16.1% over the best baseline.
Automatically Finding Rule-Based Neurons in OthelloGPT
Singh, Aditya, Wen, Zihang, Medicherla, Srujananjali, Karvonen, Adam, Rager, Can
OthelloGPT, a transformer trained to predict valid moves in Othello, provides an ideal testbed for interpretability research. The model is complex enough to exhibit rich computational patterns, yet grounded in rule-based game logic that enables meaningful reverse-engineering. We present an automated approach based on decision trees to identify and interpret MLP neurons that encode rule-based game logic. Our method trains regression decision trees to map board states to neuron activations, then extracts decision paths where neurons are highly active to convert them into human-readable logical forms. These descriptions reveal highly interpretable patterns; for instance, neurons that specifically detect when diagonal moves become legal. Our findings suggest that roughly half of the neurons in layer 5 can be accurately described by compact, rule-based decision trees ($R^2 > 0.7$ for 913 of 2,048 neurons), while the remainder likely participate in more distributed or non-rule-based computations. We verify the causal relevance of patterns identified by our decision trees through targeted interventions. For a specific square, for specific game patterns, we ablate neurons corresponding to those patterns and find an approximately 5-10 fold stronger degradation in the model's ability to predict legal moves along those patterns compared to control patterns. To facilitate future work, we provide a Python tool that maps rule-based game behaviors to their implementing neurons, serving as a resource for researchers to test whether their interpretability methods recover meaningful computational structures.
Model-Agnostic Policy Explanations with Large Language Models
Xi-Jia, Zhang, Guo, Yue, Chen, Shufei, Stepputtis, Simon, Gombolay, Matthew, Sycara, Katia, Campbell, Joseph
Intelligent agents, such as robots, are increasingly deployed in real-world, human-centric environments. To foster appropriate human trust and meet legal and ethical standards, these agents must be able to explain their behavior. However, state-of-the-art agents are typically driven by black-box models like deep neural networks, limiting their interpretability. We propose a method for generating natural language explanations of agent behavior based only on observed states and actions -- without access to the agent's underlying model. Our approach learns a locally interpretable surrogate model of the agent's behavior from observations, which then guides a large language model to generate plausible explanations with minimal hallucination. Empirical results show that our method produces explanations that are more comprehensible and correct than those from baselines, as judged by both language models and human evaluators. Furthermore, we find that participants in a user study more accurately predicted the agent's future actions when given our explanations, suggesting improved understanding of agent behavior.