Goto

Collaborating Authors

 decision rule


Feature Learning for Interpretable, Performant Decision Trees

Neural Information Processing Systems

Points were sampled uniformly in the bands denoted by dashed lines. We posit that these barriers are due, at least in part, to the sensitivity of decision trees to transformations of the input resulting from greedy construction and simple decision rules. Of these, key limitation is the latter; even if we replace greedy construction with a perfect tree learner, simple distributions can nonetheless require an arbitrarily large axis-aligned tree to fit.


Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies

Son, Sungtaek, Lila, Eardi, Chan, Kwun Chuen Gary

arXiv.org Machine Learning

Individualized treatment regimes (ITRs) aim to improve clinical outcomes by assigning treatment based on patient-specific characteristics. However, existing methods often struggle with high-dimensional covariates, limiting accuracy, interpretability, and real-world applicability. We propose a novel sufficient dimension reduction approach that directly targets the contrast between potential outcomes and identifies a low-dimensional subspace of the covariates capturing treatment effect heterogeneity. This reduced representation enables more accurate estimation of optimal ITRs through outcome-weighted learning. To accommodate observational data, our method incorporates kernel-based covariate balancing, allowing treatment assignment to depend on the full covariate set and avoiding the restrictive assumption that the subspace sufficient for modeling heterogeneous treatment effects is also sufficient for confounding adjustment. We show that the proposed method achieves universal consistency, i.e., its risk converges to the Bayes risk, under mild regularity conditions. We demonstrate its finite sample performance through simulations and an analysis of intensive care unit sepsis patient data to determine who should receive transthoracic echocardiography.


On the Statistical Consistency of Risk-Sensitive Bayesian Decision-Making

Neural Information Processing Systems

We study data-driven decision-making problems in the Bayesian framework, where the expectation in the Bayes risk is replaced by a risk-sensitive entropic risk measure with respect to the posterior distribution. We focus on problems where calculating the posterior distribution is intractable, a typical situation in modern applications with large datasets and complex data generating models. We leverage a dual representation of the entropic risk measure to introduce a novel risk-sensitive variational Bayesian (RSVB) framework for jointly computing a risk-sensitive posterior approximation and the corresponding decision rule.


R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

In real-world settings, we repeatedly decide whether to pursue better conditions or to keep things unchanged. Examples include time investment, employment, entertainment preferences etc. How do we make such decisions? To address this question, the field of behavioral ecology has developed foraging paradigms - the model settings in which human and non-human subjects decided when to leave depleting food resources. Foraging theory, represented by the marginal value theorem (MVT), provided accurate average-case stay-or-leave rules consistent with behaviors of subjects towards depleting resources. Yet, the algorithms underlying individual choices and ways to learn such algorithms remained unclear.


RISE: Robust Individualized Decision Learning with Sensitive Variables

Neural Information Processing Systems

This paper introduces RISE, a robust individualized decision learning framework with sensitive variables, where sensitive variables are collectible data and important to the intervention decision, but their inclusion in decision making is prohibited due to reasons such as delayed availability or fairness concerns. A naive baseline is to ignore these sensitive variables in learning decision rules, leading to significant uncertainty and bias. To address this, we propose a decision learning framework to incorporate sensitive variables during offline training but not include them in the input of the learned decision rule during model deployment. Specifically, from a causal perspective, the proposed framework intends to improve the worst-case outcomes of individuals caused by sensitive variables that are unavailable at the time of decision. Unlike most existing literature that uses mean-optimal objectives, we propose a robust learning framework by finding a newly defined quantile-or infimum-optimal decision rule. The reliable performance of the proposed method is demonstrated through synthetic experiments and three real-world applications.


Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings

Neural Information Processing Systems

We consider off-policy evaluation (OPE) in continuous treatment settings, such as personalized dose-finding. In OPE, one aims to estimate the mean outcome under a new treatment decision rule using historical data generated by a different decision rule. Most existing works on OPE focus on discrete treatment settings. To handle continuous treatments, we develop a novel estimation method for OPE using deep jump learning. The key ingredient of our method lies in adaptively discretizing the treatment space using deep discretization, by leveraging deep learning and multi-scale change point detection. This allows us to apply existing OPE methods in discrete treatments to handle continuous treatments. Our method is further justified by theoretical results, simulations, and a real application to Warfarin Dosing.


On Dynamic Programming Theory for Leader-Follower Stochastic Games

Dibangoye, Jilles Steeve, Marre, Thibaut Le, Sankur, Ocan, Schwarzentruber, François

arXiv.org Artificial Intelligence

Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets-state abstractions formally representing all rational follower best responses under partial leader commitments-to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks-including security games, resource allocation, and adversarial planning-demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.


Multiclass threshold-based classification and model evaluation

Legnaro, Edoardo, Guastavino, Sabrina, Marchetti, Francesco

arXiv.org Artificial Intelligence

In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. This is done by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional simplex, where the classification depends on a multidimensional threshold. This change of perspective enables for any trained classification network an \textit{a posteriori} optimization of the classification score by means of threshold tuning, as usually carried out in the binary setting, thus allowing for a further refinement of the prediction capability of any network. Our experiments show indeed that multidimensional threshold tuning yields performance improvements across various networks and datasets. Moreover, we derive a multiclass ROC analysis based on \emph{ROC clouds} -- the attainable (FPR,TPR) operating points induced by a single multiclass threshold -- and summarize them via a \emph{Distance From Point} (DFP) score to $(0,1)$. This yields a coherent alternative to standard One-vs-Rest (OvR) curves and aligns with the observed tuning gains.


When +1% Is Not Enough: A Paired Bootstrap Protocol for Evaluating Small Improvements

Du, Wenzhang

arXiv.org Machine Learning

Recent machine learning papers often report 1-2 percentage point improvements from a single run on a benchmark. These gains are highly sensitive to random seeds, data ordering, and implementation details, yet are rarely accompanied by uncertainty estimates or significance tests. It is therefore unclear when a reported +1-2% reflects a real algorithmic advance versus noise. We revisit this problem under realistic compute budgets, where only a few runs are affordable. We propose a simple, PC-friendly evaluation protocol based on paired multi-seed runs, bias-corrected and accelerated (BCa) bootstrap confidence intervals, and a sign-flip permutation test on per-seed deltas. The protocol is intentionally conservative and is meant as a guardrail against over-claiming. We instantiate it on CIFAR-10, CIFAR-10N, and AG News using synthetic no-improvement, small-gain, and medium-gain scenarios. Single runs and unpaired t-tests often suggest significant gains for 0.6-2.0 point improvements, especially on text. With only three seeds, our paired protocol never declares significance in these settings. We argue that such conservative evaluation is a safer default for small gains under tight budgets.


Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules

Constante-Flores, Gonzalo E., Chen, Hao, Li, Can

arXiv.org Artificial Intelligence

Deep learning models are increasingly deployed in safety-critical tasks where predictions must satisfy hard constraints, such as physical laws, fairness requirements, or safety limits. However, standard architectures lack built-in mechanisms to enforce such constraints, and existing approaches based on regularization or projection are often limited to simple constraints, computationally expensive, or lack feasibility guarantees. This paper proposes a model-agnostic framework for enforcing input-dependent linear equality and inequality constraints on neural network outputs. The architecture combines a task network trained for prediction accuracy with a safe network trained using decision rules from the stochastic and robust optimization literature to ensure feasibility across the entire input space. The final prediction is a convex combination of the two subnetworks, guaranteeing constraint satisfaction during both training and inference without iterative procedures or runtime optimization. We prove that the architecture is a universal approximator of constrained functions and derive computationally tractable formulations based on linear decision rules. Empirical results on benchmark regression tasks show that our method consistently satisfies constraints while maintaining competitive accuracy and low inference latency.