Law
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Appendix Alicia Curth
This appendix is organized as follows: We first present an extended overview of the standard treatment effect estimation setup and discuss differences with the time-to-event setting (Appendix A). Appendix G contains the NeurIPS checklist. In the standard treatment effect estimation setup with binary or continuous outcomes (see e.g. No hidden confounders (1.a), Consistency (1.c) and Positivity/Overlap in treatment assignment (2.a) . Under the assumption of random censoring (which is discussed further in Appendix C.1), the ( t 1) Thus, the classification approach with log-loss is equivalent to optimizing for the likelihood of the hazard.
Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information
"sufficiency", and "necessity and sufficiency", which are important concepts In practical science, it is crucial to evaluate the likelihood of one event causing another event. For example, epidemiologists pay attention to determining the likelihood of a particular exposure being the cause of a particular disease.
Informative Post-Hoc Explanations Only Exist for Simple Functions
Gรผnther, Eric, Szabados, Balรกzs, Bhattacharjee, Robi, Bordt, Sebastian, von Luxburg, Ulrike
Many researchers have suggested that local post-hoc explanation algorithms can be used to gain insights into the behavior of complex machine learning models. However, theoretical guarantees about such algorithms only exist for simple decision functions, and it is unclear whether and under which assumptions similar results might exist for complex models. In this paper, we introduce a general, learning-theory-based framework for what it means for an explanation to provide information about a decision function. We call an explanation informative if it serves to reduce the complexity of the space of plausible decision functions. With this approach, we show that many popular explanation algorithms are not informative when applied to complex decision functions, providing a rigorous mathematical rejection of the idea that it should be possible to explain any model. We then derive conditions under which different explanation algorithms become informative. These are often stronger than what one might expect. For example, gradient explanations and counterfactual explanations are non-informative with respect to the space of differentiable functions, and SHAP and anchor explanations are not informative with respect to the space of decision trees. Based on these results, we discuss how explanation algorithms can be modified to become informative. While the proposed analysis of explanation algorithms is mathematical, we argue that it holds strong implications for the practical applicability of these algorithms, particularly for auditing, regulation, and high-risk applications of AI.
A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts
John, Angela, Allotey, Selvyn, Koebe, Till, Tyukavina, Alexandra, Weber, Ingmar
Afforestation and reforestation are popular strategies for mitigating climate change by enhancing carbon sequestration. However, the effectiveness of these efforts is often self-reported by project developers, or certified through processes with limited external validation. This leads to concerns about data reliability and project integrity. In response to increasing scrutiny of voluntary carbon markets, this study presents a dataset on global afforestation and reforestation efforts compiled from primary (meta-)information and augmented with time-series satellite imagery and other secondary data. Our dataset covers 1,289,068 planting sites from 45,628 projects spanning 33 years. Since any remote sensing-based validation effort relies on the integrity of a planting site's geographic boundary, this dataset introduces a standardized assessment of the provided site-level location information, which we summarize in one easy-to-communicate key indicator: LDIS -- the Location Data Integrity Score. We find that approximately 79\% of the georeferenced planting sites monitored fail on at least 1 out of 10 LDIS indicators, while 15\% of the monitored projects lack machine-readable georeferenced data in the first place. In addition to enhancing accountability in the voluntary carbon market, the presented dataset also holds value as training data for e.g. computer vision-related tasks with millions of linked Sentinel-2 and Planetscope satellite images.
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Xian, Ruicheng, Wan, Yuxuan, Zhao, Han
Instruction fine-tuned large language models (LLMs) enable a simple zero-shot or few-shot prompting paradigm, also known as in-context learning, for building prediction models. This convenience, combined with continued advances in LLM capability, has the potential to drive their adoption across a broad range of domains, including high-stakes applications where group fairness -- preventing disparate impacts across demographic groups -- is essential. The majority of existing approaches to enforcing group fairness on LLM-based classifiers rely on traditional fair algorithms applied via model fine-tuning or head-tuning on final-layer embeddings, but they are no longer applicable to closed-weight LLMs under the in-context learning setting, which include some of the most capable commercial models today, such as GPT-4, Gemini, and Claude. In this paper, we propose a framework for deriving fair classifiers from closed-weight LLMs via prompting: the LLM is treated as a feature extractor, and features are elicited from its probabilistic predictions (e.g., token log probabilities) using prompts strategically designed for the specified fairness criterion to obtain sufficient statistics for fair classification; a fair algorithm is then applied to these features to train a lightweight fair classifier in a post-hoc manner. Experiments on five datasets, including three tabular ones, demonstrate strong accuracy-fairness tradeoffs for the classifiers derived by our framework from both open-weight and closed-weight LLMs; in particular, our framework is data-efficient and outperforms fair classifiers trained on LLM embeddings (i.e., head-tuning) or from scratch on raw tabular features.
On Strong and Weak Admissibility in Non-Flat Assumption-Based Argumentation
Berthold, Matti, Blรผmel, Lydia, Rapberger, Anna
In this work, we broaden the investigation of admissibility notions in the context of assumption-based argumentation (ABA). More specifically, we study two prominent alternatives to the standard notion of admissibility from abstract argumentation, namely strong and weak admissibility, and introduce the respective preferred, complete and grounded semantics for general (sometimes called non-flat) ABA. To do so, we use abstract bipolar set-based argumentation frameworks (BSAFs) as formal playground since they concisely capture the relations between assumptions and are expressive enough to represent general non-flat ABA frameworks, as recently shown. While weak admissibility has been recently investigated for a restricted fragment of ABA in which assumptions cannot be derived (flat ABA), strong admissibility has not been investigated for ABA so far. We introduce strong admissibility for ABA and investigate desirable properties. We furthermore extend the recent investigations of weak admissibility in the flat ABA fragment to the non-flat case. We show that the central modularization property is maintained under classical, strong, and weak admissibility. We also show that strong and weakly admissible semantics in non-flat ABA share some of the shortcomings of standard admissible semantics and discuss ways to address these.
A Feasibility Experiment on the Application of Predictive Coding to Instant Messaging Corpora
Schoinas, Thanasis, Qadir, Ghulam
Predictive coding, the term used in the legal industry for document classification using machine learning, presents additional challenges when the dataset comprises instant messages, due to their informal nature and smaller sizes. In this paper, we exploit a data management workflow to group messages into day chats, followed by feature selection and a logistic regression classifier to provide an economically feasible predictive coding solution. We also improve the solution's baseline model performance by dimensionality reduction, with focus on quantitative features. We test our methodology on an Instant Bloomberg dataset, rich in quantitative information. In parallel, we provide an example of the cost savings of our approach.
From Individual to Multi-Agent Algorithmic Recourse: Minimizing the Welfare Gap via Capacitated Bipartite Matching
Khotanlou, Zahra, Larson, Kate, Karimi, Amir-Hossein
Decision makers are increasingly relying on machine learning in sensitive situations. In such settings, algorithmic recourse aims to provide individuals with actionable and minimally costly steps to reverse unfavorable AI-driven decisions. While existing research predominantly focuses on single-individual (i.e., seeker) and single-model (i.e., provider) scenarios, real-world applications often involve multiple interacting stakeholders. Optimizing outcomes for seekers under an individual welfare approach overlooks the inherently multi-agent nature of real-world systems, where individuals interact and compete for limited resources. To address this, we introduce a novel framework for multi-agent algorithmic recourse that accounts for multiple recourse seekers and recourse providers. We model this many-to-many interaction as a capacitated weighted bipartite matching problem, where matches are guided by both recourse cost and provider capacity. Edge weights, reflecting recourse costs, are optimized for social welfare while quantifying the welfare gap between individual welfare and this collectively feasible outcome. We propose a three-layer optimization framework: (1) basic capacitated matching, (2) optimal capacity redistribution to minimize the welfare gap, and (3) cost-aware optimization balancing welfare maximization with capacity adjustment costs. Experimental validation on synthetic and real-world datasets demonstrates that our framework enables the many-to-many algorithmic recourse to achieve near-optimal welfare with minimum modification in system settings. This work extends algorithmic recourse from individual recommendations to system-level design, providing a tractable path toward higher social welfare while maintaining individual actionability.