NSDAP archive: How DER SPIEGEL processed the data from the Nazi card file

Der Spiegel International

The NSDAP membership card file was recently made available by the US National Archives on its website in digitized form. DER SPIEGEL downloaded all of these documents and extracted the content with the help of artificial intelligence. To minimize errors when reading the old files, the dataset was first classified with the help of machine learning and pre-sorted into groups. The handwriting on the index cards is in some cases difficult to read; on some cards the text has faded, and many are written in old German script (Sütterlin). Other cards, meanwhile, were filled out with a typewriter.


Causal Fairness for Survival Analysis

arXiv.org Machine Learning

In the data-driven era, large-scale datasets are routinely collected and analyzed using machine learning (ML) and artificial intelligence (AI) to inform decisions in high-stakes domains such as healthcare, employment, and criminal justice, raising concerns about the fairness behavior of these systems. Existing works in fair ML cover tasks such as bias detection, fair prediction, and fair decision-making, but largely focus on static settings. At the same time, fairness in temporal contexts, particularly survival/time-to-event (TTE) analysis, remains relatively underexplored, with current approaches to fair survival analysis adopting statistical fairness definitions, which, even with unlimited data, cannot disentangle the causal mechanisms that generate disparities. To address this gap, we develop a causal framework for fairness in TTE analysis, enabling the decomposition of disparities in survival into contributions from direct, indirect, and spurious pathways. This provides a human-understandable explanation of why disparities arise and how they evolve over time. Our non-parametric approach proceeds in four steps: (1) formalizing the necessary assumptions about censoring and lack of confounding using a graphical model; (2) recovering the conditional survival function given covariates; (3) applying the Causal Reduction Theorem to reframe the problem in a form amenable to causal pathway decomposition; (4) estimating the effects efficiently. Finally, our approach is used to analyze the temporal evolution of racial disparities in outcome after admission to an intensive care unit (ICU).


Supplementary Material Responsibility Statement

Neural Information Processing Systems

Hyponatremia: Predict whether a hyponatremia lab comes back as normal (>=135 mmol/L), mild (>=130 and <135 mmol/L), moderate (>=125 and <130 mmol/L), or severe (<125 mmol/L). We consider all lab results coded as LOINC/LG11363-5, LOINC/2951-2, or LOINC/2947-0. Anemia: Predict whether an anemia lab comes back as normal (>=120 g/L), mild (>=110 and <120 g/L), moderate (>=70 and <110 g/L), or severe (<70 g/L). We consider all lab results coded as LOINC/LP392452-1. Please note that for the results of our baseline experiments in Section 5, we reframe these lab value tasks as binary classification tasks, where a label is "negative" if the result is normal and "positive" otherwise.
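The thresholds above can be written down as a small lookup. The following is a minimal sketch, not the benchmark's own labeling code: the function names are hypothetical, and it assumes lab values are already expressed in the units quoted above (mmol/L and g/L).

```python
def classify_hyponatremia(mmol_per_l: float) -> str:
    """Four-way label for a hyponatremia lab value in mmol/L (thresholds from the text)."""
    if mmol_per_l >= 135:
        return "normal"
    if mmol_per_l >= 130:
        return "mild"
    if mmol_per_l >= 125:
        return "moderate"
    return "severe"


def classify_anemia(g_per_l: float) -> str:
    """Four-way label for an anemia lab value in g/L (thresholds from the text)."""
    if g_per_l >= 120:
        return "normal"
    if g_per_l >= 110:
        return "mild"
    if g_per_l >= 70:
        return "moderate"
    return "severe"


def to_binary(label: str) -> str:
    """Binary reframing: 'negative' if the result is normal, 'positive' otherwise."""
    return "negative" if label == "normal" else "positive"
```

For example, `to_binary(classify_hyponatremia(128.0))` yields `"positive"`, since 128 mmol/L falls in the moderate range.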


Optimization Algorithms

Neural Information Processing Systems

A.1 Proof of Monotonicity and Submodularity. In Equation (3a), we stated the objective of the knapsack cover. Remark 1. f+M is monotonically increasing.

A.2 Knapsack Cover. To find a solution to problem 3, we use the greedy algorithm proposed by Badanidiyuru and Vondrák [2], which deals with submodular maximization subject to a system of l knapsack constraints and p matroid constraints. We present an adapted version of the algorithm in Algorithm 2, where l = 1. The parameter ε allows us to trade off solution time and solution quality. In this work, we set ε = 0.2.
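To illustrate how a single parameter can trade solution time against solution quality, here is a simplified decreasing-threshold greedy for one knapsack constraint. It is a sketch only and not the adapted Algorithm 2: it assumes a monotone submodular set function `f`, strictly positive item costs, no matroid constraints, and a stopping rule chosen for illustration. All names (`threshold_greedy`, `cost`, `budget`, `eps`) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def threshold_greedy(
    items: List[str],
    cost: Dict[str, float],
    f: Callable[[FrozenSet[str]], float],
    budget: float,
    eps: float = 0.2,
) -> Set[str]:
    """Add any item whose marginal gain per unit cost meets a threshold,
    then shrink the threshold by a factor (1 - eps) and repeat."""
    selected: Set[str] = set()
    spent = 0.0
    base = f(frozenset())
    # Best single-item density among items that fit in the budget.
    densities = [(f(frozenset({i})) - base) / cost[i]
                 for i in items if cost[i] <= budget]
    if not densities or max(densities) <= 0:
        return selected
    threshold = max(densities)
    stop = eps * threshold / len(items)  # illustrative stopping rule
    while threshold >= stop:
        for i in items:
            if i in selected or spent + cost[i] > budget:
                continue
            gain = f(frozenset(selected | {i})) - f(frozenset(selected))
            if gain / cost[i] >= threshold:
                selected.add(i)
                spent += cost[i]
        threshold *= 1.0 - eps  # smaller eps: more passes, finer thresholds
    return selected


# Toy coverage objective: f(S) = number of elements covered by S.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage(s: FrozenSet[str]) -> float:
    return float(len(set().union(*(COVER[i] for i in s)))) if s else 0.0


picked = threshold_greedy(["a", "b", "c"], {"a": 2.0, "b": 1.0, "c": 1.0},
                          coverage, budget=3.0, eps=0.2)
```

A smaller `eps` lowers the threshold more slowly, so the loop makes more passes over the items but applies a finer-grained selection rule; a larger `eps` finishes in fewer passes.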


Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives

arXiv.org Machine Learning

Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data, especially when there is a feature-sample imbalance. Here, we show that the denoising score matching objective of diffusion models could smooth the gradients for faster, more stable convergence. We also propose an adaptive k-hop acyclicity constraint that improves runtime over existing solutions that require matrix inversion. We name this framework Denoising Diffusion Causal Discovery (DDCD). Unlike generative diffusion models, DDCD utilizes the reverse denoising process to infer a parameterized causal structure rather than to generate data. We demonstrate the competitive performance of DDCDs on synthetic benchmarking data. We also show that our methods are practically useful by conducting qualitative analyses on two real-world examples. Code is available at this url: https://github.com/haozhu233/ddcd.
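One way to picture a k-hop acyclicity check that avoids matrix inversion is to sum the traces of the first k powers of the elementwise-squared weight matrix. The sketch below is an assumption about what such a constraint could look like, not DDCD's actual formulation; the function names are hypothetical and the code uses plain Python lists for self-containment.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def k_hop_cycle_penalty(w: Matrix, k: int) -> float:
    """Sum of trace((W * W)^i) for i = 1..k, with * the elementwise product.

    Each term sums the weights of closed walks of length i, so the penalty
    is zero exactly when the graph has no directed cycle of length <= k.
    """
    squared = [[x * x for x in row] for row in w]  # non-negative entries
    power = squared
    total = 0.0
    for _ in range(k):
        total += sum(power[i][i] for i in range(len(w)))
        power = matmul(power, squared)
    return total


dag = [[0.0, 1.0], [0.0, 0.0]]  # edge 0 -> 1 only
three_cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # 0 -> 1 -> 2 -> 0
```

Here `k_hop_cycle_penalty(three_cycle, 2)` is zero because the only cycle has length 3, while `k = 3` detects it. This shows the trade-off in choosing k: a small k is cheaper but can miss longer cycles.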