Wiesbaden
Regularity of Solutions to Beckmann's Parametric Optimal Transport
Gottschalk, Hanno, Riedlinger, Tobias J.
Beckmann's problem in optimal transport minimizes the total squared flux in a continuous transport problem from a source to a target distribution. In this article, the regularity theory for solutions to Beckmann's problem in optimal transport is developed utilizing an unconstrained Lagrangian formulation and solving the variational first order optimality conditions. It turns out that the Lagrangian multiplier that enforces Beckmann's divergence constraint fulfills a Poisson equation and the flux vector field is obtained as the potential's gradient. Utilizing Schauder estimates from elliptic regularity theory, the exact Hรถlder regularity of the potential, the flux and the flow generating is derived on the basis of Hรถlder regularity of source and target densities on a bounded, regular domain. If the target distribution depends on parameters, as is the case in conditional (``promptable'') generative learning, we provide sufficient conditions for separate and joint Hรถlder continuity of the resulting vector field in the parameter and the data dimension. Following a recent result by Belomnestny et al., one can thus approximate such vector fields with deep ReQu neural networks in C^(k,alpha)-Hรถlder norm. We also show that this approach generalizes to other probability paths, like Fisher-Rao gradient flows.
Performative Learning Theory
Rodemann, Julian, Fischer-Abaigar, Unai, Bailie, James, Muandet, Krikamol
Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.
Reciprocal Learning
These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge.
Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry
Yoshida, Naoki, Ishikawa, Isao, Imaizumi, Masaaki
We theoretically demonstrate that the generalization error of interpolators for machine learning models under teacher-student settings becomes 0 once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidences indicate that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, which are parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled interpolators becomes exactly zero once the number of training samples exceeds a threshold determined by the geometric structure of the interpolator set in parameter space. As a proof technique, we leverage tools from algebraic geometry to mathematically characterize this geometric structure.
Flow Matching for Tabular Data Synthesis
Nasution, Bahrul Ilmi, Eijkelboom, Floor, Elliot, Mark, Allmendinger, Richard, Naesseth, Christian A.
Synthetic data generation is an important tool for privacy-preserving data sharing. While diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This paper presents different ways to implement flow matching for tabular data synthesis. We provide a comprehensive empirical study that compares flow matching (FM and variational FM) with a state-of-the-art diffusion method (TabDDPM and TabSyn) in tabular data synthesis. We evaluate both the standard Optimal Transport (OT) and the Variance Preserving (VP) probability paths, and also compare deterministic and stochastic samplers -- something possible when learning to generate using \textit{variational} flow matching -- characterising the empirical relationship between data utility and privacy risk. Our key findings reveal that flow matching, particularly TabbyFlow, outperforms diffusion baselines. Flow matching methods also achieves better performance with remarkably low function evaluations ($\leq$ 100 steps), offering a substantial computational advantage. The choice of probability path is also crucial, as using the OT path demonstrates superior performance, while VP has potential for producing synthetic data with lower disclosure risk. Lastly, our results show that making flows stochastic not only preserves marginal distributions but, in some instances, enables the generation of high utility synthetic data with reduced disclosure risk.