Supplement for Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow-Shrink Trees

Neural Information Processing Systems

Linear Gaussian simulations are included to establish the performance of BOSS compared to existing causal discovery algorithms on Erdős-Rényi and scale-free graphs. These results are tabulated in Tables 1, 2, 3, and 4, which compare the following algorithms: BOSS, GRaSP, fGES, PC, and DAGMA (converted to CPDAG). LiNGAM was not included in the linear Gaussian comparison because no non-Gaussian signal is available. Linear non-Gaussian simulations are included to establish the performance of BOSS compared to methods that can take advantage of non-Gaussian signal. These results are tabulated in Tables 5, 6, 7, and 8, which compare the following algorithms: BOSS, DAGMA, and LiNGAM. High-dimensional simulations are included to establish the scalability of BOSS compared to GRaSP. These results are tabulated in Tables 9 and 10.
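
To make the simulation setup concrete, below is a minimal sketch of the kind of linear Gaussian data-generating process described above: an Erdős-Rényi DAG with a random causal order, uniformly drawn edge weights, and standard Gaussian noise. The graph density, edge-weight range, and unit noise variance are illustrative assumptions, not the exact values behind the tables.

```python
import numpy as np

def simulate_er_linear_gaussian(p, n, edge_prob, rng, w_low=0.3, w_high=1.0):
    """Erdős-Rényi DAG plus linear Gaussian SEM, as in the simulations above.
    Density, weight range, and unit noise variance are illustrative choices."""
    order = rng.permutation(p)                 # random causal order => graph is acyclic
    W = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            if rng.random() < edge_prob:       # edge order[a] -> order[b]
                W[order[a], order[b]] = rng.uniform(w_low, w_high) * rng.choice([-1.0, 1.0])
    X = np.zeros((n, p))
    for a in range(p):                         # sample each variable in causal order
        j = order[a]
        X[:, j] = X @ W[:, j] + rng.normal(size=n)
    return W, X

rng = np.random.default_rng(0)
W, X = simulate_er_linear_gaussian(p=20, n=1000, edge_prob=0.1, rng=rng)
```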


Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach

Neural Information Processing Systems

Fairness in machine learning has attracted increasing attention in recent years. However, methods that improve algorithmic fairness on in-distribution data may not perform well under distribution shift. In this paper, we first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation. Subsequently, we analyze sufficient conditions that guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset and a low prediction difference between the source and target datasets for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR), which considers the worst case within a model weight perturbation ball for each sensitive attribute group. We evaluate the effectiveness of the proposed RFR algorithm on synthetic and real distribution shifts across various datasets. Experimental results demonstrate that RFR achieves a better fairness-accuracy trade-off than several baselines.
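
As an illustration of the worst-case weight-perturbation idea, here is a minimal PyTorch sketch that penalizes the demographic parity gap at an adversarially perturbed weight vector inside an L2 ball (a SAM-style one-step linearization). It is a sketch of the general idea, not the authors' exact RFR regularizer; the ball radius rho, penalty weight lam, and binary sensitive attribute s are illustrative assumptions, and for simplicity the task loss is also evaluated at the perturbed weights.

```python
import torch

def dp_gap(model, X, s):
    """Demographic parity gap: |E[f(x)|s=1] - E[f(x)|s=0]| on soft predictions.
    Assumes both sensitive groups appear in the batch."""
    p = torch.sigmoid(model(X)).squeeze(-1)
    return (p[s == 1].mean() - p[s == 0].mean()).abs()

def rfr_step(model, opt, X, y, s, rho=0.05, lam=1.0):
    """One training step with a SAM-style worst-case fairness penalty:
    the parity gap is evaluated after one ascent step on the weights inside
    an L2 ball of radius rho. A sketch of the idea, not the exact RFR objective."""
    params = [p for p in model.parameters() if p.requires_grad]

    gap = dp_gap(model, X, s)                         # ascent direction of the gap
    grads = torch.autograd.grad(gap, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    eps = [rho * g / norm for g in grads]

    with torch.no_grad():                             # move to the worst case in the ball
        for p, e in zip(params, eps):
            p.add_(e)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        model(X).squeeze(-1), y.float()) + lam * dp_gap(model, X, s)
    opt.zero_grad()
    loss.backward()                                   # gradient at the perturbed weights
    with torch.no_grad():                             # restore the original weights
        for p, e in zip(params, eps):
            p.sub_(e)
    opt.step()                                        # apply the SAM-style gradient
    return loss.item()

model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X = torch.randn(64, 8)
y = (torch.rand(64) > 0.5).long()
s = (torch.rand(64) > 0.5).long()
rfr_step(model, opt, X, y, s)
```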


Variational Annealing on Graphs for Combinatorial Optimization

Neural Information Processing Systems

Several recent unsupervised learning methods use probabilistic approaches to solve combinatorial optimization (CO) problems under the assumption of statistically independent solution variables. We demonstrate that this assumption imposes performance limitations, in particular on difficult problem instances. Our results corroborate that an autoregressive approach, which captures statistical dependencies among solution variables, yields superior performance on many popular CO problems. We introduce subgraph tokenization, in which the configuration of a set of solution variables is represented by a single token. This tokenization technique alleviates the drawback of the long sequential sampling procedure inherent to autoregressive methods without sacrificing expressivity. Importantly, we theoretically motivate an annealed entropy regularization and show empirically that it is essential for efficient and stable learning.
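
The annealed entropy regularization can be sketched as minimizing the variational free energy E_q[H(x)] - T(t) H(q), with a temperature T(t) that decays to zero over training. The score-function estimator below, with a mean baseline and a linear schedule, is one standard way to do this; the per-variable Bernoulli parameterization and the schedule are illustrative assumptions (subgraph tokenization would replace the Bernoullis with categoricals over the 2^k configurations of k grouped variables).

```python
import torch

def annealed_loss(logits, energy_fn, step, total_steps, T0=1.0):
    """Score-function surrogate for minimizing E_q[H(x)] - T(t) * H(q).
    `logits` (batch, n_vars) come from the sampling network; `energy_fn`
    maps a batch of binary solutions to per-sample energies, e.g. a
    MaxCut or Ising objective. A sketch, not the paper's exact estimator."""
    T = T0 * max(0.0, 1.0 - step / total_steps)   # linear annealing schedule
    dist = torch.distributions.Bernoulli(logits=logits)
    x = dist.sample()                             # batch of candidate solutions
    logp = dist.log_prob(x).sum(dim=-1)           # log q(x), summed over variables
    local_energy = energy_fn(x) + T * logp        # H(x) - T * (per-sample entropy)
    baseline = local_energy.mean().detach()       # variance-reduction baseline
    return ((local_energy - baseline).detach() * logp).mean()
```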


The Gain of Ordering in Online Learning

Neural Information Processing Systems

We study fixed-design online learning in which the learner is allowed to choose the order of the datapoints so as to minimize its regret (also known as self-directed online learning).
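
To fix ideas, the sketch below instantiates self-directed online learning with a perceptron that, at each round, chooses the remaining datapoint about which its current hypothesis is most confident. This greedy ordering heuristic is an illustrative assumption, not the paper's algorithm or the source of its regret bounds.

```python
import numpy as np

def self_directed_perceptron(X, y):
    """Fixed-design online learning where the learner picks the order:
    each round, predict on the remaining point with the largest margin
    under the current hypothesis. Assumes labels y in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    remaining = list(range(n))
    mistakes = 0
    while remaining:
        i = max(remaining, key=lambda j: abs(X[j] @ w))  # most confident point
        remaining.remove(i)
        if np.sign(X[i] @ w) != y[i]:    # sign(0) counts as a mistake
            mistakes += 1
            w += y[i] * X[i]             # perceptron update after the mistake
    return mistakes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ rng.normal(size=5))      # linearly separable toy labels
print(self_directed_perceptron(X, y))
```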


Sub-optimality of the Naive Mean Field approximation for proportional high-dimensional Linear Regression

Neural Information Processing Systems

The Naive Mean Field (NMF) approximation is widely employed in modern machine learning due to the huge computational gains it bestows on the statistician. Despite its popularity in practice, theoretical guarantees for high-dimensional problems are available only under strong structural assumptions (e.g., sparsity). Moreover, existing theory often fails to explain empirical observations reported in the literature. In this paper, we take a step towards addressing these problems by deriving sharp asymptotic characterizations of the NMF approximation in high-dimensional linear regression. Our results apply to a wide class of natural priors and allow for model mismatch (i.e., the underlying statistical model can be different from the fitted model).
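
For reference, the NMF variational problem the abstract refers to can be written as follows. This is a standard formulation; the notation (design matrix X, i.i.d. prior pi, noise variance sigma^2) is an assumption made here for concreteness rather than taken from the paper.

```latex
% Bayesian linear regression: y = X\beta + \varepsilon, \ \varepsilon \sim N(0, \sigma^2 I_n),
% with i.i.d. prior \pi on the coordinates of \beta \in \mathbb{R}^p.
% The NMF approximation restricts the variational family to product measures:
\log Z(y) \;\ge\; \sup_{q = \prod_{i=1}^{p} q_i}
  \Big\{ \mathbb{E}_q\big[\log p(y \mid \beta)\big]
         \;-\; \sum_{i=1}^{p} \mathrm{KL}\big(q_i \,\|\, \pi\big) \Big\},
\qquad
\log p(y \mid \beta) = -\tfrac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2 + \mathrm{const}.
% Proportional regime: n, p \to \infty with p / n \to \delta \in (0, \infty).
```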


A Generalized Alternating Method for Bilevel Optimization under the Polyak-Łojasiewicz Condition

Neural Information Processing Systems

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting. In this paper, we first introduce a stationarity metric for the considered bilevel problems with a nonconvex lower-level objective satisfying the Polyak-Łojasiewicz (PL) condition; this metric generalizes the existing one.
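
The alternating template the abstract refers to can be sketched as follows: a few gradient steps on the lower-level objective g in y, then one step on the upper-level objective f in x, with the outer gradient taken through the (here, unrolled) inner updates. This is a generic sketch on a strongly convex (hence PL) toy lower level, not the paper's algorithm for the general PL setting.

```python
import torch

def alternating_bilevel(f, g, x, y, outer_lr=0.01, inner_lr=0.1,
                        inner_steps=10, outer_steps=200):
    """min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y): alternate a few
    inner descent steps on g with one outer step on x, differentiating
    through the unrolled inner updates to approximate the hypergradient.
    A sketch of the alternating template, not the paper's method."""
    for _ in range(outer_steps):
        y = y.detach().requires_grad_(True)           # warm-start the inner variable
        for _ in range(inner_steps):                  # lower level: descend g in y
            gy, = torch.autograd.grad(g(x, y), y, create_graph=True)
            y = y - inner_lr * gy                     # keep the graph for the outer step
        fx, = torch.autograd.grad(f(x, y), x)         # outer gradient through unrolled y
        with torch.no_grad():
            x -= outer_lr * fx                        # upper level: descend f in x
    return x.detach(), y.detach()

# Toy instance: g is strongly convex in y with minimizer y*(x) = x.
x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([0.0])
f = lambda x, y: (y - 1.0).pow(2).sum()
g = lambda x, y: (y - x).pow(2).sum()
x, y = alternating_bilevel(f, g, x, y)
print(x, y)   # both approach 1.0, the bilevel solution
```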