stationary point
A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning
Du, Honglin, Liang, Muxuan, Zhong, Xiang
In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs ($\text{DAGs}$). However, dependency structures can vary across subjects, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose Directed Acyclic Graph-based Dependency Clustering via Alternating Direction Method of Multipliers (DAG-DC-ADMM), a unified framework built upon Structural Equation Modeling (SEM) that jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity via a smooth constraint and integrate a groupwise truncated Lasso fusion penalty (gTLP) to cluster subjects based on their structural similarity. This yields a nonconvex optimization problem that incorporates sparsity, acyclicity, and structural consensus constraints. We address the nonconvexity by using the augmented Lagrangian method and solve it with an adapted version of the Alternating Direction Method of Multipliers (ADMM) for difference-of-convex programs. For certain graph structures, such as upper triangular adjacency matrices, our algorithm is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. Experiments demonstrate that our method recovers cluster-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability enables the robust discovery of heterogeneous dependencies across subjects where the subpopulation label is unknown.
Complexity of Non-Log-Concave Sampling in Fisher Information
We study the query complexity of obtaining a relative Fisher information guarantee for sampling from a log-smooth non-log-concave distribution; this is a sampling analog of finding an approximate stationary point in optimization. Our algorithm is based on the proximal sampler, which is an implicit discretization of the Langevin diffusion, and requires an implementation of the backward step known as the restricted Gaussian oracle (RGO). We show that by leveraging the recent results for log-concave sampling with high-accuracy guarantees in Rรฉnyi divergence, we can obtain an approximate RGO implementation that -- when used with the proximal sampler -- yields a complexity guarantee in relative Fisher information that inherits the same dimension dependence as log-concave sampling, and improves upon prior work for non-log-concave sampling. We also show a converse reduction that any improvement in the dimension dependence in relative Fisher information for non-log-concave sampling will yield an improved dimension dependence for high-accuracy log-concave sampling.
Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems
Shen, Yiyang, He, Yutian, Wang, Weiran, Lin, Qihang
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ฮต$-KKT point with $\tilde{O}(ฮต^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ฮต^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ฮต^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ฮต$-KKT point with $\tilde{O}(ฮต^{-9})$ oracle complexity.
Supplementary to Smooth Bilevel Programming for Sparse Regularization Clarice Poon, Gabriel Peyrรฉ APseudocode for gradient descent implementation
Note that f(ฮฒt) = gt is computed either as in line 5 or line 9 of the algorithm and one can use these computations for any gradient based algorithm (e.g. Note also that this is simply gradient descent on a smooth function, and one can apply typical methods to choosing the stepsize ฮณk, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988]. Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso. 1 initialization v0 Rn (with no zero entries), stepsize ฮณt > 0; Result: ฮฒt 2 while not converged do 3 if n6 mand ฮป>0 then 4 ut = diag(vt)X>Xdiag(vt) + ฮปId To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ฯ can be written in terms of its convex conjugate which has domain Rd . For the expression of ฯwhen Ris a norm,from the above, we know that ฯ = ( ฯ) ( z), and recall that for any norm, R(ฮฒ) = maxR (w)61hw, ฮฒi. We derive some properties of the function h: Lemma 1.