nully
Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems
Li, Junyi, Huang, Feihu, Huang, Heng
Bilevel Optimization has witnessed notable progress recently with new emerging efficient algorithms, yet it is underexplored in the Federated Learning setting. It is unclear how the challenges of Federated Learning affect the convergence of bilevel algorithms. In this work, we study Federated Bilevel Optimization problems. We first propose the FedBiO algorithm that solves the hyper-gradient estimation problem efficiently, then we propose FedBiOAcc to accelerate FedBiO. FedBiO has communication complexity $O(\epsilon^{-1.5})$ with linear speed up, while FedBiOAcc achieves communication complexity $O(\epsilon^{-1})$, sample complexity $O(\epsilon^{-1.5})$ and also the linear speed up. We also study Federated Bilevel Optimization problems with local lower level problems, and prove that FedBiO and FedBiOAcc converges at the same rate with some modification.
Fast Adaptive Federated Bilevel Optimization
Bilevel optimization is a popular hierarchical model in machine learning, and has been widely applied to many machine learning tasks such as meta learning, hyperparameter learning and policy optimization. Although many bilevel optimization algorithms recently have been developed, few adaptive algorithm focuses on the bilevel optimization under the distributed setting. It is well known that the adaptive gradient methods show superior performances on both distributed and non-distributed optimization. In the paper, thus, we propose a novel adaptive federated bilevel optimization algorithm (i.e.,AdaFBiO) to solve the distributed bilevel optimization problems, where the objective function of Upper-Level (UL) problem is possibly nonconvex, and that of Lower-Level (LL) problem is strongly convex. Specifically, our AdaFBiO algorithm builds on the momentum-based variance reduced technique and local-SGD to obtain the best known sample and communication complexities simultaneously. In particular, our AdaFBiO algorithm uses the unified adaptive matrices to flexibly incorporate various adaptive learning rates to update variables in both UL and LL problems. Moreover, we provide a convergence analysis framework for our AdaFBiO algorithm, and prove it needs the sample complexity of $\tilde{O}(\epsilon^{-3})$ with communication complexity of $\tilde{O}(\epsilon^{-2})$ to obtain an $\epsilon$-stationary point. Experimental results on federated hyper-representation learning and federated data hyper-cleaning tasks verify efficiency of our algorithm.
Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities
Wang, Zhongruo, Balasubramanian, Krishnakumar, Ma, Shiqian, Razaviyayn, Meisam
In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first design and analyze the Zeroth-Order Gradient Descent Ascent (\texttt{ZO-GDA}) algorithm, and provide improved results compared to existing works, in terms of oracle complexity. Next, we propose the Zeroth-Order Gradient Descent Multi-Step Ascent (\texttt{ZO-GDMSA}) algorithm that significantly improves the oracle complexity of \texttt{ZO-GDA}. We also provide stochastic version of \texttt{ZO-GDA} and \texttt{ZO-GDMSA} to handle stochastic nonconvex minimax problems, and provide oracle complexity results.
Approximation capabilities of neural networks on unbounded domains
We prove universal approximation theorems of neural networks in $L^{p}(\mathbb{R} \times [0, 1]^n)$, under the conditions that $p \in [2, \infty)$ and that the activiation function belongs to among others a monotone sigmoid, relu, elu, softplus or leaky relu. Our results partially generalize classical universal approximation theorems on $[0,1]^n.$
Strong Equivalence for LPMLN Programs
LPMLN is a probabilistic extension of answer set programs with the weight scheme adapted from Markov Logic. We study the concept of strong equivalence in LPMLN, which is a useful mathematical tool for simplifying a part of an LPMLN program without looking at the rest of it. We show that the verification of strong equivalence in LPMLN can be reduced to equivalence checking in classical logic via a reduct and choice rules as well as to equivalence checking under the "soft" logic of here-and-there. The result allows us to leverage an answer set solver for LPMLN strong equivalence checking. The study also suggests us a few reformulations of the LPMLN semantics using choice rules, the logic of here-and-there, and classical logic.
A Syntactic Operator for Forgetting that Satisfies Strong Persistence
Berthold, Matti, Gonçalves, Ricardo, Knorr, Matthias, Leite, João
Whereas the operation of forgetting has recently seen a considerable amount of attention in the context of Answer Set Programming (ASP), most of it has focused on theoretical aspects, leaving the practical issues largely untouched. Recent studies include results about what sets of properties operators should satisfy, as well as the abstract characterization of several operators and their theoretical limits. However, no concrete operators have been investigated. In this paper, we address this issue by presenting the first concrete operator that satisfies strong persistence - a property that seems to best capture the essence of forgetting in the context of ASP - whenever this is possible, and many other important properties. The operator is syntactic, limiting the computation of the forgetting result to manipulating the rules in which the atoms to be forgotten occur, naturally yielding a forgetting result that is close to the original program. This paper is under consideration for acceptance in TPLP.