Elvira, Clément
A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems
Guyard, Théo, Herzet, Cédric, Elvira, Clément, Arslan, Ayşe-Nur
We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and use "pruning tests" to rule out regions that cannot contain a solution. In standard implementations, evaluating a pruning test requires solving a convex optimization problem, which may become a computational bottleneck. In this paper, we present an alternative way to implement pruning tests for a generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude on typical problems encountered in machine-learning applications.
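To make the role of pruning tests concrete, here is a minimal sketch (not the procedure proposed in the paper) of a standard BnB pruning test for $\ell_0$-regularized least-squares: a node is discarded when a lower bound on the best objective attainable within it, obtained from a convex relaxation, already exceeds the value of the current incumbent. All names below are illustrative.

```python
# A minimal sketch of a standard Branch-and-Bound pruning test for
#     min_x  0.5 * ||y - A x||^2 + lmbd * ||x||_0.
# Illustrative only; this is not the pruning framework of the paper.
import numpy as np

def node_lower_bound(A, y, lmbd, S0, S1):
    """Lower bound on the best objective attainable in the node where the
    entries in S0 are fixed to zero and those in S1 are forced nonzero.

    The l0 penalty on the still-free entries and the nonzero constraints
    on S1 are relaxed, so the bound reduces to a least-squares problem
    restricted to the allowed support (a convex problem)."""
    n = A.shape[1]
    free = [j for j in range(n) if j not in S0]   # entries allowed nonzero
    if free:
        x_free, *_ = np.linalg.lstsq(A[:, free], y, rcond=None)
        residual = y - A[:, free] @ x_free
    else:
        residual = y
    return 0.5 * residual @ residual + lmbd * len(S1)

def prune(A, y, lmbd, S0, S1, incumbent):
    """Discard the node if its lower bound exceeds the best value so far."""
    return node_lower_bound(A, y, lmbd, S0, S1) > incumbent

rng = np.random.default_rng(0)
A, y = rng.standard_normal((20, 8)), rng.standard_normal(20)
print(prune(A, y, lmbd=0.5, S0={0, 1}, S1={2}, incumbent=5.0))
```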
One to beat them all: "RYU" -- a unifying framework for the construction of safe balls
Tran, Thu-Le, Elvira, Clément, Dang, Hong-Phuong, Herzet, Cédric
In this paper, we put forth a novel framework (named "RYU") for the construction of "safe" balls, i.e., regions that provably contain the dual solution of a target optimization problem. We concentrate on the standard setup where the cost function is the sum of two terms: a closed, proper, convex, Lipschitz-smooth function and a closed, proper, convex function. The RYU framework is shown to generalize or improve upon all the results proposed in the last decade for this family of optimization problems.
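To illustrate what a "safe" ball is, the sketch below constructs the classical GAP sphere of Fercoq et al. for the Lasso, a particular instance of the smooth-plus-convex composite setting considered here. This is one of the regions that RYU is shown to generalize, not the RYU construction itself.

```python
# A minimal sketch of a "safe ball" for the Lasso
#     min_b  0.5 * ||y - X b||^2 + lmbd * ||b||_1,
# using the classical GAP sphere: a ball centered at a dual-feasible point,
# with radius driven by the duality gap, that provably contains the dual
# solution. Illustrative only; not the RYU construction.
import numpy as np

def gap_safe_ball(X, y, lmbd, beta):
    """Return (center, radius) of a ball containing the Lasso dual solution."""
    r = y - X @ beta                                   # primal residual
    # Dual-feasible point obtained by rescaling the residual.
    theta = r / max(lmbd, np.max(np.abs(X.T @ r)))
    primal = 0.5 * r @ r + lmbd * np.abs(beta).sum()
    dual = 0.5 * (y @ y) - 0.5 * np.sum((y - lmbd * theta) ** 2)
    gap = max(primal - dual, 0.0)                      # weak duality: gap >= 0
    return theta, np.sqrt(2.0 * gap) / lmbd            # GAP sphere radius

rng = np.random.default_rng(0)
X, y = rng.standard_normal((30, 50)), rng.standard_normal(30)
center, radius = gap_safe_ball(X, y, lmbd=1.0, beta=np.zeros(50))
```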
Safe Peeling for $\ell_0$-Regularized Least-Squares
Guyard, Théo, Monnoyer, Gilles, Elvira, Clément, Herzet, Cédric
We introduce a new methodology dubbed "safe peeling" to accelerate the exact resolution of the so-called "$\ell_0$-regularized least-squares" problem (1-P). This problem turns out to be NP-hard [4, Th. 1], an observation which has triggered a flurry of contributions proposing tractable procedures able to recover approximate solutions of (1-P). A standard approach to solve (1-P) exactly is the Branch-and-Bound (BnB) algorithm, see [6-11]; the crux of BnB methods consists in identifying and discarding regions of the feasible space that cannot contain a solution through "pruning" tests, and tractability is commonly ensured by an additional constraint of the form $\|\mathbf{x}\|_\infty \leq M$ with $M > 0$, known as the "Big-M" constraint, see [7]. In a nutshell, our contribution is a computationally simple test applied at each node of the BnB decision tree to identify intervals of $\mathbb{R}$ that cannot contain a solution, so that the box constraint $\mathbf{x} \in [\mathbf{l}, \mathbf{u}]$ (understood component-wise) handled at that node can be tightened while preserving the pruning decision.
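The following toy sketch illustrates the peeling principle under simplifying assumptions (a separable quadratic stands in for the node relaxation; this is not the bound derived in the paper): an endpoint of the interval $[\mathbf{l}, \mathbf{u}]$ is tightened whenever a cheap lower bound certifies that the peeled-off slab cannot contain a point better than the incumbent.

```python
# A toy sketch of "peeling" at a BnB node with box constraint x in [l, u]:
# shrink u[j] while a lower bound over the slab {x in [l,u] : x_j >= cand}
# exceeds the incumbent, i.e. while the slab provably holds no solution.
# The separable bound below is a crude placeholder, not the paper's test.
import numpy as np

def box_min(h, c, lo, hi):
    """Minimum of 0.5 * h * (t - c)^2 over t in [lo, hi] (h > 0)."""
    t = np.clip(c, lo, hi)
    return 0.5 * h * (t - c) ** 2

def slab_lower_bound(h, c, l, u, j, cand):
    """Minimize sum_k 0.5*h_k*(x_k - c_k)^2 over {x in [l,u] : x_j >= cand}."""
    vals = box_min(h, c, l, u)
    vals[j] = box_min(h[j], c[j], cand, u[j])
    return vals.sum()

def peel_coordinate(h, c, l, u, j, incumbent, step=0.25):
    """Tighten u[j] while the peeled-off slab is certifiably solution-free."""
    l, u = l.copy(), u.copy()
    while u[j] - step > l[j]:
        cand = u[j] - step
        if slab_lower_bound(h, c, l, u, j, cand) > incumbent:
            u[j] = cand                  # safe: no solution has x_j >= cand
        else:
            break
    return l, u

h, c = np.ones(3), np.array([0.2, -0.1, 0.4])
l, u = -np.ones(3), np.ones(3)
print(peel_coordinate(h, c, l, u, j=0, incumbent=0.05))
```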
Beyond GAP screening for Lasso by exploiting new dual cutting half-spaces
Tran, Thu-Le, Elvira, Clément, Dang, Hong-Phuong, Herzet, Cédric
In this paper, we propose a novel safe screening test for the Lasso. Our procedure is based on a safe region with a dome geometry and exploits a canonical representation of the set of half-spaces (referred to as "dual cutting half-spaces" in this paper) containing the dual feasible set. The proposed safe region is shown to always be included in the state-of-the-art "GAP Sphere" and "GAP Dome" proposed by Fercoq et al. (and strictly so under very mild conditions) while involving the same computational burden. Numerical experiments confirm that our new dome enables the design of more powerful screening tests than GAP regions and leads to significant acceleration in solving the Lasso.
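For reference, the sphere-based screening pattern that the proposed dome test refines can be sketched as follows: given a safe region with center theta and radius r, any feature whose worst-case dual correlation over the region stays below one can be safely discarded. The dome test follows the same template with a tighter worst-case bound.

```python
# A minimal sketch of safe sphere screening for the Lasso (dual scaled so
# the feasible set is ||X^T theta||_inf <= 1): feature j is zero at the
# optimum whenever  |x_j^T theta| + r * ||x_j|| < 1,
# since the left-hand side upper-bounds |x_j^T theta'| over the safe ball.
# This is the classical sphere test, not the dome test of the paper.
import numpy as np

def sphere_screening(X, theta, r):
    """Boolean mask of features that can be safely removed."""
    scores = np.abs(X.T @ theta) + r * np.linalg.norm(X, axis=0)
    return scores < 1.0

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 100))
theta = rng.standard_normal(50)
theta /= np.max(np.abs(X.T @ theta))        # make theta dual feasible
print(sphere_screening(X, theta, r=0.01).sum(), "features screened out")
```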
Bayesian nonparametric Principal Component Analysis
Elvira, Clément, Chainais, Pierre, Dobigeon, Nicolas
Principal component analysis (PCA) is a popular tool for dimension reduction. The selection of the number of significant components is essential but often based on practical heuristics depending on the application. Only a few works have proposed a probabilistic approach able to infer the number of significant components. To this end, this paper introduces a Bayesian nonparametric principal component analysis (BNP-PCA). The proposed model projects observations onto a random orthogonal basis which is assigned a prior distribution defined on the Stiefel manifold. The prior on factor scores involves an Indian buffet process to model the uncertainty related to the number of components. The parameters of interest as well as the nuisance parameters are finally inferred within a fully Bayesian framework via Monte Carlo sampling. A study of the (in)consistency of the marginal maximum a posteriori estimator of the latent dimension is carried out, and a new estimator of the subspace dimension is proposed. Moreover, for the sake of statistical significance, a Kolmogorov-Smirnov test based on the posterior distribution of the principal components is used to refine this estimate. The behaviour of the algorithm is first studied on various synthetic examples. Finally, the proposed BNP dimension reduction approach is shown to be easily and efficiently coupled with clustering or latent factor models within a unified framework.
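As an illustration of the nonparametric ingredient, the sketch below draws a feature-allocation matrix from the Indian buffet process: the number of active components is unbounded a priori and generated (and, in BNP-PCA, inferred) rather than fixed in advance. This is a textbook IBP sampler, not the inference scheme of the paper.

```python
# A minimal sketch of sampling from the Indian buffet process IBP(alpha):
# customer i revisits an existing dish k with probability m_k / i (m_k =
# number of previous customers who took it), then tries Poisson(alpha / i)
# brand-new dishes, so the number of dishes (components) grows with the data.
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """Draw a binary feature-allocation matrix Z from an IBP(alpha)."""
    counts = []                                      # customers per dish
    rows = []
    for i in range(1, n_customers + 1):
        row = [rng.random() < m / i for m in counts]  # revisit popular dishes
        for k, taken in enumerate(row):
            counts[k] += taken
        new = rng.poisson(alpha / i)                  # sample new dishes
        counts.extend([1] * new)
        rows.append(row + [True] * new)
    Z = np.zeros((n_customers, len(counts)), dtype=bool)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(10, alpha=2.0, rng=rng)
print("number of active components:", Z.shape[1])
```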
Bayesian anti-sparse coding
Elvira, Clément, Chainais, Pierre, Dobigeon, Nicolas
Sparse representations have proven their efficiency in solving a wide class of inverse problems encountered in signal and image processing. Conversely, enforcing the information to be spread uniformly over the representation coefficients exhibits relevant properties in various applications such as digital communications. Anti-sparse regularization can be naturally expressed through an $\ell_{\infty}$-norm penalty. This paper derives a probabilistic formulation of such representations. A new probability distribution, referred to as the democratic prior, is first introduced; its main properties as well as three random variate generators for this distribution are derived. This distribution is then used as a prior to promote anti-sparsity in a Gaussian linear inverse problem, yielding a fully Bayesian formulation of anti-sparse coding. Two Markov chain Monte Carlo (MCMC) algorithms are proposed to generate samples according to the posterior distribution. The first is a standard Gibbs sampler; the second uses Metropolis-Hastings moves that exploit the proximity mapping of the log-posterior distribution. These samples are used to approximate the maximum a posteriori and minimum mean square error estimators of both parameters and hyperparameters. Simulations on synthetic data illustrate the performance of the two proposed samplers, for both complete and over-complete dictionaries. All results are compared to the recent deterministic variational FITRA algorithm.
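The second sampler relies on the proximity mapping induced by the democratic prior, namely the prox of the $\ell_\infty$ norm. A minimal sketch, computed via the Moreau decomposition together with Duchi et al.'s $\ell_1$-ball projection (standard tools, not code from the paper):

```python
# A minimal sketch of the proximity mapping of the l_infinity norm, the
# ingredient the democratic prior brings into the proximal MCMC moves.
# By Moreau decomposition:  prox_{lmbd*||.||_inf}(x) = x - P_{lmbd*B1}(x),
# where P_{lmbd*B1} projects onto the l1 ball of radius lmbd.
import numpy as np

def project_l1_ball(x, radius):
    """Euclidean projection of x onto the l1 ball of given radius
    (sort-based algorithm of Duchi et al.)."""
    if np.abs(x).sum() <= radius:
        return x.copy()
    mu = np.sort(np.abs(x))[::-1]
    cumsum = np.cumsum(mu)
    rho = np.nonzero(mu > (cumsum - radius) / np.arange(1, len(x) + 1))[0][-1]
    tau = (cumsum[rho] - radius) / (rho + 1.0)        # soft-threshold level
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_linf(x, lmbd):
    """prox of lmbd * ||.||_inf via Moreau decomposition."""
    return x - project_l1_ball(x, lmbd)

x = np.array([3.0, -1.0, 0.5])
print(prox_linf(x, lmbd=1.0))   # largest entries shrunk toward a common level
```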