 Optimization


Lipschitz constant estimation of Neural Networks via sparse polynomial optimization

arXiv.org Machine Learning

We introduce LiPopt, a polynomial optimization framework for computing increasingly tight upper bounds on the Lipschitz constant of neural networks. The underlying optimization problems boil down to either linear (LP) or semidefinite (SDP) programming. We show how to use the sparse connectivity of a network to significantly reduce the complexity of computation. This is especially useful for convolutional as well as pruned neural networks. We conduct experiments on networks with random weights as well as networks trained on MNIST, showing that in the particular case of the $\ell_\infty$-Lipschitz constant, our approach yields superior estimates compared to baselines available in the literature.
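For context on what "upper bound on the Lipschitz constant" means here, the following is a minimal sketch of a well-known naive baseline, not LiPopt itself: for a feed-forward network with 1-Lipschitz activations (such as ReLU), the product of the induced $\ell_\infty$ norms of the weight matrices bounds the $\ell_\infty$-Lipschitz constant from above. The function names and the toy weights are hypothetical and chosen only for illustration.

```python
def inf_norm(W):
    """Induced l-infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in W)


def naive_lipschitz_upper_bound(weights):
    """Product of layer norms; valid when every activation is 1-Lipschitz.

    This is a loose baseline, not the LP/SDP hierarchy described above.
    """
    bound = 1.0
    for W in weights:
        bound *= inf_norm(W)
    return bound


# Toy two-layer network f(x) = W2 * relu(W1 * x) with made-up weights.
W1 = [[1.0, -2.0], [0.5, 0.5]]
W2 = [[1.0, 1.0]]
bound = naive_lipschitz_upper_bound([W1, W2])
```

Bounds of this kind ignore how layers interact, which is why tighter optimization-based estimates are of interest.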


Optimization in Machine Learning: A Distribution Space Approach

arXiv.org Machine Learning

We present the viewpoint that optimization problems encountered in machine learning can often be interpreted as minimizing a convex functional over a function space, but with a non-convex constraint set introduced by model parameterization. This observation allows us to repose such problems via a suitable relaxation as convex optimization problems in the space of distributions over the training parameters. We derive some simple relationships between the distribution-space problem and the original problem, e.g. a distribution-space solution is at least as good as a solution in the original space. Moreover, we develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in distribution space. Consistency of this approximation is established and the numerical efficacy of the proposed algorithm is illustrated on simple examples. In both theory and practice, this formulation provides an alternative approach to large-scale optimization in machine learning.


A stochastic approach to handle knapsack problems in the creation of ensembles

arXiv.org Machine Learning

Ensemble-based methods are highly popular approaches that increase the accuracy of a decision by aggregating the opinions of individual voters. The common point is to maximize accuracy; however, a natural limitation occurs if incremental costs are also assigned to the individual voters. Consequently, we investigate creating ensembles under an additional constraint on the total cost of the members. This task can be formulated as a knapsack problem, where the energy is the ensemble accuracy formed by some aggregation rules. However, the generally applied aggregation rules lead to a nonseparable energy function, which takes the common solution tools -- such as dynamic programming -- out of action. We introduce a novel stochastic approach that considers the energy as the joint probability function of the member accuracies. This type of knowledge can be efficiently incorporated in a stochastic search process as a stopping rule, since we have the information on the expected accuracy or, alternatively, the probability of finding more accurate ensembles. Experimental analyses of the created ensembles of pattern classifiers and object detectors confirm the efficiency of our approach. Moreover, we propose a novel stochastic search strategy that better fits the energy, compared with general approaches such as simulated annealing.
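To illustrate why the ensemble accuracy is a nonseparable objective, here is a minimal sketch that computes majority-vote accuracy from member accuracies. It assumes independent voters and a simple majority rule, neither of which is stated in the abstract, and the function name is hypothetical.

```python
def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of independent voters is correct.

    dist[k] holds the probability that exactly k of the voters processed
    so far are correct (a Poisson-binomial distribution built by DP).
    """
    dist = [1.0]
    for p in accuracies:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # this voter is wrong
            new[k + 1] += q * p       # this voter is correct
        dist = new
    n = len(accuracies)
    return sum(q for k, q in enumerate(dist) if 2 * k > n)


three_voters = majority_vote_accuracy([0.7, 0.7, 0.7])
single_voter = majority_vote_accuracy([0.9])
```

The value contributed by one member depends on which other members are selected, so the objective cannot be written as a sum of per-item values, which is what the classical knapsack dynamic program requires.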


Taxonomy of Dual Block-Coordinate Ascent Methods for Discrete Energy Minimization

arXiv.org Machine Learning

We consider the maximum-a-posteriori inference problem in discrete graphical models and study solvers based on the dual block-coordinate ascent rule. We map all existing solvers in a single framework, allowing for a better understanding of their design principles. We show theoretically that some block-optimizing updates are sub-optimal and show how to strictly improve them. On a wide range of problem instances of varying graph connectivity, we study the performance of existing solvers as well as new variants that can be obtained within the framework. As a result of this exploration we build a new state-of-the-art solver, performing uniformly better on the whole range of test instances.


On the Combined Impact of Population Size and Sub-problem Selection in MOEA/D

arXiv.org Artificial Intelligence

This paper intends to understand and to improve the working principle of decomposition-based multi-objective evolutionary algorithms. We review the design of the well-established MOEA/D framework to support the smooth integration of different strategies for sub-problem selection, while emphasizing the role of the population size and of the number of offspring created at each generation. By conducting a comprehensive empirical analysis on a wide range of multi- and many-objective combinatorial NK landscapes, we provide new insights into the combined effect of those parameters on the anytime performance of the underlying search process. In particular, we show that even a simple strategy selecting sub-problems at random outperforms existing sophisticated strategies. We also study the sensitivity of such strategies with respect to the ruggedness and the objective space dimension of the target problem.


Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

arXiv.org Machine Learning

We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and $\ell_{2}$ norms (a.k.a. $\ell_0 \ell_2$ regularization). Recent work presents strong evidence that the resulting $\ell_0$-based estimators can outperform popular sparse learning methods in many important high-dimensional settings. However, exact computation of $\ell_0$-based estimators remains a major challenge. Indeed, state-of-the-art mixed integer programming (MIP) methods for $\ell_0 \ell_2$-regularized regression face difficulties in solving many statistically interesting instances when the number of features $p \sim 10^4$. In this work, we present a new exact MIP framework for $\ell_0\ell_2$-regularized regression that can scale to $p \sim 10^7$, achieving over $3600$x speed-ups compared to the fastest exact methods. Unlike recent work, which relies on modern MIP solvers, we design a specialized nonlinear BnB framework by critically exploiting the problem structure. A key distinguishing component of our algorithm lies in efficiently solving the node relaxations using specialized first-order methods based on coordinate descent (CD). Our CD-based method effectively leverages information across the BnB nodes by using warm starts, active sets, and gradient screening. In addition, we design a novel method for obtaining dual bounds from primal solutions, which certifiably works in high dimensions. Experiments on synthetic and real high-dimensional datasets demonstrate that our method is not only significantly faster than the state of the art, but can also deliver certifiably optimal solutions to statistically challenging instances that cannot be handled with existing methods. We open source the implementation through our toolkit L0BnB.
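For reference, the $\ell_0 \ell_2$-penalized least squares problem described above is commonly written as follows. The notation ($y$, $X$, $\beta$, $\lambda_0$, $\lambda_2$) and the factor $\tfrac{1}{2}$ are assumptions, since the abstract does not fix a scaling.

```latex
\min_{\beta \in \mathbb{R}^{p}} \;
  \frac{1}{2} \lVert y - X\beta \rVert_2^2
  + \lambda_0 \lVert \beta \rVert_0
  + \lambda_2 \lVert \beta \rVert_2^2,
\qquad
\lVert \beta \rVert_0 = \bigl|\{\, j : \beta_j \neq 0 \,\}\bigr| .
```

Here $\lambda_0 \geq 0$ controls the number of nonzero coefficients and $\lambda_2 \geq 0$ controls ridge shrinkage; the $\ell_0$ term is what makes the problem combinatorial.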


Augmentation of the Reconstruction Performance of Fuzzy C-Means with an Optimized Fuzzification Factor Vector

arXiv.org Artificial Intelligence

Information granules have been considered to be the fundamental constructs of Granular Computing (GrC). As a useful unsupervised learning technique, Fuzzy C-Means (FCM) is one of the most frequently used methods to construct information granules. The FCM-based granulation-degranulation mechanism plays a pivotal role in GrC. In this paper, to enhance the quality of the degranulation (reconstruction) process, we augment the FCM-based degranulation mechanism by introducing a vector of fuzzification factors (fuzzification factor vector) and setting up an adjustment mechanism to modify the prototypes and the partition matrix. The design is regarded as an optimization problem, which is guided by a reconstruction criterion. In the proposed scheme, the initial partition matrix and prototypes are generated by FCM. Then a fuzzification factor vector is introduced to form an appropriate fuzzification factor for each cluster and to build an adjustment scheme for modifying the prototypes and the partition matrix. With the supervised learning mode of the granulation-degranulation process, we construct a composite objective function of the fuzzification factor vector, the prototypes and the partition matrix. Subsequently, particle swarm optimization (PSO) is employed to optimize the fuzzification factor vector, refine the prototypes and develop the optimal partition matrix. As a result, the reconstruction performance of the FCM algorithm is enhanced. We offer a thorough analysis of the developed scheme. In particular, we show that the classical FCM algorithm forms a special case of the proposed scheme. Experiments completed for both synthetic and publicly available datasets show that the proposed approach outperforms the generic data reconstruction approach.


Introduction to Evolutionary Algorithms

#artificialintelligence

Evolution by natural selection is a scientific theory which aims to explain how natural systems evolved over time into more complex systems. In evolutionary algorithms, a fitness value can be used as a guide to indicate how close we are to a solution (e.g., the higher the value, the closer we are to our desired objective). By grouping together the elements of a population that have similar fitness, and placing dissimilar elements further apart, we can construct a Fitness Landscape (Figure 1). One of the main problems faced by evolutionary algorithms is the presence of local optima in the fitness landscape. Local optima can mislead the algorithm, causing it to settle on a less optimal solution instead of reaching the desired global maximum.
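The role of the fitness value can be sketched with a very small evolutionary algorithm. This is an illustrative sketch under stated assumptions, not the article's own code: the problem (maximize the number of 1 bits), the truncation selection scheme, and all names and parameter values are hypothetical.

```python
import random


def fitness(bits):
    """Toy fitness: the number of 1 bits (higher is better)."""
    return sum(bits)


def evolve(n_bits=20, pop_size=10, generations=50, mutation_rate=0.05, seed=0):
    """Run a simple elitist evolutionary loop; return (initial_best, final_best)."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        # Variation: each parent produces one child by bit-flip mutation.
        children = [
            [1 - b if rng.random() < mutation_rate else b for b in parent]
            for parent in parents
        ]
        pop = parents + children
    return initial_best, max(fitness(ind) for ind in pop)


initial_best, final_best = evolve()
```

Because the parents survive each generation, the best fitness never decreases. This toy landscape has a single optimum; on a landscape with several peaks, the same loop can stall on a local optimum.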


Lightwave Power Transfer for Federated Learning-based Wireless Networks

arXiv.org Artificial Intelligence

Federated Learning (FL) has been recently presented as a new technique for training shared machine learning models in a distributed manner while respecting data privacy. However, implementing FL in wireless networks may significantly reduce the lifetime of energy-constrained mobile devices due to their involvement in the construction of the shared learning models. To handle this issue, we propose a novel approach at the physical layer based on the application of lightwave power transfer in the FL-based wireless network and a resource allocation scheme to manage the network's power efficiency. We formulate the corresponding optimization problem and then propose a method to obtain the optimal solution. Numerical results reveal that the proposed scheme can provide sufficient energy to a mobile device for performing FL tasks without using any power from its own battery. Hence, the proposed approach can help the FL-based wireless network overcome the issue of limited energy in mobile devices.


Relaxed Dual Optimal Inequalities for Relaxed Columns: with Application to Vehicle Routing

arXiv.org Artificial Intelligence

In this paper we accelerate the column generation (CG) solution to expanded linear programming (LP) relaxations (Barnhart et al. 1996) using dual optimal inequalities (DOI) (Ben Amor et al. 2006). Expanded LP relaxations are used to solve integer linear programs (ILPs) for which compact LP relaxations are very loose. In contrast to compact LP relaxations, which contain a small number of variables, expanded LP relaxations contain a massive number of variables (called columns). However, an expanded LP relaxation is often much tighter than the corresponding compact LP relaxation, and permits efficient (often in practice exact) optimization (Yarkony et al. 2019) of the corresponding ILP. To solve expanded LP relaxations, CG is used. Since the set of all feasible columns is enormous and cannot be easily enumerated, a sufficient set is constructed iteratively using CG. The process of identifying negative reduced cost columns is called pricing.
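The pricing step can be stated compactly. The notation below is an assumption introduced for illustration and does not come from the abstract: a minimization master problem over columns $l \in \Omega$ with costs $c_l$, constraint coefficients $a_{il}$, and dual values $\pi_i$ taken from the current restricted master problem.

```latex
\bar{c}_l \;=\; c_l - \sum_{i} \pi_i \, a_{il},
\qquad
l^{*} \in \arg\min_{l \in \Omega} \bar{c}_l .
```

If $\bar{c}_{l^{*}} < 0$, column $l^{*}$ is added to the restricted master problem and the LP is re-solved; if $\bar{c}_{l^{*}} \geq 0$, no column can improve the objective and the current solution is optimal for the expanded LP relaxation.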