 Optimization


Optimal Auction Design for the Gradual Procurement of Strategic Service Provider Agents

arXiv.org Artificial Intelligence

We consider an outsourcing problem where a software agent procures multiple services from providers with uncertain reliabilities to complete a computational task before a strict deadline. The service consumer requires a procurement strategy that achieves the optimal balance between success probability and invocation cost. However, the service providers are self-interested and may misrepresent their private cost information if it benefits them. For such settings, we design a novel procurement auction that provides the consumer with the highest possible revenue, while giving sufficient incentives to providers to tell the truth about their costs. This auction creates a contingent plan for gradual service procurement that suggests recruiting a new provider only when the success probability of the already hired providers drops below a time-dependent threshold. To make this auction incentive compatible, we propose a novel weighted threshold payment scheme which pays the minimum among all truthful mechanisms. Using the weighted payment scheme, we also design a low-complexity near-optimal auction that reduces the computational complexity of the optimal mechanism by 99% with only marginal performance loss (less than 1%). We demonstrate the effectiveness and strength of our proposed auctions through both game-theoretical and numerical analysis. The experimental results confirm that the proposed auctions achieve a 59% improvement in performance over the current state of the art, increasing success probability by up to 79% and reducing invocation cost by up to 11%.


On Slowly-varying Non-stationary Bandits

arXiv.org Machine Learning

Reinforcement learning, and specifically bandit optimization, in dynamically changing environments has remained an active topic of study in machine learning. A variety of non-stationary bandit settings have been studied incorporating a range of structural assumptions. At one end are classical stochastic models such as restless bandits [Whittle, 1988], where the state of the arms governs the bandit problem at any instant, but the transitions between these problems (states) follow probabilistic dynamics. At the other extreme are settings with non-stochastic and arbitrarily changing rewards such as prediction with expert advice (and the EXP3 algorithm)[Cesa-Bianchi and Lugosi, 2006; Auer et al., 2002]. In between these extremes lie settings of changing environments where the adversary (environment) is assumed to be limited in its ability to change the rewards, i.e., a structural constraint is put on the amount of change in the rewards across time. These include the abrupt change (or switching experts) model [Garivier and Moulines, 2011], where at most k arbitrary changes to the reward distributions are allowed in the entire time horizon, and the variation-budgeted (drifting) change model [Besbes et al., 2014], in which the total magnitude of changes (of rewards) across successive time steps is constrained to be within an overall budget. In this paper, we focus on slowly-varying bandits - a different and arguably commonly encountered, yet less studied, model of non-stationary bandits. In this setting, the arms are allowed to change arbitrarily over time as long as the amount of change in their mean rewards between two successive time steps is bounded uniformly across the horizon. 
Many real-world settings naturally involve observables whose distributions are 'smooth' over time, in the sense that their instantaneous rate of change is not too large, e.g., slowly drifting distributions in natural language tasks [Lu et al., 2020], data from physical transducers (position, velocity, power, temperature, chemical concentration), and slowly fading wireless


Gaussian Process Bandit Optimization with Few Batches

arXiv.org Machine Learning

In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}\gamma_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Matérn kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds.
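To make the size of the stated improvement concrete, the ratio of the two bounds quoted above can be written out directly. The squared exponential rate used in the second line is the commonly cited bound on the maximum information gain in dimension $d$; it is an assumption brought in for illustration and is not stated in the abstract.

```latex
\[
  \frac{\sqrt{T}\,\gamma_T}{\sqrt{T\gamma_T}} \;=\; \sqrt{\gamma_T}
\]
% Assumed (not from the abstract): for the squared exponential kernel,
% \gamma_T = O\big((\log T)^{d+1}\big), so the saving is a factor of
\[
  \sqrt{\gamma_T} \;=\; O\big((\log T)^{(d+1)/2}\big).
\]
```

Under that assumption the gain is polylogarithmic in $T$ for the squared exponential kernel; for kernels whose $\gamma_T$ grows polynomially in $T$, the same factor $\sqrt{\gamma_T}$ would be polynomial.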


WARPd: A linearly convergent first-order method for inverse problems with approximate sharpness conditions

arXiv.org Machine Learning

Reconstruction of signals from undersampled and noisy measurements is a topic of considerable interest. Sharpness conditions directly control the recovery performance of restart schemes for first-order methods without the need for restrictive assumptions such as strong convexity. However, they are challenging to apply in the presence of noise or approximate model classes (e.g., approximate sparsity). We provide a first-order method: Weighted, Accelerated and Restarted Primal-dual (WARPd), based on primal-dual iterations and a novel restart-reweight scheme. Under a generic approximate sharpness condition, WARPd achieves stable linear convergence to the desired vector. Many problems of interest fit into this framework. For example, we analyze sparse recovery in compressed sensing, low-rank matrix recovery, matrix completion, TV regularization, minimization of $\|Bx\|_{l^1}$ under constraints ($l^1$-analysis problems for general $B$), and mixed regularization problems. We show how several quantities controlling recovery performance also provide explicit approximate sharpness constants. Numerical experiments show that WARPd compares favorably with specialized state-of-the-art methods and is ideally suited for solving large-scale problems. We also present a noise-blind variant based on the Square-Root LASSO decoder. Finally, we show how to unroll WARPd as neural networks. This approximation theory result provides lower bounds for stable and accurate neural networks for inverse problems and sheds light on architecture choices. Code and a gallery of examples are made available online as a MATLAB package.


IQNAS: Interpretable Integer Quadratic Programming Neural Architecture Search

arXiv.org Machine Learning

Realistic use of neural networks often requires adhering to multiple constraints on latency, energy and memory, among others. A popular approach to finding fitting networks is constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters that must be tuned; as a result, the accuracy of the generated models often suffers. In this work we resolve this by introducing Interpretable Integer Quadratic programming Neural Architecture Search (IQNAS), which is based on an accurate and simple quadratic formulation of both the accuracy predictor and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed predictor, together with the intuitive way it is constructed, brings interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that IQNAS generates architectures comparable to or better than those of other state-of-the-art NAS methods, at a reduced search cost for each additional generated network, while strictly satisfying the resource constraints.


Vector Optimization with Stochastic Bandit Feedback

arXiv.org Machine Learning

We introduce vector optimization problems with stochastic bandit feedback, which extend the best arm identification problem to vector-valued rewards. We consider $K$ designs, with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$. This generalizes the concept of Pareto set in multi-objective optimization and allows different sets of preferences of decision-makers to be encoded by $C$. Unlike prior work, we define approximations of the Pareto set based on direction-free covering and gap notions. We study the setting where an evaluation of each design yields a noisy observation of the mean reward vector. Under a subgaussian noise assumption, we investigate the sample complexity of the naïve elimination algorithm in an ($\epsilon,\delta$)-PAC setting, where the goal is to identify an ($\epsilon,\delta$)-PAC Pareto set with the minimum number of design evaluations. In particular, we identify cone-dependent geometric conditions on the deviations of empirical reward vectors from their mean under which the Pareto front can be approximated accurately. We run experiments to verify our theoretical results and illustrate how $C$ and the sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set and the success of identification.
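As a rough illustration of how a polyhedral ordering cone changes the Pareto set, the sketch below computes the exact (noise-free) Pareto set of a few mean vectors. It assumes the cone is given as $C = \{x : Wx \ge 0\}$ and that design $i$ is dominated by design $j$ when $\mu_j - \mu_i \in C \setminus \{0\}$; the names `in_cone` and `pareto_set`, the matrix `W`, and the example vectors are all hypothetical and do not come from the paper. It is not the elimination algorithm analyzed there.

```python
def in_cone(W, x, tol=1e-12):
    """Return True if x lies in the polyhedral cone {x : W x >= 0}."""
    return all(sum(w * v for w, v in zip(row, x)) >= -tol for row in W)


def pareto_set(means, W):
    """Indices of designs not dominated by any other design.

    Assumed convention: design i is dominated by design j when the
    difference means[j] - means[i] is a nonzero element of the cone.
    """
    def dominated(i, j):
        diff = [b - a for a, b in zip(means[i], means[j])]
        nonzero = any(abs(d) > 1e-12 for d in diff)
        return nonzero and in_cone(W, diff)

    n = len(means)
    return [i for i in range(n)
            if not any(dominated(i, j) for j in range(n) if j != i)]


# Hypothetical mean reward vectors for four designs.
means = [(4, 0), (3, 3), (0, 4), (1, 1)]

# Identity matrix: C is the positive orthant (the usual componentwise order).
orthant = [[1, 0], [0, 1]]
# A wider cone that contains the positive orthant.
wide = [[2, 1], [1, 2]]

pareto_orthant = pareto_set(means, orthant)
pareto_wide = pareto_set(means, wide)
```

With the positive orthant, only the last design is dominated; with the wider cone, more pairs become comparable and the Pareto set shrinks to a single design, which is one way the choice of $C$ encodes preferences.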


Variational Wasserstein Barycenters with c-Cyclical Monotonicity

arXiv.org Machine Learning

Summarizing, combining and comparing probability distributions defined on a metric space are fundamental tasks in machine learning, statistics and computer science, with applications including multiple-sensor settings and Bayesian inference, among others. For instance, in Bayesian inference one runs a posterior sampling algorithm in parallel on different machines using small subsets of the massive data, and then aggregates the subset posterior distributions via their barycenter as an approximation to the true posterior for the full data [1, 2]. Besides Bayesian inference, the average or barycenter of a collection of distributions has been successfully applied in various machine learning applications, such as image processing [3] and clustering [4, 5]. The theory of optimal transport (OT) [6-9] provides a powerful framework to carry out such comparisons. OT equips the space of distributions with a distance metric known as the Wasserstein distance, which has gained substantial popularity in different fields, leading in particular to the natural consideration of barycenters. The barycenter of multiple given probability distributions under the Wasserstein distance is defined as a distribution minimizing the sum of Wasserstein distances to all distributions. Due to the geometric properties of the Wasserstein distance, the Wasserstein barycenter can better capture the underlying geometric structure than the barycenter with respect to other popular distances, e.g., Euclidean distance, see Figure 1. As a result, Wasserstein barycenters have a broad range of applications in text mixing [3], imaging [2, 10, 11], and model ensemble [12].
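The barycenter definition above has a simple closed form in one special case, which the following minimal sketch uses. It assumes one-dimensional empirical distributions with the same number of equally weighted samples and the squared 2-Wasserstein distance as the objective; in that setting the barycenter is obtained by averaging the sorted samples (that is, averaging quantiles). The function name `barycenter_1d` is hypothetical, and this is not the variational method of the paper.

```python
def barycenter_1d(samples, weights=None):
    """Barycenter of 1-D empirical distributions by quantile averaging.

    Assumes every distribution has the same number of equally weighted
    samples and that the objective uses squared 2-Wasserstein distances.
    """
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all sample sets must have the same size")
    if weights is None:
        weights = [1.0 / k] * k  # uniform weights over the distributions

    # Sorting pairs up the i-th smallest atoms, which is the optimal
    # coupling between 1-D empirical measures of equal size.
    sorted_samples = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, sorted_samples))
            for i in range(n)]


# Two point clouds with the same shape at different locations.
left = [0, 1, 2]
right = [12, 10, 11]
bary = barycenter_1d([left, right])
```

The result keeps the shape of the inputs and sits halfway between them, whereas pooling the two sample sets (a Euclidean-style average of the densities) would give a two-cluster mixture.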


Differentially Private Coordinate Descent for Composite Empirical Risk Minimization

arXiv.org Machine Learning

Machine learning models can leak information about the data used to train them. Differentially Private (DP) variants of optimization algorithms like Stochastic Gradient Descent (DP-SGD) have been designed to mitigate this, inducing a trade-off between privacy and utility. In this paper, we propose a new method for composite Differentially Private Empirical Risk Minimization (DP-ERM): Differentially Private proximal Coordinate Descent (DP-CD). We analyze its utility through a novel theoretical analysis of inexact coordinate descent, and highlight some regimes where DP-CD outperforms DP-SGD, thanks to the possibility of using larger step sizes. We also prove new lower bounds for composite DP-ERM under coordinate-wise regularity assumptions, that are, in some settings, nearly matched by our algorithm. In practical implementations, the coordinate-wise nature of DP-CD updates demands special care in choosing the clipping thresholds used to bound individual contributions to the gradients. A natural parameterization of these thresholds emerges from our theory, limiting the addition of unnecessarily large noise without requiring coordinate-wise hyperparameter tuning or extra computational cost.
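The general shape of a noisy proximal coordinate update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DP-CD: it assumes a squared loss with an L1 regularizer (so the proximal step is soft-thresholding), a single clipping threshold `clip` shared by all coordinates, and a noise level `noise_std` that is NOT calibrated to any privacy budget. All names and parameters are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def noisy_prox_cd(X, y, lam, step, clip, noise_std, n_iters, seed=0):
    """Sketch of proximal coordinate descent with clipped, noised gradients.

    Objective (assumed): (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + lam * |w|_1.
    noise_std is a free parameter here; a real DP method would derive it
    from the clipping threshold and the privacy budget.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        j = rng.randrange(d)  # pick one coordinate uniformly at random
        g_sum = 0.0
        for xi, yi in zip(X, y):
            resid = sum(a * b for a, b in zip(xi, w)) - yi
            g = resid * xi[j]  # per-example gradient along coordinate j
            g_sum += max(-clip, min(clip, g))  # bound each contribution
        g_noisy = g_sum / n + rng.gauss(0.0, noise_std)
        w[j] = soft_threshold(w[j] - step * g_noisy, step * lam)
    return w


# Tiny hypothetical dataset where each example touches one coordinate.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2, -1, 2, -1]
w_clean = noisy_prox_cd(X, y, lam=0.0, step=1.0, clip=100.0,
                        noise_std=0.0, n_iters=200)
w_noisy = noisy_prox_cd(X, y, lam=0.0, step=1.0, clip=100.0,
                        noise_std=0.1, n_iters=200)
```

With `noise_std=0.0` and a loose clipping threshold the iterates approach the least-squares solution; raising `noise_std` or lowering `clip` perturbs them, which is the privacy and utility trade-off the abstract refers to.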


Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes

arXiv.org Machine Learning

In Bayesian Optimization (BO) we study black-box function optimization with noisy point evaluations and Bayesian priors. Convergence of BO can be greatly sped up by batching, where multiple evaluations of the black-box function are performed in a single round. The main difficulty in this setting is to propose batches of evaluation points that are both diverse and informative. In this work, we introduce DPP-Batch Bayesian Optimization (DPP-BBO), a universal framework for inducing batch diversity in sampling-based BO by leveraging the repulsive properties of Determinantal Point Processes (DPP) to naturally diversify the batch sampling procedure. We illustrate this framework by formulating DPP-Thompson Sampling (DPP-TS) as a variant of the popular Thompson Sampling (TS) algorithm and introducing a Markov Chain Monte Carlo procedure to sample from it. We then prove novel Bayesian simple regret bounds for both classical batched TS and our counterpart DPP-TS, with the latter bound being tighter. Our real-world and synthetic experiments demonstrate improved performance of DPP-BBO over classical batching methods with Gaussian process and Cox process models.


Learning from ants

#artificialintelligence

Learning from ants: Ant colony optimization algorithms are versatile and useful for several real-world applications. These applications usually center on complex optimization problems. Here are three uses for the algorithm. In a logistics example, perhaps the distance between destinations, traffic conditions, types of packages being delivered, and times of day are important constraints to optimize the operations of the business. ACOs can help with that.
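For the logistics case, where the distance between destinations drives the cost, a basic ant colony optimization loop for a small routing problem might look like the sketch below. It is a textbook-style Ant System under assumed settings: only Euclidean distance is modeled (no traffic, package types, or time of day), all points are distinct, and the function name `aco_tsp` and parameters such as `alpha`, `beta` and `rho` are illustrative choices rather than anything specified in the article.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting point."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def aco_tsp(points, n_ants=10, n_iters=30, alpha=1.0, beta=2.0,
            rho=0.5, seed=0):
    """Minimal Ant System sketch for a closed delivery route."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Prefer edges with more pheromone and shorter distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate old pheromone, then deposit along this round's tours.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length

    return best_tour, best_len


# Four hypothetical destinations on the corners of a unit square.
stops = [(0, 0), (1, 0), (1, 1), (0, 1)]
route, route_len = aco_tsp(stops)
```

Shorter tours deposit more pheromone per edge, so later ants are increasingly drawn to edges that appeared in good routes; extra constraints such as traffic could be folded into the `dist` matrix.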