An estimation of distribution algorithm with adaptive Gibbs sampling for unconstrained global optimization Machine Learning

In this paper is proposed a new heuristic approach belonging to the field of evolutionary Estimation of Distribution Algorithms (EDAs). EDAs builds a probability model and a set of solutions is sampled from the model which characterizes the distribution of such solutions. The main framework of the proposed method is an estimation of distribution algorithm, in which an adaptive Gibbs sampling is used to generate new promising solutions and, in combination with a local search strategy, it improves the individual solutions produced in each iteration. The Estimation of Distribution Algorithm with Adaptive Gibbs Sampling we are proposing in this paper is called AGEDA. We experimentally evaluate and compare this algorithm against two deterministic procedures and several stochastic methods in three well known test problems for unconstrained global optimization. It is empirically shown that our heuristic is robust in problems that involve three central aspects that mainly determine the difficulty of global optimization problems, namely high-dimensionality, multi-modality and non-smoothness.

Adaptive Stochastic Optimization: From Sets to Paths

Neural Information Processing Systems

Adaptive stochastic optimization optimizes an objective function adaptively under uncertainty. Adaptive stochastic optimization plays a crucial role in planning and learning under uncertainty, but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which enable efficient approximate solution of adaptive stochastic optimization. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning. We describe Recursive Adaptive Coverage (RAC), a new adaptive stochastic optimization algorithm that exploits these conditions, and apply it to two planning tasks under uncertainty. In constrast to the earlier submodular optimization approach, our algorithm applies to adaptive stochastic optimization algorithm over both sets and paths.

Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint Machine Learning

We study the worst-case adaptive optimization problem with budget constraint that is useful for modeling various practical applications in artificial intelligence and machine learning. We investigate the near-optimality of greedy algorithms for this problem with both modular and non-modular cost functions. In both cases, we prove that two simple greedy algorithms are not near-optimal but the best between them is near-optimal if the utility function satisfies pointwise submodularity and pointwise cost-sensitive submodularity respectively. This implies a combined algorithm that is near-optimal with respect to the optimal algorithm that uses half of the budget. We discuss applications of our theoretical results and also report experiments comparing the greedy algorithms on the active learning problem.

Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio Machine Learning

We propose a new concept named adaptive submodularity ratio to study the greedy policy for sequential decision making. While the greedy policy is known to perform well for a wide variety of adaptive stochastic optimization problems in practice, its theoretical properties have been analyzed only for a limited class of problems. We narrow the gap between theory and practice by using adaptive submodularity ratio, which enables us to prove approximation guarantees of the greedy policy for a substantially wider class of problems. Examples of newly analyzed problems include important applications such as adaptive influence maximization and adaptive feature selection. Our adaptive submodularity ratio also provides bounds of adaptivity gaps. Experiments confirm that the greedy policy performs well with the applications being considered compared to standard heuristics.

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization Machine Learning

Adaptive gradient-based optimization methods such as \textsc{Adagrad}, \textsc{Rmsprop}, and \textsc{Adam} are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in the literature aiming at parallelizing them, based on communications of peripheral nodes with a central node, but incur high communications cost. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (\textsc{Dadam}) for online optimization over a decentralized network that enables data parallelization, as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms, for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that \textsc{Dadam} works also well in practice and compares favorably to competing online optimization methods.