Optimization
Developmental Bayesian Optimization of Black-Box with Visual Similarity-Based Transfer Learning
Depierre, Amaury, Petit, Maxime, Wang, Xiaofang, Dellandréa, Emmanuel, Chen, Liming
We present a developmental framework based on a long-term memory and reasoning mechanisms (Vision Similarity and Bayesian Optimisation). This architecture allows a robot to optimize autonomously hyper-parameters that need to be tuned from any action and/or vision module, treated as a black-box. The learning can take advantage of past experiences (stored in the episodic and procedural memories) in order to warm-start the exploration using a set of hyper-parameters previously optimized from objects similar to the new unknown one (stored in a semantic memory). As example, the system has been used to optimized 9 continuous hyper-parameters of a professional software (Kamido) both in simulation and with a real robot (industrial robotic arm Fanuc) with a total of 13 different objects. The robot is able to find a good object-specific optimization in 68 (simulation) or 40 (real) trials. In simulation, we demonstrate the benefit of the transfer learning based on visual similarity, as opposed to an amnesic learning (i.e. learning from scratch all the time). Moreover, with the real robot, we show that the method consistently outperforms the manual optimization from an expert with less than 2 hours of training time to achieve more than 88% of success.
Detecting egregious responses in neural sequence-to-sequence models
In this work, we attempt to answer a critical question: whether there exists some input sequence that will cause a well-trained discrete-space neural network sequence-to-sequence (seq2seq) model to generate egregious outputs (aggressive, malicious, attacking, etc.). And if such inputs exist, how to find them efficiently. We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them. Moreover, the optimization algorithm is enhanced for large vocabulary search and constrained to search for input sequences that are likely to be input by real-world users. In our experiments, we apply this approach to dialogue response generation models trained on three real-world dialogue data-sets: Ubuntu, Switchboard and OpenSubtitles, testing whether the model can generate malicious responses. We demonstrate that given the trigger inputs our algorithm finds, a significant number of malicious sentences are assigned large probability by the model, which reveals an undesirable consequence of standard seq2seq training. Recently, research on adversarial attacks (Goodfellow et al., 2014; Szegedy et al., 2013) has been gaining increasing attention: it has been found that for trained deep neural networks (DNNs), when an imperceptible perturbation is applied to the input, the output of the model can change significantly (from correct to incorrect). This line of research has serious implications for our understanding of deep learning models and how we can apply them securely in real-world applications. It has also motivated researchers to design new models or training procedures (Madry et al., 2017), to make the model more robust to those attacks.
Heuristic Optimization of Electrical Energy Systems: A Perpetual Motion Scheme and Refined Metrics to Compare the Solutions
Chicco, Gianfranco, Mazza, Andrea
Many optimization problems admit a number of local optima, among which there is the global optimum. For these problems, various heuristic optimization methods have been proposed. Comparing the results of these solvers requires the definition of suitable metrics. In the electrical energy systems literature, simple metrics such as best value obtained, the mean value, the median or the standard deviation of the solutions are still used. However, the comparisons carried out with these metrics are rather weak, and on these bases a somehow uncontrolled proliferation of heuristic solvers is taking place. This paper addresses the overall issue of understanding the reasons of this proliferation, showing that the assessment of the best solver can be cast into a perpetual motion scheme. Moreover, this paper shows how the use of more refined metrics defined to compare the optimization result, associated with the definition of appropriate benchmarks, may make the comparisons among the solvers more robust. The proposed metrics are based on the concept of first-order stochastic dominance and are defined for the cases in which: : (i) the globally optimal solution can be found (for testing purposes); and (ii) the number of possible solutions is so large that practically it cannot be guaranteed that the global optimum has been found. Illustrative examples are provided for a typical problem in the electrical energy systems area - distribution network reconfiguration. The conceptual results obtained are generally valid to compare the results of other optimization problems.
Learning to Progressively Plan
For problem solving, making reactive decisions based on problem description is fast but inaccurate, while search-based planning using heuristics gives better solutions but could be exponentially slow. In this paper, we propose a new approach that improves an existing solution by iteratively picking and rewriting its local components until convergence. The rewriting policy employs a neural network trained with reinforcement learning. We evaluate our approach in two domains: job scheduling and expression simplification. Compared to common effective heuristics, baseline deep models and search algorithms, our approach efficiently gives solutions with higher quality.
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
Bartlett, Peter L., Gabillon, Victor, Valko, Michal
We study the problem of optimizing a function under a \emph{budgeted number of evaluations}. We only assume that the function is \emph{locally} smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function. A smaller $d$ results in smaller optimization error. We come with a new, simple, and parameter-free approach. First, for all values of $b$ and $d$, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach additionally obtains these results while being \textit{agnostic} to the values of both $b$ and $d$. This leads to the first algorithm that naturally adapts to an \textit{unknown} range of noise $b$ and leads to significant improvements in a moderate and low-noise regime. Third, our approach also obtains a remarkable improvement over the state-of-the-art \SOO algorithm when the noise is very low which includes the case of optimization under deterministic feedback ($b=0$). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize ($d=0$). We show that our algorithmic improvement is also borne out in the numerical experiments, where we empirically show faster convergence on common benchmark functions.
Riemannian Adaptive Optimization Methods
Bécigneul, Gary, Ganea, Octavian-Eugen
Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. However, some of the most popular of these optimization tools - namely Adam , Adagrad and the more recent Amsgrad - remain to be generalized to Riemannian manifolds. We discuss the difficulty of generalizing such adaptive schemes to the most agnostic Riemannian setting, and then provide algorithms and convergence proofs for geodesically convex objectives in the particular case of a product of Riemannian manifolds, in which adaptivity is implemented across manifolds in the cartesian product. Our generalization is tight in the sense that choosing the Euclidean space as Riemannian manifold yields the same algorithms and regret bounds as those that were already known for the standard algorithms. Experimentally, we show faster convergence and to a lower train loss value for Riemannian adaptive methods over their corresponding baselines on the realistic task of embedding the WordNet taxonomy in the Poincare ball.
Taming VAEs
Rezende, Danilo Jimenez, Viola, Fabio
In spite of remarkable progress in deep latent variable generative modeling, training still remains a challenge due to a combination of optimization and generalization issues. In practice, a combination of heuristic algorithms (such as hand-crafted annealing of KL-terms) is often used in order to achieve the desired results, but such solutions are not robust to changes in model architecture or dataset. The best settings can often vary dramatically from one problem to another, which requires doing expensive parameter sweeps for each new case. Here we develop on the idea of training VAEs with additional constraints as a way to control their behaviour. We first present a detailed theoretical analysis of constrained VAEs, expanding our understanding of how these models work. We then introduce and analyze a practical algorithm termed Generalized ELBO with Constrained Optimization, GECO. The main advantage of GECO for the machine learning practitioner is a more intuitive, yet principled, process of tuning the loss. This involves defining of a set of constraints, which typically have an explicit relation to the desired model performance, in contrast to tweaking abstract hyper-parameters which implicitly affect the model behavior. Encouraging experimental results in several standard datasets indicate that GECO is a very robust and effective tool to balance reconstruction and compression constraints.
How to Use the Facebook Budget Optimization Tool for Improved Results : Social Media Examiner
Wondering how to allocate your budget to reach the most effective Facebook audiences? Facebook's Budget Optimization tool uses an algorithm to automatically optimize your budget distribution across ad sets. In this article, you'll discover what the Facebook Budget Optimization tool offers, how it works, and when you should use it. How to Use the Facebook Budget Optimization Tool for Improved Results by Meg Brunson on Social Media Examiner. As with most aspects of Facebook advertising, the most effective way to identify what's working best for your business is through testing.
Artificial Intelligence Enabled Software Defined Networking: A Comprehensive Overview
Software defined networking (SDN) represents a promising networking architecture that combines central management and network programmability. SDN separates the control plane from the data plane and moves the network management to a central point, called the controller, that can be programmed and used as the brain of the network. Recently, the research community has showed an increased tendency to benefit from the recent advancements in the artificial intelligence (AI) field to provide learning abilities and better decision making in SDN. In this study, we provide a detailed overview of the recent efforts to include AI in SDN. Our study showed that the research efforts focused on three main sub-fields of AI namely: machine learning, meta-heuristics and fuzzy inference systems. Accordingly, in this work we investigate their different application areas and potential use, as well as the improvements achieved by including AI-based techniques in the SDN paradigm.
Convex Relaxation Methods for Community Detection
Li, Xiaodong, Chen, Yudong, Xu, Jiaming
This paper surveys recent theoretical advances in convex optimization approaches for community detection. We introduce some important theoretical techniques and results for establishing the consistency of convex community detection under various statistical models. In particular, we discuss the basic techniques based on the primal and dual analysis. We also present results that demonstrate several distinctive advantages of convex community detection, including robustness against outlier nodes, consistency under weak assortativity, and adaptivity to heterogeneous degrees. This survey is not intended to be a complete overview of the vast literature on this fast-growing topic. Instead, we aim to provide a big picture of the remarkable recent development in this area and to make the survey accessible to a broad audience. We hope that this expository article can serve as an introductory guide for readers who are interested in using, designing, and analyzing convex relaxation methods in network analysis.