Online Optimization with Predictions and Non-convex Losses
Lin, Yiheng, Goel, Gautam, Wierman, Adam
We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs? Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), provides a $1+O(1/w)$ competitive ratio, where $w$ is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. Further, we give an example of a natural instance, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and we can prove that no online algorithm can have a competitive ratio that converges to 1.
Optimal Experimental Design for Staggered Rollouts
Xiong, Ruoxuan, Athey, Susan, Bayati, Mohsen, Imbens, Guido
Experimentation has become an increasingly prevalent tool for guiding policy choices, firm decisions, and product innovation. A common hurdle in designing experiments is the lack of statistical power. In this paper, we study optimal multi-period experimental design under the constraint that the treatment cannot be easily removed once implemented; for example, a government or firm might implement treatment in different geographies at different times, where the treatment cannot be easily removed due to practical constraints. The design problem is to select which units to treat at which time, intending to test hypotheses about the effect of the treatment. When the potential outcome is a linear function of a unit effect, a time effect, and observed discrete covariates, we provide an analytically feasible solution to the design problem where the variance of the estimator for the treatment effect is at most 1+O(1/N^2) times the variance of the optimal design, where N is the number of units. This solution assigns units in a staggered treatment adoption pattern, where the proportion treated is a linear function of time. In the general setting where outcomes depend on latent covariates, we show that historical data can be utilized in the optimal design. We propose a data-driven local search algorithm with the minimax decision criterion to assign units to treatment times. We demonstrate that our approach improves upon benchmark experimental designs through synthetic experiments on real-world data sets from several domains, including healthcare, finance, and retail. Finally, we consider the case where the treatment effect changes with the time of treatment, showing that the optimal design treats a smaller fraction of units at the beginning and a greater share at the end.
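As a rough illustration of a staggered adoption pattern in which the treated share grows linearly with time, the sketch below assigns each unit a single treatment start period, so treatment is never removed once begun. The function name `staggered_start_times` and the specific schedule `(2t - 1) / (2T)` are assumptions chosen for illustration; the abstract states only that the proportion treated is a linear function of time. A real design would also randomize which units start when, whereas this sketch assigns units in index order.

```python
def staggered_start_times(n_units, n_periods):
    """Return a treatment start period (1-based) per unit, or None if never treated.

    Assumption: the target cumulative share treated by period t is
    (2t - 1) / (2T), one linear schedule picked for illustration only.
    """
    targets = [(2 * t - 1) / (2 * n_periods) for t in range(1, n_periods + 1)]
    starts = [None] * n_units
    treated = 0  # number of units already assigned a start period
    for t, share in enumerate(targets, start=1):
        goal = round(share * n_units)  # cumulative number treated by period t
        for i in range(treated, goal):
            starts[i] = t
        treated = max(treated, goal)
    return starts


def treated_share(starts, period):
    """Fraction of units under treatment in the given period."""
    return sum(1 for s in starts if s is not None and s <= period) / len(starts)


if __name__ == "__main__":
    plan = staggered_start_times(n_units=8, n_periods=4)
    print(plan)
    print([treated_share(plan, t) for t in range(1, 5)])
```

Because each unit has exactly one start period, the treated share can only grow over time, which mirrors the constraint that treatment cannot be withdrawn.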
Bayesian Active Learning for Structured Output Design
Matsui, Kota, Kusakawa, Shunya, Ando, Keisuke, Kutsukake, Kentaro, Ujihara, Toru, Takeuchi, Ichiro
In this paper, we propose an active learning method for an inverse problem that aims to find an input that achieves a desired structured output. The proposed method provides new acquisition functions for minimizing the error between the desired structured output and the prediction of a Gaussian process model, by effectively incorporating the correlation between multiple outputs of the underlying multi-valued black-box output functions. The effectiveness of the proposed method is verified by applying it to two synthetic shape search problems and to real data. In the real data experiment, we tackle the search for input parameters that achieve the desired crystal growth rate in silicon carbide (SiC) crystal growth modeling, which is a problem in materials informatics.
The Bias-Expressivity Trade-off
Lauw, Julius, Macias, Dominique, Trikha, Akshay, Vendemiatti, Julia, Montanez, George D.
Learning algorithms need bias to generalize and perform better than random guessing. We examine the flexibility (expressivity) of biased algorithms. An expressive algorithm can adapt to changing training data, altering its outcome based on changes in its input. We measure expressivity by using an information-theoretic notion of entropy on algorithm outcome distributions, demonstrating a trade-off between bias and expressivity. The degree to which an algorithm is biased is the degree to which it can outperform uniform random sampling, but it is also the degree to which it becomes inflexible. We derive bounds relating bias to expressivity, proving the necessary trade-offs inherent in trying to create strongly performing yet flexible algorithms.
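The abstract does not spell out its exact expressivity measure, so the following is only a minimal sketch of the entropy ingredient: the Shannon entropy of a discrete outcome distribution. The helper name `entropy_bits` and the example distributions are hypothetical. The intuition it illustrates is that a distribution concentrated on few outcomes (strongly biased) has low entropy, while a uniform one has the maximum.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing (the limit of p*log p is 0).
    return -sum(p * log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]  # no preference among four outcomes
    skewed = [0.85, 0.05, 0.05, 0.05]   # strongly favors one outcome
    print(entropy_bits(uniform))
    print(entropy_bits(skewed))
```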
A different take on the best-first game tree pruning algorithms
The alpha-beta pruning algorithms have been popular in game tree search ever since they were discovered. Numerous enhancements have been proposed in the literature, and it is often overwhelming to decide which would be best to implement. A given enhancement can take far too long to fine-tune its hyperparameters, or to determine whether it will make much of a difference given memory limitations. On the other hand are the best-first pruning techniques, mostly counterparts of the infamous SSS* algorithm, which proved disruptive at the time of its discovery but was gradually cast aside as being too memory intensive and having a higher time complexity. Later research does not see the best-first approaches as completely different from the depth-first enhancements. Instead, the two appear to be closely related, in the sense that a best-first approach can be viewed as a depth-first approach with a certain set of enhancements, and with the growing power of computers SSS* no longer seems as taxing on memory either. Even so, the nature of the SSS* algorithm remains quite difficult to understand: it is not obvious why it does what it does, and it has been described as too complex to fathom, visualize, and understand on an intellectual level. This article tries to bridge this gap and provides some experimental results comparing the two approaches with the most promising advances.
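For readers unfamiliar with the depth-first baseline mentioned above, here is a minimal sketch of textbook alpha-beta pruning. It is not the article's implementation and does not cover SSS* or any of the enhancements discussed; the nested-list tree encoding (a leaf is a number, an internal node is a list of children) is an assumption made for brevity.

```python
from math import inf


def alphabeta(node, maximizing=True, alpha=-inf, beta=inf):
    """Return the minimax value of `node` using depth-first alpha-beta pruning.

    Assumed encoding: a leaf is a number, an internal node is a list of children.
    """
    if not isinstance(node, list):
        return node  # leaf: its value is the static evaluation
    if maximizing:
        value = -inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never allow this line
        return value
    value = inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return value


if __name__ == "__main__":
    tree = [[3, 5], [6, 9], [1, 2]]  # max root over three min nodes
    print(alphabeta(tree))
```

In this small tree the third subtree is cut off after its first leaf, since the root already has a guaranteed value of 6 and the minimizer there can force a value of at most 1.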
Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization
Nguyen, Viet Anh, Shafieezadeh-Abadeh, Soroosh, Kuhn, Daniel, Esfahani, Peyman Mohajerin
We introduce a distributionally robust minimum mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator (that is, a measurable function of the observation) and a fictitious adversary choosing a prior (that is, a pair of signal and noise distributions ranging over independent Wasserstein balls), with the goals of minimizing and maximizing the expected squared estimation error, respectively. We show that if the Wasserstein balls are centered at normal distributions, then the zero-sum game admits a Nash equilibrium, where the players' optimal strategies are given by an affine estimator and a normal prior, respectively. We further prove that this Nash equilibrium can be computed by solving a tractable convex program. Finally, we develop a Frank-Wolfe algorithm that can solve this convex program orders of magnitude faster than state-of-the-art general purpose solvers. We show that this algorithm enjoys a linear convergence rate and that its direction-finding subproblems can be solved in quasi-closed form.
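To make the phrase "affine estimator" concrete, the sketch below shows the standard minimum mean square error estimator for one fixed normal prior in a scalar model. It is the nominal, non-robust building block only, not the paper's robust estimator or its Frank-Wolfe algorithm. The scalar model y = x + w with zero-mean noise, and the name `affine_mmse`, are assumptions made for illustration.

```python
def affine_mmse(mu_x, var_x, var_w):
    """MMSE estimator of x from y = x + w for a fixed normal prior.

    Assumed scalar model: x ~ N(mu_x, var_x) and w ~ N(0, var_w), independent.
    Returns the estimator (an affine function of y) and its mean square error.
    """
    gain = var_x / (var_x + var_w)  # weight placed on the observation

    def estimate(y):
        # Shrink the observation toward the prior mean.
        return mu_x + gain * (y - mu_x)

    mse = var_x * var_w / (var_x + var_w)
    return estimate, mse


if __name__ == "__main__":
    estimate, mse = affine_mmse(mu_x=0.0, var_x=4.0, var_w=1.0)
    print(estimate(5.0), mse)
```

The noisier the observation relative to the signal, the smaller the gain and the more the estimate falls back on the prior mean.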
Minimax Nonparametric Two-sample Test
Xing, Xin, Shang, Zuofeng, Du, Pang, Ma, Ping, Zhong, Wenxuan, Liu, Jun S.
We consider the problem of comparing probability densities between two groups. To model the complex pattern of the underlying densities, we formulate the problem as a nonparametric density hypothesis testing problem. The major difficulty is that conventional tests may fail to distinguish the alternative from the null hypothesis under the controlled type I error. In this paper, we model log-transformed densities in a tensor product reproducing kernel Hilbert space (RKHS) and propose a probabilistic decomposition of this space. Under such a decomposition, we quantify the difference of the densities between two groups by the component norm in the probabilistic decomposition. Based on the Bernstein width, a sharp minimax lower bound of the distinguishable rate is established for the nonparametric two-sample test. We then propose a penalized likelihood ratio (PLR) test possessing the Wilks' phenomenon with an asymptotically Chi-square distributed test statistic and achieving the established minimax testing rate. Simulations and real applications demonstrate that the proposed test outperforms the conventional approaches under various scenarios.
Searching to Exploit Memorization Effect in Learning from Corrupted Labels
Yang, Hansi, Yao, Quanming, Han, Bo, Niu, Gang
Sample-selection approaches, which attempt to pick out clean instances from the noisy training data set, have become a promising direction for robust learning from corrupted labels. These methods all build on the memorization effect, which means that deep networks learn easy patterns first and then gradually over-fit the training data set. In this paper, we show that properly selecting instances so that the training process benefits the most from the memorization effect is a hard problem. Specifically, memorization can depend heavily on many factors, e.g., the data set and the network architecture. Nonetheless, there still exist general patterns in how memorization occurs. These facts motivate us to exploit memorization with automated machine learning (AutoML) techniques. First, we design an expressive but compact search space based on the observed general patterns. Then, we propose to use a natural gradient-based search algorithm to search through this space efficiently. Finally, extensive experiments on both synthetic data sets and benchmark data sets demonstrate that the proposed method is not only much more efficient than existing AutoML algorithms but also achieves much better performance than the state-of-the-art approaches for learning from corrupted labels.
Designing unmanned aerial vehicle trajectories for energy minimization
A team of researchers at the University of Luxembourg and the University of Ontario Institute of Technology have recently proposed a new approach to designing trajectories for energy-efficient unmanned aerial vehicle (UAV)-enabled wireless communications. Their paper, prepublished on arXiv, specifically focuses on cases in which a UAV acts as a flying base station (BS) to serve ground users (GSs) within some predetermined latency constraints. "Our goal is to design the UAV trajectory to minimize the total energy consumption while satisfying the RT requirement and energy budget, which is accomplished via jointly optimizing the trajectory and UAV's velocities along subsequent hops," the researchers wrote in their paper. Optimizing a UAV's trajectory and its velocities together can be somewhat difficult to achieve. To do so, the researchers developed an approach that carries out two consecutive steps. Their approach entails the use of two distinct algorithms, a heuristic search and a dynamic programming (DP) algorithm.
REMI: Mining Intuitive Referring Expressions on Knowledge Bases
Galárraga, Luis, Delaunay, Julien, Dessalles, Jean-Louis
A referring expression (RE) is a description that identifies a set of instances unambiguously. Mining REs from data finds applications in natural language generation, algorithmic journalism, and data maintenance. Since there may exist multiple REs for a given set of entities, it is common to focus on the most intuitive ones, i.e., the most concise and informative. In this paper, we present REMI, a system that can mine intuitive REs on large RDF knowledge bases. Our experimental evaluation shows that REMI finds REs deemed intuitive by users. Moreover, we show that REMI is several orders of magnitude faster than an approach based on inductive logic programming.