Search
Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits
Perrault, Pierre, Perchet, Vianney, Valko, Michal
We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as \emph{independence}. However, while being minimax optimal in terms of regret, these algorithms are intractable. In our paper, we first reduce their implementation to a specific \emph{submodular maximization}. Then, in case of \emph{matroid} constraints, we design adapted approximation routines, thereby providing the first efficient algorithms that exploit the reward structure. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor $\sqrt{k}$, where $k$ is the maximum action size. Finally, we show how our improvement translates to more general \emph{budgeted combinatorial semi-bandits}.
Progressive Focus Search for the Static and Stochastic VRPTW with both Random Customers and Reveal Times
Saint-Guillain, Michael, Solnon, Christine, Deville, Yves
Static stochastic VRPs aim at modeling real-life VRPs by considering uncertainty on data. In particular, the SS-VRPTW-CR considers stochastic customers with time windows and does not make any assumption on their reveal times, which are stochastic as well. Based on customer request probabilities, we look for an a priori solution composed preventive vehicle routes, minimizing the expected number of unsatisfied customer requests at the end of the day. A route describes a sequence of strategic vehicle relocations, from which nearby requests can be rapidly reached. Instead of reoptimizing online, a so-called recourse strategy defines the way the requests are handled, whenever they appear. In this paper, we describe a new recourse strategy for the SS-VRPTW-CR, improving vehicle routes by skipping useless parts. We show how to compute the expected cost of a priori solutions, in pseudo-polynomial time, for this recourse strategy. We introduce a new meta-heuristic, called Progressive Focus Search (PFS), which may be combined with any local-search based algorithm for solving static stochastic optimization problems. PFS accelerates the search by using approximation factors: from an initial rough simplified problem, the search progressively focuses to the actual problem description. We evaluate our contributions on a new, real-world based, public benchmark.
The Long and the Short of It: Summarising Event Sequences with Serial Episodes
Tatti, Nikolaj, Vreeken, Jilles
An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal, as due to the pattern explosion typically very large numbers of highly redundant patterns are returned. We pursue the ideal for sequential data, by employing a pattern set mining approach-an approach where, instead of ranking patterns individually, we consider results as a whole. Pattern set mining has been successfully applied to transactional data, but has been surprisingly under studied for sequential data. In this paper, we employ the MDL principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategy, we propose two approaches: the first algorithm selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experimentation on synthetic and real data demonstrates we efficiently discover small sets of informative patterns.
Neural-Network Guided Expression Transformation
Edelmann, Romain, Kunฤak, Viktor
Optimizing compilers, as well as other translator systems, often work by rewriting expressions according to equivalence preserving rules. Given an input expression and its optimized form, finding the sequence of rules that were applied is a non-trivial task. Most of the time, the tools provide no proof, of any kind, of the equivalence between the original expression and its optimized form. In this work, we propose to reconstruct proofs of equivalence of simple mathematical expressions, after the fact, by finding paths of equivalence preserving transformations between expressions. We propose to find those sequences of transformations using a search algorithm, guided by a neural network heuristic. Using a Tree-LSTM recursive neural network, we learn a distributed representation of expressions where the Manhattan distance between vectors approximately corresponds to the rewrite distance between expressions. We then show how the neural network can be efficiently used to search for transformation paths, leading to substantial gain in speed compared to an uninformed exhaustive search. In one of our experiments, our neural-network guided search algorithm is able to solve more instances with a 2 seconds timeout per instance than breadth-first search does with a 5 minutes timeout per instance.
Bootstrapped Coordinate Search for Multidimensional Scaling
In this work, a unified framework for gradient-free Multidimensional Scaling (MDS) based on Coordinate Search (CS) is proposed. This family of algorithms is an instance of General Pattern Search (GPS) methods which avoid the explicit computation of derivatives but instead evaluate the objective function while searching on coordinate steps of the embedding space. The backbone element of CSMDS framework is the corresponding probability matrix that correspond to how likely is each corresponding coordinate to be evaluated. We propose a Bootstrapped instance of CSMDS (BS CSMDS) which enhances the probability of the direction that decreases the most the objective function while also reducing the corresponding probability of all the other coordinates. BS CSMDS manages to avoid unnecessary function evaluations and result to significant speedup over other CSMDS alternatives while also obtaining the same error rate. Experiments on both synthetic and real data reveal that BS CSMDS performs consistently better than other CSMDS alternatives under various experimental setups.
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach
Tikka, Santtu, Hyttinen, Antti, Karvanen, Juha
A causal effect is defined as the distribution P (Y do(X), Z) where variables Y are observed, variables X are intervened upon (forced to values irrespective of their natural causes) and variables Z are conditioned on. Instead of placing various parametric restrictions based on background knowledge, we are interested in this paper in the question of identifiability: can the causal effect be uniquely determined from the distributions (data) we have and a graph representing our structural knowledge on the generating causal system. 1 In the most basic setting we are identifying causal effects from a single observational input distribution, corresponding to passively observed data. To solve such problems more generally than what is possible with the backdoor adjustment (Spirtes et al., 1993; Pearl, 2009; Greenland et al., 1999), Pearl (1995) introduced do-calculus, a set of three rules that together with probability theory enable the manipulation of interventional distributions. Shpitser and Pearl (2006a) and Huang and Valtorta (2006) showed that do-calculus is complete by presenting polynomial-time algorithms whose each step can be seen as a rule of do-calculus or as an operation based on basic probability theory. The algorithms have a high practical value because the rules of do-calculus do not by themselves provide an indication on the order in which they should be applied.
Dynamic Planning Networks
Tasfi, Norman, Capretz, Miriam
We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state-transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. DPN shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
Dereziลski, Michaล, Clarkson, Kenneth L., Mahoney, Michael W., Warmuth, Manfred K.
In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gaussian noise, in which case the goal is to provide an unbiased estimator with smallest mean squared error (A-optimal design). A related approach, more common in computer science, is to assume the responses are arbitrary but fixed, in which case the goal is to estimate the least squares solution using few responses, as quickly as possible, for worst-case inputs. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a framework for experimental design where the responses are produced by an arbitrary unknown distribution. We show that there is an efficient randomized experimental design procedure that achieves strong variance bounds for an unbiased estimator using few responses in this general model. Nearly tight bounds for the classical A-optimality criterion, as well as improved bounds for worst-case responses, emerge as special cases of this result. In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d. importance sampling method: inverse score sampling. A key novelty of our analysis is in developing new expected error bounds for worst-case regression by controlling the tail behavior of i.i.d. sampling via the jointness of volume sampling. Our result motivates a new minimax-optimality criterion for experimental design which can be viewed as an extension of both A-optimal design and sampling for worst-case regression.
Supervised classification via minimax probabilistic transformations
Mazuelas, Santiago, Zanoni, Andrea, Perez, Aritz
One of the most common and studied problem in machine learning is classification. While conventional algorithms for supervised classification rely on the determination of a function from features to labels, we propose a different approach based on the estimation of a probabilistic transformation from features to labels. Indeed, we determine a conditional probability distribution of the labels given the features and then features are classified as labels following such distribution. In order to compute the conditional distribution, we follow a robust minimax approach, minimizing the worst-case expectation of the 0-1 loss. By doing so, we find the probabilistic transformation which achieves the minimum risk against an uncertainty set consistent with the training data. We show numerical results obtained by an implementation in python of this method and we compare its performance with state of the art techniques.
Local minimax rates for closeness testing of discrete distributions
Lam-Weil, Joseph, Carpentier, Alexandra, Sriperumbudur, Bharath K.
We consider the closeness testing (or two-sample testing) problem in the Poisson vector model - which is known to be asymptotically equivalent to the model of multinomial distributions. The goal is to distinguish whether two data samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in $L_1$-norm. In this paper, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider a local minimax setting. We provide, to the best of our knowledge, the first local minimax rate for the separation distance up to logarithmic factors, together with a test that achieves it. In view of the rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.