Genre
Algorithm Runtime Prediction: Methods & Evaluation
Hutter, Frank, Xu, Lin, Hoos, Holger H., Leyton-Brown, Kevin
Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problem-specific instance features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and -- perhaps most importantly -- a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.
Scaling SVM and Least Absolute Deviations via Exact Data Reduction
Wang, Jie, Wonka, Peter, Ye, Jieping
The support vector machine (SVM) is a widely used method for classification. Although many efforts have been devoted to develop efficient solvers, it remains challenging to apply SVM to large-scale problems. A nice property of SVM is that the non-support vectors have no effect on the resulting classifier. Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI). As a result, the number of data instances to be entered into the optimization can be substantially reduced. Some appealing features of our screening method are: (1) DVI is safe in the sense that the vectors discarded by DVI are guaranteed to be non-support vectors; (2) the data set needs to be scanned only once to run the screening, whose computational cost is negligible compared to that of solving the SVM problem; (3) DVI is independent of the solvers and can be integrated with any existing efficient solvers. We also show that the DVI technique can be extended to detect non-support vectors in the least absolute deviations regression (LAD). To the best of our knowledge, there are currently no screening methods for LAD. We have evaluated DVI on both synthetic and real data sets. Experiments indicate that DVI significantly outperforms the existing state-of-the-art screening rules for SVM, and is very effective in discarding non-support vectors for LAD. The speedup gained by DVI rules can be up to two orders of magnitude.
Predicting the NFL using Twitter
Sinha, Shiladitya, Dyer, Chris, Gimpel, Kevin, Smith, Noah A.
We study the relationship between social media output and National Football League (NFL) games, using a dataset containing messages from Twitter and NFL game statistics. Specifically, we consider tweets pertaining to specific teams and games in the NFL season and use them alongside statistical game data to build predictive models for future game outcomes (which team will win?) and sports betting outcomes (whichteamwill winwiththepointspread?will thetotalpointsbe over/under the line?). We experiment with several feature sets and find that simple features using large volumes of tweets can match or exceed the performance of more traditional features that use game statistics.
A feasible roadmap for unsupervised deconvolution of two-source mixed gene expressions
Wang, Niya, Hoffman, Eric P., Clarke, Robert, Zhang, Zhen, Herrington, David M., Shih, Ie-Ming, Levine, Douglas A., Yu, Guoqiang, Xuan, Jianhua, Wang, Yue
Tissue heterogeneity is a major confounding factor in studying individual populations that cannot be resolved directly by global profiling (Hoffman, et al., 2004). Experimental solutions to mitigate tissue heterogeneity are expensive, time consuming, inapplicable to existing data, and may alter the original gene expression patterns (Kuhn, et al., 2011; Shen-Orr, et al., 2010). Alternatively, various in silico methods perform basically a supervised deconvolution based on either externally-obtained constituent proportions (Shen-Orr, et al., 2010; Stuart, et al., 2004) or previously-acquired cell-specific signatures (Kuhn, et al., 2011; Lu, et al., 2003). In the earlier issues of this journal, a few articles have reported semi-supervised methods that were specifically focused on dissecting two-source mixed gene expressions. Gosink et al. used (known) expression data from a single cell type to determine the proportion (and subsequently expression profile) of each cell type in a heterogeneous sample (Gosink, et al., 2007).
MLI: An API for Distributed Machine Learning
Sparks, Evan R., Talwalkar, Ameet, Smith, Virginia, Kottalam, Jey, Pan, Xinghao, Gonzalez, Joseph, Franklin, Michael J., Jordan, Michael I., Kraska, Tim
The recent success stories of machine learning (ML) driven applications have created an increasing demand for scalable ML solutions. Nonetheless, ML researchers often prefer to code their solutions in statistical computing languages such as MATLAB or R, as these languages allow them to code in fewer lines using syntax that resembles high-level pseudocode. MATLAB and R allow researchers to avoid low-level implementation details, leading to quickly developed prototypes that are often sufficient for small scale exploration. However, these prototypes are typically ad-hoc, non-robust, and non-scalable implementations. In contrast, industrial implementations of these solutions often require a relatively heavy amount of development effort and are difficult to change once implemented.
Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
Opper, Manfred, Paquet, Ulrich, Winther, Ole
Expectation Propagation (EP) provides a framework for approximate inference. When the model under consideration is over a latent Gaussian field, with the approximation being Gaussian, we show how these approximations can systematically be corrected. A perturbative expansion is made of the exact but intractable correction, and can be applied to the model's partition function and other moments of interest. The correction is expressed over the higher-order cumulants which are neglected by EP's local matching of moments. Through the expansion, we see that EP is correct to first order. By considering higher orders, corrections of increasing polynomial complexity can be applied to the approximation. The second order provides a correction in quadratic time, which we apply to an array of Gaussian process and Ising models. The corrections generalize to arbitrarily complex approximating families, which we illustrate on tree-structured Ising model approximations. Furthermore, they provide a polynomial-time assessment of the approximation error. We also provide both theoretical and practical insights on the exactness of the EP solution.
Durkheim Project Data Analysis Report
This report describes the suicidality prediction models created under the DARPA DCAPS program in association with the Durkheim Project [http://durkheimproject.org/]. The models were built primarily from unstructured text (free-format clinician notes) for several hundred patient records obtained from the Veterans Health Administration (VHA). The models were constructed using a genetic programming algorithm applied to bag-of-words and bag-of-phrases datasets. The influence of additional structured data was explored but was found to be minor. Given the small dataset size, classification between cohorts was high fidelity (98%). Cross-validation suggests these models are reasonably predictive, with an accuracy of 50% to 69% on five rotating folds, with ensemble averages of 58% to 67%. One particularly noteworthy result is that word-pairs can dramatically improve classification accuracy; but this is the case only when one of the words in the pair is already known to have a high predictive value. By contrast, the set of all possible word-pairs does not improve on a simple bag-of-words model.
Randomized co-training: from cortical neurons to machine learning and back again
Despite its size and complexity, the human cortex exhibits striking anatomical regularities, suggesting there may simple meta-algorithms underlying cortical learning and computation. We expect such meta-algorithms to be of interest since they need to operate quickly, scalably and effectively with little-to-no specialized assumptions. This note focuses on a specific question: How can neurons use vast quantities of unlabeled data to speed up learning from the comparatively rare labels provided by reward systems? As a partial answer, we propose randomized co-training as a biologically plausible meta-algorithm satisfying the above requirements. As evidence, we describe a biologically-inspired algorithm, Correlated Nystrom Views (XNV) that achieves state-of-the-art performance in semi-supervised learning, and sketch work in progress on a neuronal implementation.
Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights
Significant success has been reported recently using deep neural networks for classification. Such large networks can be computationally intensive, even after training is over. Implementing these trained networks in hardware chips with a limited precision of synaptic weights may improve their speed and energy efficiency by several orders of magnitude, thus enabling their integration into small and low-power electronic devices. With this motivation, we develop a computationally efficient learning algorithm for multilayer neural networks with binary weights, assuming all the hidden neurons have a fan-out of one. This algorithm, derived within a Bayesian probabilistic online setting, is shown to work well for both synthetic and real-world problems, performing comparably to algorithms with real-valued weights, while retaining computational tractability.
Active Learning of Linear Embeddings for Gaussian Processes
Garnett, Roman, Osborne, Michael A., Hennig, Philipp
We propose an active learning method for discovering low-dimensional structure in high-dimensional Gaussian process (GP) tasks. Such problems are increasingly frequent and important, but have hitherto presented severe practical difficulties. We further introduce a novel technique for approximately marginalizing GP hyperparameters, yielding marginal predictions robust to hyperparameter mis-specification. Our method offers an efficient means of performing GP regression, quadrature, or Bayesian optimization in high-dimensional spaces.