Optimization
Factored Contextual Policy Search with Bayesian Optimization
Karkus, Peter, Kupcsik, Andras, Hsu, David, Lee, Wee Sun
Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different "contexts". Bayesian optimization approaches to contextual policy search (CPS) offer data-efficient policy learning that generalizes over a context space. We propose to improve data efficiency by factoring typically considered contexts into two components: target-type contexts that correspond to a desired outcome of the learned behavior, e.g. target position for throwing a ball; and environment-type contexts that correspond to some state of the environment, e.g. initial ball position or wind speed. Our key observation is that experience can be directly generalized over target-type contexts. Based on that, we introduce Factored Contextual Policy Search with Bayesian Optimization for both passive and active learning settings. Preliminary results show faster policy generalization on a simulated toy problem.
Supervised topic models for clinical interpretability
Hughes, Michael C., Elibol, Huseyin Melih, McCoy, Thomas, Perlis, Roy, Doshi-Velez, Finale
Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation (sLDA) have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on held-out data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful held-out predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful anti-depressant medication given a patient's diagnostic history.
Statistical Attribution & Optimization in the B2B World.
There has been a lot of activity recently around revenue attribution - marketers want to develop a better understanding of their customer acquisition funnel and be able to measure progress against it. Most of this attention has been focused on the B2C space. However, less work has been done measuring the performance of B2B marketing activities. While Salesforce is an excellent platform for managing leads and campaigns, their business model is founded on developing a sales and marketing ecosystem comprising partnerships with specialist vendors that can provide more focused solutions to specific sales and marketing issues. As a result, companies such as Full Circle Insights, Bright Funnel and Bizable have emerged to fill the void in B2B marketing attribution by leveraging the Salesforce platform.
Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features
Chen, Zihao, Luo, Luo, Zhang, Zhihua
Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.
Hypervolume-based Multi-objective Bayesian Optimization with Student-t Processes
van der Herten, Joachim, Couckuyt, Ivo, Dhaene, Tom
Student-$t$ processes have recently been proposed as an appealing alternative nonparametric function prior. They feature enhanced flexibility and predictive variance. In this work, the use of Student-$t$ processes is explored for multi-objective Bayesian optimization. In particular, an analytical expression for the hypervolume-based probability of improvement is developed for independent Student-$t$ process priors of the objectives. Its effectiveness is shown on a multi-objective optimization problem which is known to be difficult with traditional Gaussian processes.
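To make the "hypervolume-based" criterion concrete, here is a minimal sketch of the two-objective hypervolume indicator and the improvement a single candidate point would contribute. It assumes both objectives are minimized and that a reference point bounds the region of interest; the function names and the example front are hypothetical. It does not reproduce the paper's analytical probability of improvement under Student-$t$ priors, only the deterministic quantity that such a criterion builds on.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Keep only points that are strictly better than the reference in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        # Sorted by f1, so a point adds area only if it improves on the best f2 so far.
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def hypervolume_improvement(front, candidate, ref):
    """Extra dominated area gained by adding `candidate` to `front`."""
    return hypervolume_2d(front + [candidate], ref) - hypervolume_2d(front, ref)


# Hypothetical Pareto front and reference point, for illustration only.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
ref = (4.0, 4.0)
base_hv = hypervolume_2d(front, ref)
gain = hypervolume_improvement(front, (1.5, 1.5), ref)
dominated_gain = hypervolume_improvement(front, (3.5, 3.5), ref)
```

A candidate that is already dominated by the front contributes zero improvement, which is why a probabilistic criterion integrates only over the non-dominated region.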
Influential Node Detection in Implicit Social Networks using Multi-task Gaussian Copula Models
Li, Qunwei, Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Zhang, Zhenliang, Varshney, Pramod K.
Influential node detection is a central research topic in social network analysis. Many existing methods rely on the assumption that the network structure is completely known a priori. However, in many applications, network structure is unavailable to explain the underlying information diffusion phenomenon. To address the challenge of information diffusion analysis with incomplete knowledge of network structure, we develop a multi-task low-rank linear influence model. By exploiting the relationships between contagions, our approach can simultaneously predict the volume (i.e. time series prediction) for each contagion (or topic) and automatically identify the most influential nodes for each contagion. The proposed model is validated using synthetic data and an ISIS Twitter dataset. In addition to improving the volume prediction performance significantly, we show that the proposed approach can reliably infer the most influential users for specific contagions.
Learning to Abstain from Binary Prediction
Consider a general practice physician treating a patient with unusual or ambiguous symptoms. The general practitioner often does not have the capability to confidently diagnose such an ailment. The doctor is faced with a difficult choice: either commit to a potentially erroneous diagnosis and act on it, which can have catastrophic consequences; or abstain from any such diagnosis and refer the patient to a specialist or hospital instead, which is safer but will certainly cost extra time and resources. Such a situation motivates the study of classifiers which are able not only to form a hypothesis about the correct classification, but also to abstain entirely from making a prediction. A sufficiently self-aware abstaining classifier might abstain on examples on which it is most unsure about the label, lowering the average prediction error it suffers when it does commit to a prediction. As with the doctor in the example, however, there is typically no use in abstaining on all data, so the amount of overall abstaining is somehow restricted. The classifier must allocate limited abstentions where they will most reduce error. There has been much historical work in decision theory and machine learning on learning such abstaining classifiers (e.g.
Bethe Projections for Non-Local Inference
Vilnis, Luke, Belanger, David, Sheldon, Daniel, McCallum, Andrew
Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as posterior regularization. We present a method to discriminatively learn broad families of inference objectives, capturing powerful non-local statistics of the latent variables, while maintaining tractable and provably fast inference using non-Euclidean projected gradient descent with a distance-generating function given by the Bethe entropy. We demonstrate the performance and flexibility of our method by (1) extracting structured citations from research papers by learning soft global constraints, (2) achieving state-of-the-art results on a widely used handwriting recognition task using a novel learned non-convex inference procedure, and (3) providing a fast and highly scalable algorithm for the challenging problem of inference in a collective graphical model applied to bird migration.
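As a rough illustration of non-Euclidean projected gradient descent, the sketch below runs entropic mirror descent on a probability simplex. This is a deliberately simplified stand-in: it uses the negative Shannon entropy of a single distribution as the distance-generating function, not the Bethe entropy over the marginals of a structured model that the abstract describes. The objective, the `target` vector, and the step size are all hypothetical.

```python
import math


def entropic_mirror_descent(grad, x0, step=0.5, iters=500):
    """Minimize a function over the probability simplex with multiplicative updates."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Gradient step taken in the dual (log) space of the entropy mirror map.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        # The KL (Bregman) projection back onto the simplex is a renormalization.
        z = sum(w)
        x = [wi / z for wi in w]
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - target||^2, with target on the simplex.
target = [0.7, 0.2, 0.1]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_star = entropic_mirror_descent(grad, [1.0 / 3, 1.0 / 3, 1.0 / 3])
```

With an entropy-based distance-generating function, the projection step has a closed form and the iterates stay strictly inside the feasible set, which is the property that makes this family of methods attractive for inference over marginals.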
Learning in Quantum Control: High-Dimensional Global Optimization for Noisy Quantum Dynamics
Palittapongarnpim, Pantita, Wittek, Peter, Zahedinejad, Ehsan, Vedaie, Shakib, Sanders, Barry C.
Quantum control is valuable for various quantum technologies such as high-fidelity gates for universal quantum computing, adaptive quantum-enhanced metrology, and ultra-cold atom manipulation. Although supervised machine learning and reinforcement learning are widely used for optimizing control parameters in classical systems, quantum control for parameter optimization is mainly pursued via gradient-based greedy algorithms. Although the quantum fitness landscape is often compatible with greedy algorithms, sometimes greedy algorithms yield poor results, especially for large-dimensional quantum systems. We employ differential evolution algorithms to circumvent the stagnation problem of non-convex optimization. We improve quantum control fidelity for noisy systems by averaging over the objective function. To reduce computational cost, we introduce heuristics for early termination of runs and for adaptive selection of search subspaces. Our implementation is massively parallel and vectorized to reduce run time even further. We demonstrate our methods with two examples, namely quantum phase estimation and quantum gate design, for which we achieve higher fidelity and better scalability than greedy algorithms do.
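The combination of differential evolution with an averaged noisy objective can be sketched as follows. This is a generic DE/rand/1/bin loop in plain Python, not the authors' parallel implementation; the noisy sphere function stands in for a quantum control cost, and every name and parameter value (`F`, `CR`, `repeats`, population size) is an assumption chosen for illustration. The early-termination and subspace-selection heuristics from the abstract are not included.

```python
import random


def noisy_objective(x, rng, noise=0.1):
    """Hypothetical noisy cost: a sphere function plus Gaussian noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, noise)


def averaged(f, x, rng, repeats=5):
    """Average several noisy evaluations to smooth the objective."""
    return sum(f(x, rng) for _ in range(repeats)) / repeats


def differential_evolution(f, dim, bounds, rng, pop_size=20, F=0.7, CR=0.9,
                           generations=200):
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    cost = [averaged(f, p, rng) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = averaged(f, trial, rng)
            # Greedy one-to-one selection: keep the trial only if it is no worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]


rng = random.Random(0)
best_x, best_cost = differential_evolution(noisy_objective, dim=4,
                                           bounds=(-5.0, 5.0), rng=rng)
```

Note that the stored cost of each survivor is itself a noisy estimate, so increasing `repeats` trades extra evaluations for a more reliable selection step.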
The Machine Learning Workflow (IT Best Kept Secret Is Optimization)
I have recently given two talks on the machine learning workflow, discussing pain points within it and how we might address them. The first was at Spark Summit Europe in Brussels, the other at MLConf in San Francisco. You can find videos and slides for each below. The main message is that the machine learning workflow is not that simple. That was a great event.