Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

arXiv.org Artificial Intelligence

We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives. In this task, the goal is to establish universe details, and to collaborate on an interesting story in that universe, through a series of natural dialogue exchanges. Our model can augment any probabilistic conversational agent by allowing it to reason about universe information established and what potential next utterances might reveal. Ideally, with each utterance, agents would reveal just enough information to add specificity and reduce ambiguity without limiting the conversation. We empirically show that our model allows control over the rate at which the agent reveals information and that doing so significantly improves accuracy in predicting the next line of dialogues from movies. We close with a case-study with four professional theatre performers, who preferred interactions with our model-augmented agent over an unaugmented agent.
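One way to picture "what potential next utterances might reveal" is to score a candidate utterance by how much it reduces the entropy of a belief over possible story universes. The sketch below is only an illustration under that assumption, not the paper's model; the universe labels, probabilities, and function names are all hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood):
    """Bayes update of a belief over universes given one utterance.

    likelihood[u] is the (assumed) probability of the utterance in universe u.
    """
    unnorm = {u: prior[u] * likelihood.get(u, 0.0) for u in prior}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("utterance has zero probability under every universe")
    return {u: v / total for u, v in unnorm.items()}

def information_revealed(prior, likelihood):
    """Entropy reduction (in bits) caused by the utterance."""
    return entropy(prior) - entropy(posterior(prior, likelihood))

# Hypothetical uniform belief over four candidate universes (2 bits of entropy).
prior = {"space": 0.25, "western": 0.25, "noir": 0.25, "fantasy": 0.25}

# Hypothetical utterances: one fits every universe equally, one narrows the
# belief to two universes, one pins down a single universe.
vague = {"space": 0.5, "western": 0.5, "noir": 0.5, "fantasy": 0.5}
moderate = {"space": 1.0, "western": 1.0, "noir": 0.0, "fantasy": 0.0}
specific = {"space": 1.0, "western": 0.0, "noir": 0.0, "fantasy": 0.0}

revealed = {
    "vague": information_revealed(prior, vague),
    "moderate": information_revealed(prior, moderate),
    "specific": information_revealed(prior, specific),
}
```

Under these toy numbers, an agent aiming to reveal "just enough" would prefer the moderate utterance, which narrows the belief without collapsing it to a single universe.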


A large-scale crowdsourced analysis of abuse against women journalists and politicians on Twitter

arXiv.org Machine Learning

We report the first, to the best of our knowledge, hand-in-hand collaboration between human rights activists and machine learners, leveraging crowd-sourcing to study online abuse against women on Twitter. On a technical front, we carefully curate an unbiased yet low-variance dataset of labeled tweets, analyze it to account for the variability of abuse perception, and establish baselines, preparing it for release to community research efforts. On a social impact front, this study provides the technical backbone for a media campaign aimed at raising awareness among the public and decision makers and elevating the standards expected from social media companies.


Bayesian active learning for optimization and uncertainty quantification in protein docking

arXiv.org Machine Learning

Motivation: Ab initio protein docking represents a major challenge for optimizing a noisy and costly "black box"-like function in a high-dimensional space. Despite progress in this field, there is no docking method available for rigorous uncertainty quantification (UQ) of its solution quality (e.g. interface RMSD or iRMSD). Results: We introduce a novel algorithm, Bayesian Active Learning (BAL), for optimization and UQ of such black-box functions and flexible protein docking. BAL directly models the posterior distribution of the global optimum (or native structures for protein docking) with active sampling and posterior estimation iteratively feeding each other. Furthermore, we use complex normal modes to represent a homogeneous Euclidean conformation space suitable for high-dimension optimization and construct funnel-like energy models for encounter complexes. Over a protein docking benchmark set and a CAPRI set including homology docking, we establish that BAL significantly improves upon both starting points by rigid docking and refinements by particle swarm optimization, providing a top-3 near-native prediction for one third of the targets. BAL also generates tight confidence intervals with half range around 25% of iRMSD and confidence level at 85%. Its estimated probability of a prediction being native or not achieves binary classification AUROC at 0.93 and AUPRC over 0.60 (compared to 0.14 by chance), and is also found to help rank predictions. To the best of our knowledge, this study represents the first uncertainty quantification solution for protein docking, with theoretical rigor and comprehensive assessment. Source code is available at https://github.com/Shen-Lab/BAL.


ProBO: a Framework for Using Probabilistic Programming in Bayesian Optimization

arXiv.org Machine Learning

Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that we may want to use to capture complex systems and reduce the number of queries. Probabilistic programs (PPs) are modern tools that allow for flexible model composition, incorporation of prior information, and automatic inference. In this paper, we develop ProBO, a framework for BO using only standard operations common to most PPs. This allows a user to drop in an arbitrary PP implementation and use it directly in BO. To do this, we describe black box versions of popular acquisition functions that can be used in our framework automatically, without model-specific derivation, and show how to optimize these functions. We also introduce a model, which we term the Bayesian Product of Experts, that integrates into ProBO and can be used to combine information from multiple models implemented with different PPs. We show empirical results using multiple PP implementations, and compare against standard BO methods.
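The idea of an acquisition function that needs no model-specific derivation can be illustrated with a Monte Carlo estimate of expected improvement built only from posterior predictive samples. This is a minimal sketch under stated assumptions, not ProBO's actual interface: `toy_predictive` is a hypothetical stand-in for whatever sampling operation a probabilistic program exposes, and the candidate grid and `y_best` value are made up.

```python
import random
import statistics

def mc_expected_improvement(sample_predictive, x, y_best, n_samples=2000, rng=None):
    """Monte Carlo expected improvement at x, for minimization.

    sample_predictive(x, rng) must return one draw of y from the model's
    posterior predictive distribution at x; nothing else about the model
    is assumed.
    """
    rng = rng or random.Random(0)
    draws = [sample_predictive(x, rng) for _ in range(n_samples)]
    return statistics.fmean(max(0.0, y_best - y) for y in draws)

def toy_predictive(x, rng):
    # Hypothetical posterior predictive: mean (x - 2)^2 with Gaussian noise.
    # A real setup would draw from a fitted probabilistic program instead.
    return rng.gauss((x - 2.0) ** 2, 0.5)

candidates = [0.0, 1.0, 2.0, 3.0]
y_best = 1.0  # best (lowest) value observed so far, assumed for the example

scores = {x: mc_expected_improvement(toy_predictive, x, y_best) for x in candidates}
next_x = max(scores, key=scores.get)  # candidate to query next
```

Because the estimator only calls the sampler, swapping in a different model changes `toy_predictive` and nothing else.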


Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach

arXiv.org Machine Learning

Recently there has been increasing interest in the multivariate Gaussian process (MGP), which extends the Gaussian process (GP) to deal with multiple outputs. One approach to constructing the MGP and accounting for non-trivial commonalities amongst outputs employs a convolution process (CP). The CP is based on the idea of sharing latent functions across several convolutions. Despite the elegance of the CP construction, it poses new challenges that have yet to be tackled. First, even with a moderate number of outputs, model building is prohibitive due to the huge increase in computational demands and number of parameters to be estimated. Second, the negative transfer of knowledge may occur when some outputs do not share commonalities. In this paper we address these issues. We propose a regularized pairwise modeling approach for the MGP established using CP. The key feature of our approach is to distribute the estimation of the full multivariate model into a group of bivariate GPs which are individually built. Interestingly, pairwise modeling turns out to possess unique characteristics, which allow us to tackle the challenge of negative transfer by penalizing the latent function that facilitates information sharing in each bivariate model. Predictions are then made by combining predictions from the bivariate models within a Bayesian framework. The proposed method has excellent scalability when the number of outputs is large and minimizes the negative transfer of knowledge between uncorrelated outputs. Statistical guarantees for the proposed method are studied and its advantageous features are demonstrated through numerical studies.


Toward Sensor-based Sleep Monitoring with Electrodermal Activity Measures

arXiv.org Machine Learning

We use self-report and electrodermal activity (EDA) wearable sensor data from 77 nights of sleep from six participants to test the efficacy of EDA data for sleep monitoring. We used factor analysis to find latent factors in the EDA data, and causal model search to find the most probable graphical model accounting for self-reported sleep efficiency (SE), sleep quality (SQ), and the latent EDA factors. Structural equation modeling was used to confirm fit of the extracted graph. Based on the generated graph, logistic regression and naive Bayes models were used to test the efficacy of the EDA data in predicting SE and SQ. Six EDA features extracted from the total signal over a night's sleep could be explained by two latent factors, EDA Magnitude and EDA Storms. EDA Magnitude performed as a strong predictor for SE to aid detection of substantial changes in time asleep. The performance of EDA Magnitude and SE in classifying SQ showed promise for wearable sleep monitoring applications. However, our data suggest that obtaining a more accurate sensor-based measure of SE will be necessary before smaller changes in SQ can be detected from EDA sensor data alone.


Sequential Bayesian Detection of Spike Activities from Fluorescence Observations

arXiv.org Machine Learning

Extracting and detecting spike activities from fluorescence observations is an important step in understanding how neuron systems work. The main challenge is that ambient noise combined with dynamic baseline fluctuation often contaminates the observations, thereby reducing the reliability of spike detection. This may be even worse in the face of the nonlinear biological process, the coupling interactions between spikes and baseline, and the unknown critical parameters of an underlying physiological model, in which erroneous parameter estimates affect the detection of spikes and cause further error propagation. In this paper, we propose a random finite set (RFS) based Bayesian approach. The dynamic behaviors of the spike sequence, fluctuating baseline, and unknown parameters are formulated as one RFS. This RFS state is capable of distinguishing the hidden active/silent states induced by spike and non-spike activities respectively, thereby negating the interaction role played by spikes and other factors. Then, premised on the RFS states, a Bayesian inference scheme is designed to simultaneously estimate the model parameters, baseline, and crucial spike activities. Our results demonstrate that the proposed scheme can gain an extra 12% detection accuracy in comparison with the state-of-the-art MLSpike method.


Divergence Triangle for Joint Training of Generator Model, Energy-based Model, and Inference Model

arXiv.org Machine Learning

This paper proposes the divergence triangle as a framework for joint training of a generator model, an energy-based model, and an inference model. The divergence triangle is a compact and symmetric (anti-symmetric) objective function that seamlessly integrates variational learning, adversarial learning, the wake-sleep algorithm, and contrastive divergence in a unified probabilistic formulation. This unification makes the processes of sampling, inference, and energy evaluation readily available without the need for costly Markov chain Monte Carlo methods. Our experiments demonstrate that the divergence triangle is capable of learning (1) an energy-based model with a well-formed energy landscape, (2) direct sampling in the form of a generator network, and (3) feed-forward inference that faithfully reconstructs observed as well as synthesized data. The divergence triangle is a robust training method that can learn from incomplete data.


A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies

arXiv.org Machine Learning

The symptoms include proximal and axial muscle weakness, distal hyperlaxity, joint contractures, and critical respiratory insufficiency, which requires assisted ventilation and results in a reduced life expectancy. Moreover, the skin and other connective tissues where collagen VI is abundant are also affected [1, 2]. The collagen VI structural defects are related to mutations of three main genes (COL6A1, COL6A2, and COL6A3, OMIM 254090 and 158810). Thus, the new advances in genome editing tools open the possibility to successfully treat these neuromuscular diseases for the first time. This opportunity, though, comes with important challenges. Beyond the challenges of gene editing, this paper focuses on the challenges arising when trying to formally evaluate the efficiency of the therapeutic approaches in the recovery of the collagen VI microfibrillar network.


Bayesian nonparametric multiway regression for clustered binomial data

arXiv.org Machine Learning

We introduce a Bayesian nonparametric regression model for data with multiway (tensor) structure, motivated by an application to periodontal disease (PD) data. Our outcome is the number of diseased sites measured over four different tooth types for each subject, with subject-specific covariates available as predictors. The outcomes are not well-characterized by simple parametric models, so we use a nonparametric approach with a binomial likelihood wherein the latent probabilities are drawn from a mixture with an arbitrary number of components, analogous to a Dirichlet Process (DP). We use a flexible probit stick-breaking formulation for the component weights that allows for covariate dependence and clustering structure in the outcomes. The parameter space for this model is large and multiway: patients × tooth types × covariates × components. We reduce its effective dimensionality, and account for the multiway structure, via low-rank assumptions. We illustrate how this can improve performance, and simplify interpretation, while still providing sufficient flexibility. We describe a general and efficient Gibbs sampling algorithm for posterior computation. The resulting fit to the PD data outperforms competitors, and is interpretable and well-calibrated. An interactive visual of the predictive model is available at http://ericfrazerlock.com/toothdata/ToothDisplay.html, and the code is available at https://github.com/lockEF/NonparametricMultiway.
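For readers unfamiliar with probit stick-breaking, the generic construction turns real-valued scores into mixture weights by passing each score through the standard normal CDF and breaking off that fraction of the remaining "stick". The sketch below shows only this generic construction with a finite truncation; it is not the paper's full model, and the scores passed in are hypothetical (in a covariate-dependent version they would be functions of the predictors).

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_stick_breaking(alphas):
    """Mixture weights from probit stick-breaking, truncated.

    Each score a breaks off a fraction Phi(a) of the stick that is left.
    The final component receives whatever remains, so len(alphas) scores
    give len(alphas) + 1 weights that sum to one.
    """
    weights = []
    remaining = 1.0
    for a in alphas:
        v = std_normal_cdf(a)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

# Hypothetical scores: Phi(0) = 0.5, so each step takes half of what is left.
weights = probit_stick_breaking([0.0, 0.0, 0.0])
```

With all scores at zero the weights halve at each step (0.5, 0.25, 0.125) and the last component takes the remaining 0.125.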