Uncertainty
Molecular De Novo Design through Deep Reinforcement Learning
Olivecrona, Marcus, Blaschke, Thomas, Engkvist, Ola, Chen, Hongming
This work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model nor the activity prediction model.
A Bayesian algorithm for distributed network localization using distance and direction data
Naseri, Hassan, Koivunen, Visa
A reliable, accurate, and affordable positioning service is highly required in wireless networks. In this paper, the novel Message Passing Hybrid Localization (MPHL) algorithm is proposed to solve the problem of cooperative distributed localization using distance and direction estimates. This hybrid approach combines two sensing modalities to reduce the uncertainty in localizing the network nodes. A statistical model is formulated for the problem, and approximate minimum mean square error (MMSE) estimates of the node locations are computed. The proposed MPHL is a distributed algorithm based on belief propagation (BP) and Markov chain Monte Carlo (MCMC) sampling. It improves the identifiability of the localization problem and reduces its sensitivity to the anchor node geometry, compared to distance-only or direction-only localization techniques. For example, the unknown location of a node can be found if it has only a single neighbor; and a whole network can be localized using only a single anchor node. Numerical results are presented showing that the average localization error is significantly reduced in almost every simulation scenario, about 50% in most cases, compared to the competing algorithms.
Plausible Deniability for Privacy-Preserving Data Synthesis
Bindschaedler, Vincent, Shokri, Reza, Gunter, Carl A.
Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many of the popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical to use for releasing high dimensional data or cannot preserve high utility of original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain amount of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Also, it can efficiently be checked by privacy tests. We present mechanisms to generate synthetic datasets with similar statistical properties to the input data and the same format. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of original data with respect to various statistical analysis and machine learning measures.
Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models
Yang, Linxiao, Fang, Jun, Duan, Huiping, Li, Hongbin, Zeng, Bing
The problem of low rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, a variational Bayesian method is developed for matrix completion, where the generalized approximate massage passing (GAMP) technique is embedded into the variational Bayesian inference in order to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over existing state-of-the-art matrix completion methods.
Logical Formalizations of Commonsense Reasoning: A Survey
Commonsense reasoning is in principle a central problem in artificial intelligence, but it is a very difficult one. One approach that has been pursued since the earliest days of the field has been to encode commonsense knowledge as statements in a logic-based representation language and to implement commonsense reasoning as some form of logical inference. This paper surveys the use of logic-based representations of commonsense knowledge in artificial intelligence research.
Bayesian Compressive Sensing Using Normal Product Priors
Zhou, Zhou, Liu, Kaihui, Fang, Jun
In this paper, we introduce a new sparsity-promoting prior, namely, the "normal product" prior, and develop an efficient algorithm for sparse signal recovery under the Bayesian framework. The normal product distribution is the distribution of a product of two normally distributed variables with zero means and possibly different variances. Like other sparsity-encouraging distributions such as the Student's $t$-distribution, the normal product distribution has a sharp peak at origin, which makes it a suitable prior to encourage sparse solutions. A two-stage normal product-based hierarchical model is proposed. We resort to the variational Bayesian (VB) method to perform the inference. Simulations are conducted to illustrate the effectiveness of our proposed algorithm as compared with other state-of-the-art compressed sensing algorithms.
Models of retrieval in sentence comprehension: A computational evaluation using Bayesian hierarchical modeling
Nicenboim, Bruno, Vasishth, Shravan
Research on interference has provided evidence that the formation of dependencies between non-adjacent words relies on a cue-based retrieval mechanism. Two different models can account for one of the main predictions of interference, i.e., a slowdown at a retrieval site, when several items share a feature associated with a retrieval cue: Lewis and Vasishth's (2005) activation-based model and McElree's (2000) direct access model. Even though these two models have been used almost interchangeably, they are based on different assumptions and predict differences in the relationship between reading times and response accuracy. The activation-based model follows the assumptions of ACT-R, and its retrieval process behaves as a lognormal race between accumulators of evidence with a single variance. Under this model, accuracy of the retrieval is determined by the winner of the race and retrieval time by its rate of accumulation. In contrast, the direct access model assumes a model of memory where only the probability of retrieval varies between items; in this model, differences in latencies are a by-product of the possibility and repairing incorrect retrievals. We implemented both models in a Bayesian hierarchical framework in order to evaluate them and compare them. We show that some aspects of the data are better fit under the direct access model than under the activation-based model. We suggest that this finding does not rule out the possibility that retrieval may be behaving as a race model with assumptions that follow less closely the ones from the ACT-R framework. We show that by introducing a modification of the activation model, i.e, by assuming that the accumulation of evidence for retrieval of incorrect items is not only slower but noisier (i.e., different variances for the correct and incorrect items), the model can provide a fit as good as the one of the direct access model.
Massively-Parallel Feature Selection for Big Data
Tsamardinos, Ioannis, Borboudakis, Giorgos, Katsogridakis, Pavlos, Pratikakis, Polyvios, Christophides, Vassilis
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of $p$-values of conditional independence tests and meta-analysis techniques PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
Bayesian Learning of Clique Tree Structure
Savkli, Cetin, Carr, J. Ryan, Graff, Philip, Kennell, Lauren
The problem of categorical data analysis in high dimensions is considered. A discussion of the fundamental difficulties of probability modeling is provided, and a solution to the derivation of high dimensional probability distributions based on Bayesian learning of clique tree decomposition is presented. The main contributions of this paper are an automated determination of the optimal clique tree structure for probability modeling, the resulting derived probability distribution, and a corresponding unified approach to clustering and anomaly detection based on the probability distribution.