

Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why

Xie, Pinchen, Qiu, Yunrui, E, Weinan

arXiv.org Artificial Intelligence

A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics. Constrained by the fluctuation-dissipation theorem, the approach can build coarse-grained models in dynamical consistency with all-atom molecular dynamics. We also propose practical criteria for AIGLE to enforce long-term dynamical consistency. Case studies of a toy polymer, with 20 coarse-grained sites, and the alanine dipeptide, with two dihedral angles, elucidate why one should adopt AIGLE or its Markovian limit for modeling coarse-grained conformational dynamics in practice.
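The Markovian limit mentioned above can be illustrated with a minimal sketch (not the authors' AIGLE code; the potential, friction, and temperature values are illustrative assumptions): an overdamped Langevin integrator whose noise amplitude is tied to the friction by the fluctuation-dissipation theorem, the same constraint AIGLE enforces for its generally non-Markovian memory kernel.

```python
import numpy as np

def overdamped_langevin(grad_V, x0, gamma=1.0, kT=1.0, dt=0.01,
                        steps=200_000, seed=0):
    """Euler-Maruyama integration of dx = -(1/gamma) V'(x) dt + sqrt(2 kT / gamma) dW.

    The noise amplitude sqrt(2 kT dt / gamma) is fixed by the
    fluctuation-dissipation theorem: stronger friction implies
    proportionally stronger thermal noise.
    """
    rng = np.random.default_rng(seed)
    x = x0
    traj = np.empty(steps)
    for n in range(steps):
        x = x - dt / gamma * grad_V(x) \
              + np.sqrt(2 * kT * dt / gamma) * rng.standard_normal()
        traj[n] = x
    return traj

# Harmonic potential V(x) = 0.5 * x^2; equilibrium variance should be ~kT.
traj = overdamped_langevin(lambda x: x, x0=0.0)
print(traj[50_000:].var())  # close to kT = 1.0
```

With the correct noise amplitude the sampled variance matches the equilibrium value kT/k of the harmonic well; scaling the noise independently of gamma would break that consistency, which is the failure mode the fluctuation-dissipation constraint rules out.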


A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

Nguyen, Huy, Akbarian, Pedram, Nguyen, TrungTin, Ho, Nhat

arXiv.org Machine Learning

The mixture-of-experts (MoE) model combines the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while previous work has sought to understand the behavior of this model in regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such an analysis in the classification setting has remained missing from the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose a novel class of modified softmax gating functions which transform the input values before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.
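The model class under study can be sketched as follows (hypothetical random parameters, not the paper's estimator): a softmax gate produces input-dependent mixture weights over experts, each of which is itself a multinomial logistic (softmax) classifier.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_class_probs(x, gate_W, gate_b, expert_Ws, expert_bs):
    """Class probabilities of a softmax-gated multinomial logistic MoE.

    x         : (d,) input
    gate_W    : (K, d), gate_b : (K,)      -- softmax gating parameters
    expert_Ws : list of K (C, d) matrices  -- per-expert class weights
    expert_bs : list of K (C,) vectors
    """
    gates = softmax(gate_W @ x + gate_b)              # (K,) mixture weights
    experts = np.stack([softmax(W @ x + b)            # (K, C) per-expert probs
                        for W, b in zip(expert_Ws, expert_bs)])
    return gates @ experts                            # (C,) mixed class probs

rng = np.random.default_rng(0)
d, K, C = 3, 2, 4
p = moe_class_probs(rng.normal(size=d),
                    rng.normal(size=(K, d)), rng.normal(size=K),
                    [rng.normal(size=(C, d)) for _ in range(K)],
                    [rng.normal(size=C) for _ in range(K)])
print(p.sum())  # 1.0: a valid probability vector over the C classes
```

The interaction the paper analyzes arises because the gate and the experts both apply softmax transforms to affine functions of the same input x; the proposed modified gating functions transform x before it enters the gate, decoupling the two.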


Lifted Inference beyond First-Order Logic

Malhotra, Sagar, Bizzaro, Davide, Serafini, Luciano

arXiv.org Artificial Intelligence

Weighted First Order Model Counting (WFOMC) is fundamental to probabilistic inference in statistical relational learning models. As WFOMC is known to be intractable in general ($\#$P-complete), logical fragments that admit polynomial-time WFOMC are of significant interest. Such fragments are called domain-liftable. Recent works have shown that the two-variable fragment of first order logic extended with counting quantifiers ($\mathrm{C^2}$) is domain-liftable. However, many properties of real-world data, like acyclicity in citation networks and connectivity in social networks, cannot be modeled in $\mathrm{C^2}$, or in first order logic in general. In this work, we expand the domain liftability of $\mathrm{C^2}$ with multiple such properties. We show that any $\mathrm{C^2}$ sentence remains domain-liftable when one of its relations is restricted to represent a directed acyclic graph, a connected graph, a tree (resp. a directed tree) or a forest (resp. a directed forest). All our results rely on a novel and general methodology of "counting by splitting". Besides their application to probabilistic inference, our results provide a general framework for counting combinatorial structures, and they expand a vast array of previous results in the discrete mathematics literature on directed acyclic graphs, phylogenetic networks, etc.


Weighted First Order Model Counting with Directed Acyclic Graph Axioms

Malhotra, Sagar, Serafini, Luciano

arXiv.org Artificial Intelligence

Statistical Relational Learning (SRL) integrates First-Order Logic (FOL) and probability theory for learning and inference over relational data. Probabilistic inference and learning in many SRL models can be reduced to Weighted First Order Model Counting (WFOMC). However, WFOMC is known to be intractable ($\mathrm{\#P_1}$-complete). Hence, logical fragments that admit polynomial-time WFOMC are of significant interest. Such fragments are called domain-liftable. A recent line of work has shown the two-variable fragment of FOL, extended with counting quantifiers ($\mathrm{C^2}$), to be domain-liftable. However, many properties of real-world data cannot be modelled in $\mathrm{C^2}$. In fact, many ubiquitous properties of real-world data are inexpressible in FOL. Acyclicity is one such property, found in citation networks, genealogy data, temporal data, etc. In this paper we address this problem by investigating the domain liftability of directed acyclicity constraints. We show that the fragment $\mathrm{C^2}$ with a Directed Acyclic Graph (DAG) axiom, i.e., a predicate in the language axiomatized to represent a DAG, is domain-liftable. We present a method based on the principle of inclusion-exclusion for WFOMC of $\mathrm{C^2}$ formulas extended with DAG axioms.
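The inclusion-exclusion flavor of such methods can be illustrated on the simplest unweighted special case: counting the labeled DAGs on n vertices via Robinson's classical recurrence, which conditions on the set of in-degree-zero vertices. This is standard combinatorics offered as an illustration, not the paper's WFOMC algorithm itself.

```python
from math import comb

def count_labeled_dags(n):
    """Number of labeled DAGs on n vertices, by inclusion-exclusion
    over the (nonempty) set of k vertices with in-degree zero:

        a_m = sum_{k=1}^{m} (-1)^(k+1) * C(m, k) * 2^(k(m-k)) * a_{m-k},  a_0 = 1.

    Each of the k source vertices may independently have an edge to each
    of the m - k remaining vertices, giving the 2^(k(m-k)) factor; the
    alternating signs correct for overcounting sets of sources.
    """
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print([count_labeled_dags(n) for n in range(6)])  # [1, 1, 3, 25, 543, 29281]
```

Lifted WFOMC with a DAG axiom must perform this kind of alternating-sign bookkeeping symbolically, over weighted models rather than plain graphs, while keeping the overall count polynomial in the domain size.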


On confidence intervals for precision matrices and the eigendecomposition of covariance matrices

Popordanoska, Teodora, Tiulpin, Aleksei, Bounliphone, Wacha, Blaschko, Matthew B.

arXiv.org Artificial Intelligence

The eigendecomposition of a matrix is the central procedure in probabilistic models based on matrix factorization, for instance principal component analysis and topic models. Quantifying the uncertainty of such a decomposition based on a finite sample estimate is essential to reasoning under uncertainty when employing such models. This paper tackles the challenge of computing confidence bounds on the individual entries of eigenvectors of a covariance matrix of fixed dimension. Moreover, we derive a method to bound the entries of the inverse covariance matrix, the so-called precision matrix. The assumptions behind our method are minimal, requiring only that the covariance matrix exists and that its empirical estimator converges to the true covariance. We make use of the theory of U-statistics to bound the $L_2$ perturbation of the empirical covariance matrix. From this result, we obtain bounds on the eigenvectors using Weyl's theorem and the eigenvalue-eigenvector identity, and we derive confidence intervals on the entries of the precision matrix using matrix inversion perturbation bounds. As an application of these results, we demonstrate a new statistical test, which allows us to test for non-zero values of the precision matrix. We compare this test to the well-known Fisher-z test for partial correlations, and demonstrate the soundness and scalability of the proposed statistical test, as well as its application to real-world data from medical and physics domains.
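The Weyl step can be checked numerically (illustrative random matrices, not the paper's U-statistic bounds): Weyl's theorem guarantees that each eigenvalue of a symmetrically perturbed matrix moves by at most the spectral norm of the perturbation, which is how a bound on the covariance perturbation propagates to the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d))
A = A @ A.T                                   # "true" covariance (symmetric PSD)
E = 0.05 * rng.normal(size=(d, d))
E = (E + E.T) / 2                             # symmetric perturbation (sampling error)

lam_A = np.linalg.eigvalsh(A)                 # sorted eigenvalues of A
lam_AE = np.linalg.eigvalsh(A + E)            # sorted eigenvalues of A + E

# Weyl's theorem: |lambda_i(A + E) - lambda_i(A)| <= ||E||_2 for every i.
shift = np.abs(lam_AE - lam_A).max()
spectral_norm_E = np.abs(np.linalg.eigvalsh(E)).max()
print(shift <= spectral_norm_E + 1e-12)  # True
```

In the paper's setting, ||E||_2 is not known exactly; it is replaced by a high-probability bound derived from U-statistics, and the same one-line comparison turns that bound into confidence intervals on the eigenvalues.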


Tensorial and bipartite block models for link prediction in layered networks and temporal networks

Tarres-Deulofeu, Marc, Godoy-Lorite, Antonia, Guimera, Roger, Sales-Pardo, Marta

arXiv.org Machine Learning

Imagine a team of researchers looking for promising drug combinations to treat a specific cancer type for which current treatments are ineffective. The team has data on the effect of certain pairs of drugs on other cancer types, but the data are very sparse: only a few drug pairs have been tested on each cancer type, and each drug pair is tested in a few cancer types, at best, or has never been tested at all. The challenge is to select the most promising drug pairs for testing with the target cancer type, so as to minimize the cost associated with unsuccessful tests. We can formalize this challenge as the following inference problem: We have a partial observation of the pairwise interactions between a set of nodes (drugs) in different "network layers" (cancer types), and we need to infer the unobserved interactions within each layer (drug interactions in each cancer type). This challenge is relevant for the many systems that can be represented as multilayer networks [1-4], and is also formally analogous to the challenge of predicting the existence of interactions between nodes in time-resolved networks [5-11]. For instance, we would face the same situation if we had data about the daily email or phone communications between users, and wanted to infer the existence of interactions between pairs of users on a certain unobserved day; in this case each layer would be a different day. Here, we introduce new generative models that are suitable to address this challenge. We model all layers concurrently, so that our approach takes full advantage of the information contained in all layers to make predictions for any one of them.


Safe Triplet Screening for Distance Metric Learning

Yoshida, Tomoki, Takeuchi, Ichiro, Karasuyama, Masayuki

arXiv.org Machine Learning

We study safe screening for metric learning. Distance metric learning can optimize a metric over a set of triplets, each defined by a pair of same-class instances and an instance from a different class. However, the number of possible triplets is enormous even for a small dataset. Our safe triplet screening identifies triplets which can be safely removed from the optimization problem without losing optimality. Compared with existing safe screening studies, triplet screening is particularly significant because of (1) the huge number of possible triplets, and (2) the semi-definite constraint in the optimization. We derive several variants of screening rules and analyze their relationships. Numerical experiments on benchmark datasets demonstrate the effectiveness of safe triplet screening.


Assumed Density Filtering Methods for Learning Bayesian Neural Networks

Ghosh, Soumya (Disney Research) | Fave, Francesco Maria Delle (Disney Research) | Yedidia, Jonathan (Disney Research)

AAAI Conferences

Buoyed by the success of deep multilayer neural networks, there is renewed interest in scalable learning of Bayesian neural networks. Here, we study algorithms that utilize recent advances in Bayesian inference to efficiently learn distributions over network weights. In particular, we focus on recently proposed assumed density filtering based methods for learning Bayesian neural networks: expectation backpropagation (EBP) and probabilistic backpropagation (PBP). Apart from scaling to large datasets, these techniques seamlessly deal with non-differentiable activation functions and provide parameter-free (no learning rate or momentum) learning. In this paper, we first rigorously compare the two algorithms and in the process develop several extensions, including a version of EBP for continuous regression problems and a PBP variant for binary classification. Next, we extend both algorithms to deal with multiclass classification and count regression problems. On a variety of diverse real-world benchmarks, we find our extensions to be effective, achieving results competitive with the state-of-the-art.
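The core operation shared by these methods is a single assumed-density-filtering update, which can be sketched in one dimension (a generic illustration, not EBP or PBP themselves): after observing a binary label through a probit likelihood, the non-Gaussian posterior over a scalar weight is projected back to a Gaussian by moment matching, here computed by brute-force quadrature on a dense grid.

```python
import numpy as np
from math import erf, sqrt

def gauss_pdf(w, m, v):
    return np.exp(-(w - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def gauss_cdf(z):
    return np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in z])

def adf_probit_update(m, v, x, y, grid_pts=20_001):
    """One ADF step for a scalar weight w ~ N(m, v) and a label y in {-1, +1}
    observed through the probit likelihood p(y | w, x) = Phi(y * w * x).
    The exact posterior is non-Gaussian, so we project it back to a Gaussian
    N(m', v') by matching its first two moments (computed on a dense grid).
    """
    s = np.sqrt(v)
    w = np.linspace(m - 8 * s, m + 8 * s, grid_pts)
    dw = w[1] - w[0]
    post = gauss_pdf(w, m, v) * gauss_cdf(y * x * w)  # unnormalized posterior
    post /= post.sum() * dw                           # normalize on the grid
    m_new = (w * post).sum() * dw                     # matched mean
    v_new = ((w - m_new) ** 2 * post).sum() * dw      # matched variance
    return m_new, v_new

m1, v1 = adf_probit_update(m=0.0, v=1.0, x=1.0, y=+1)
print(m1, v1)  # mean shifts toward positive values, variance shrinks below 1
```

EBP and PBP apply this same project-to-Gaussian step layer by layer across a network, replacing the quadrature with closed-form moment computations so the update scales to many weights.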