 Directed Networks


Tensor SVD: Statistical and Computational Limits

arXiv.org Machine Learning

In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits for tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). In particular, with strong SNR, we show that the classical higher order orthogonal iteration achieves the minimax optimal rate of convergence in estimation; with weak SNR, the information-theoretical lower bound implies that it is impossible to have consistent estimation in general; with moderate SNR, we show that the non-convex maximum likelihood estimation provides an optimal solution, but with NP-hard computational cost; moreover, under the hardness hypothesis of hypergraphic planted clique detection, there are no polynomial-time algorithms performing consistently in general.


Static & DYNAMICAL Machine Learning – What is the Difference?

@machinelearnbot

In an earlier blog, "Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation", I introduced the need for Dynamical ML as we now enter the "Walk" stage of the "Crawl-Walk-Run" evolution of machine learning. First, I defined Static ML as follows: Given a set of inputs and outputs, find a static map between the two during supervised "Training" and use this static map for business purposes during "Operation". I made the following points using IoT as an example. A Dynamical ML solution involves a State-Space data model (more below). What more does a Dynamical ML solution offer?
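As a rough illustration of the state-space idea in the excerpt above, the sketch below runs a scalar Kalman filter, one common form of exact recursive Bayesian estimation for linear-Gaussian models. The random-walk state model, the noise variances `q` and `r`, and the names `kalman_step` and `run_filter` are assumptions chosen for illustration; none of them come from the blog.

```python
def kalman_step(mean, var, y, q=0.01, r=1.0):
    """One predict/update cycle for an assumed scalar random-walk model:

        x_t = x_{t-1} + w_t,   w_t ~ N(0, q)   (hidden state)
        y_t = x_t + v_t,       v_t ~ N(0, r)   (observation)

    `mean` and `var` describe the Gaussian belief about x_{t-1}.
    """
    # Predict: the state drifts, so uncertainty grows by q.
    pred_mean = mean
    pred_var = var + q
    # Update: blend prediction and observation according to their variances.
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def run_filter(observations, mean0=0.0, var0=1.0, q=0.01, r=1.0):
    """Process observations one at a time, keeping only the current belief."""
    mean, var = mean0, var0
    history = []
    for y in observations:
        mean, var = kalman_step(mean, var, y, q, r)
        history.append((mean, var))
    return history


if __name__ == "__main__":
    for m, v in run_filter([1.2, 0.9, 1.1, 1.0]):
        print(f"mean={m:.3f} var={v:.3f}")
```

Unlike a static map fitted once during training, the estimate here is revised with every new observation and carries its own uncertainty (`var`) along with it.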


Deep Generative Models for Relational Data with Side Information

arXiv.org Machine Learning

We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities for each node, providing superior link prediction performance on more complex networks and better interpretability of the latent features; and (2) a regression model which allows directly conditioning the node latent features on the side information available in the form of node attributes. Our framework handles both (1) and (2) via a clean, unified model, which enjoys full local conjugacy via data augmentation, and facilitates efficient inference via closed form Gibbs sampling. Moreover, inference cost scales with the number of edges, which is attractive for massive but sparse networks. Our framework is also easily extendable to model weighted networks with count-valued edges. We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.


Bayesian Additive Adaptive Basis Tensor Product Models for Modeling High Dimensional Surfaces: An application to high-throughput toxicity testing

arXiv.org Machine Learning

Many modern data sets are sampled with error from complex high-dimensional surfaces. Methods such as tensor product splines or Gaussian processes are well suited for characterizing a surface in two or three dimensions but may suffer from difficulties when representing higher dimensional surfaces. Motivated by high throughput toxicity testing where observed dose-response curves are cross sections of a surface defined by a chemical's structural properties, a model is developed to characterize this surface to predict untested chemicals' dose-responses. This manuscript proposes a novel approach that models the multidimensional surface as a sum of learned basis functions formed as the tensor product of lower dimensional functions, which are themselves representable by a basis expansion learned from the data. The model is described, a Gibbs sampling algorithm is proposed, and the approach is investigated in a simulation study as well as on data taken from the US EPA's ToxCast high throughput toxicity testing platform.


Nonparametric Bayesian label prediction on a graph

arXiv.org Machine Learning

An implementation of a nonparametric Bayesian approach to solving binary classification problems on graphs is described. A hierarchical Bayesian approach with a randomly scaled Gaussian prior is considered. The prior uses the graph Laplacian to take into account the underlying geometry of the graph. A method based on a theoretically optimal prior and a more flexible variant using partial conjugacy are proposed. Two simulated data examples and two examples using real data are used in order to illustrate the proposed methods.
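To make the role of the graph Laplacian in the abstract above concrete, here is a minimal sketch that builds the combinatorial Laplacian L = D - A of an undirected graph and evaluates the quadratic form f^T L f, which equals the sum of squared differences of f across edges. The abstract does not say exactly how the prior is constructed from L, so this shows only the general smoothness idea; the function names and the example graph are assumptions.

```python
def graph_laplacian(n_nodes, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph.

    `edges` is a list of (i, j) pairs with 0-based node indices.
    """
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        lap[i][i] += 1.0  # degree of node i
        lap[j][j] += 1.0  # degree of node j
        lap[i][j] -= 1.0  # adjacency entries, negated
        lap[j][i] -= 1.0
    return lap


def roughness(lap, f):
    """Quadratic form f^T L f = sum over edges of (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical path graph 0 - 1 - 2 - 3.
    lap = graph_laplacian(4, [(0, 1), (1, 2), (2, 3)])
    print(roughness(lap, [1.0, 1.0, 1.0, 1.0]))    # constant labels
    print(roughness(lap, [1.0, -1.0, 1.0, -1.0]))  # labels flip on every edge
```

A function that changes little between neighbouring nodes gets a small value of f^T L f, so a Gaussian prior whose precision is built from L tends to favour label functions that are smooth with respect to the graph.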


Senior Data Scientist – Antwerp, Belgium

#artificialintelligence

You will design and implement state-of-the-art methods for both supervised and unsupervised learning, with a focus on sensor data such as gyroscope and accelerometer streams. Your knowledge of signal processing allows you to apply the necessary pre-processing steps such as band pass filtering, down sampling while anti-aliasing, Fourier or Cepstrum coefficient extraction and spectrogram modeling. You have experience with temporal modeling techniques, both for discrete state spaces (e.g. You have provable experience with generative (e.g. You have a strong theoretical and mathematical background and are able to reason about machine learning peculiarities in order to answer questions such as: Is a Bayesian classifier with Gaussian likelihoods and priors the same as a Euclidean distance classifier if equal and diagonal covariance matrices are used?


Analytic Decision Analysis via Symbolic Dynamic Programming for Parameterized Hybrid MDPs

AAAI Conferences

For example, we may need to (i) perform inverse learning of the cost parameters of a multi-objective reward based on observed agent behavior; (ii) perform sensitivity analyses of policies to various parameter settings; or (iii) analyze and optimize policy performance as a function of policy parameters. When such problems have mixed discrete and continuous state and/or action spaces, this leads to parameterized hybrid MDPs (PHMDPs) that are often approximately solved via discretization, sampling, and/or local gradient methods (when optimization is involved). In this paper we combine two recent advances that allow for the first exact solution and optimization of PHMDPs. We first show how each of the aforementioned use cases can be formalized as PHMDPs, which can then be solved via an extension of symbolic dynamic programming (SDP) even when the solution is piecewise nonlinear. Secondly, we can leverage recent advances in non-convex solvers that require symbolic forms of the objective function for non-convex global optimization in (i), (ii), and (iii) using SDP to derive symbolic solutions for each PHMDP formalization. We demonstrate the efficacy and scalability of our optimal analytical framework on nonlinear examples of each of the aforementioned use cases.


Differentially Private Learning of Undirected Graphical Models using Collective Graphical Models

arXiv.org Machine Learning

We investigate the problem of learning discrete, undirected graphical models in a differentially private way. We show that the approach of releasing noisy sufficient statistics using the Laplace mechanism achieves a good trade-off between privacy, utility, and practicality. A naive learning algorithm that uses the noisy sufficient statistics "as is" outperforms general-purpose differentially private learning algorithms. However, it has three limitations: it ignores knowledge about the data generating process, rests on uncertain theoretical foundations, and exhibits certain pathologies. We develop a more principled approach that applies the formalism of collective graphical models to perform inference over the true sufficient statistics within an expectation-maximization framework. We show that this learns better models than competing approaches on both synthetic data and on real human mobility data used as a case study.
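The naive baseline mentioned in the abstract above, releasing sufficient statistics perturbed by the Laplace mechanism, can be sketched as follows. This is a minimal illustration, not the authors' code: the `sensitivity` value must be supplied by the caller (it depends on how one individual's record can change the statistics, which the abstract does not spell out), and the function names and example counts are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, sensitivity, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to each count.

    `counts` maps a configuration (e.g. a pair of variable values) to its
    frequency in the data. `sensitivity` is an assumed L1 sensitivity of
    the whole vector of counts.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}


if __name__ == "__main__":
    # Hypothetical contingency table for one edge of a binary model.
    true_counts = {(0, 0): 40, (0, 1): 10, (1, 0): 15, (1, 1): 35}
    print(noisy_counts(true_counts, epsilon=0.5, sensitivity=2.0, rng=random.Random(0)))
```

Note that the released values are real-valued and can be negative, so they are no longer guaranteed to be consistent counts; using them "as is" is what makes the baseline naive.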


Provable benefits of representation learning

arXiv.org Machine Learning

There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it's essential to formalize the definition and goals of representation learning, so that they all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and loglinear models, where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning); (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data; and (iii) it is more powerful than manifold learning methods.


On The Projection Operator to A Three-view Cardinality Constrained Set

arXiv.org Machine Learning

The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to compute the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardinality constraints. In this paper, we consider the scenario where the overlapped cardinality constraints satisfy a Three-view Cardinality Structure (TVCS), which reflects the natural restriction in many applications, such as identification of gene regulatory networks and the task-worker assignment problem. We cast the projection as a linear program, and show that for TVCS, the vertex solution of this linear program is the solution for the original projection problem. We further prove that such a solution can be found with complexity proportional to the number of variables and constraints. We finally use synthetic experiments and two interesting applications in bioinformatics and crowdsourcing to validate the proposed TVCS model and method.
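For orientation, the sketch below shows the projection in the simplest setting only: a single cardinality constraint, where the Euclidean projection onto {x : at most k nonzero entries} keeps the k largest-magnitude entries and zeroes the rest. It is not the TVCS linear-programming method described in the abstract, which handles several overlapping constraints; the function name `project_top_k` is an assumption.

```python
def project_top_k(x, k):
    """Euclidean projection of x onto vectors with at most k nonzeros.

    Keeps the k entries of largest absolute value (ties broken by
    position) and sets every other entry to zero.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices sorted by decreasing magnitude; the first k are kept.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


if __name__ == "__main__":
    print(project_top_k([0.5, -3.0, 2.0, 0.1], 2))
```

With one constraint the greedy choice is optimal because each dropped entry adds its own squared magnitude to the distance. Once constraints overlap, keeping an entry affects several budgets at once, and this simple sort no longer suffices in general.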