Goto

Collaborating Authors

 Directed Networks


Weighted Community Detection and Data Clustering Using Message Passing

arXiv.org Machine Learning

Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to Potts model at the critical temperature of spin glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate the advantage of our method over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem when the data was generated by mixture models in the sparse regime we show that our method works to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which reduce heavily the computation complexity in dense-networks but gives almost the same performance as belief propagation.


Contextual Explanation Networks

arXiv.org Artificial Intelligence

We introduce contextual explanation networks (CENs)---a class of models that learn to predict by generating and leveraging intermediate explanations. CENs are deep networks that generate parameters for context-specific probabilistic graphical models which are further used for prediction and play the role of explanations. Contrary to the existing post-hoc model-explanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings. We prove that local approximations to the decision boundary of our networks are consistent with the generated explanations. Our results on image and text classification and survival analysis tasks demonstrate that CENs are competitive with the state-of-the-art while offering additional insights behind each prediction, valuable for decision support.


Marketing Analytics: Methods, Practice, Implementation, and Links to Other Fields

arXiv.org Machine Learning

Marketing analytics is a diverse field, with both academic researchers and practitioners coming from a range of backgrounds including marketing, operations research, statistics, and computer science. This paper provides an integrative review at the boundary of these three areas. The topics of visualization, segmentation, and class prediction are featured. Links between the disciplines are emphasized. For each of these topics, a historical overview is given, starting with initial work in the 1960s and carrying through to the present day. Recent innovations for modern large and complex "big data" sets are described. Practical implementation advice is given, along with a directory of open source R routines for implementing marketing analytics techniques.


Cloud 66 Blog

#artificialintelligence

Complex equations can be calculated faster as ever and everybody can start experimenting with machine learning. Do want to give it a try too? This blog is a very simple primer into the exciting world of machine learning and comes with a working demo written in Node. Machine learning is the art of using computer algorithms to learn from experiences and use those experiences for future predictions. Tom Mitchell gave a really simple definition of machine learning.


A Review of Multiple Try MCMC algorithms for Signal Processing

arXiv.org Machine Learning

Many applications in signal processing require the estimation of some parameters of interest given a set of observed data. More specifically, Bayesian inference needs the computation of {\it a-posteriori} estimators which are often expressed as complicated multi-dimensional integrals. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and Monte Carlo methods are the only feasible approach. A very powerful class of Monte Carlo techniques is formed by the Markov Chain Monte Carlo (MCMC) algorithms. They generate a Markov chain such that its stationary distribution coincides with the target posterior density. In this work, we perform a thorough review of MCMC methods using multiple candidates in order to select the next state of the chain, at each iteration. With respect to the classical Metropolis-Hastings method, the use of multiple try techniques foster the exploration of the sample space. We present different Multiple Try Metropolis schemes, Ensemble MCMC methods, Particle Metropolis-Hastings algorithms and the Delayed Rejection Metropolis technique. We highlight limitations, benefits, connections and differences among the different methods, and compare them by numerical simulations.


Loan Prediction โ€“ Using PCA and Naive Bayes Classification with R

@machinelearnbot

Nowadays, there are numerous risks related to bank loans both for the banks and the borrowers getting the loans. The risk analysis about bank loans needs understanding about the risk and the risk level. Banks need to analyze their customers for loan eligibility so that they can specifically target those customers. Banks wanted to automate the loan eligibility process (real time) based on customer details such as Gender, Marital Status, Age, Occupation, Income, debts, and others provided in their online application form. As the number of transactions in banking sector is rapidly growing and huge data volumes are available, the customers' behavior can be easily analyzed and the risks around loan can be reduced.


Machine Learning: Classification Models โ€“ Fuzz โ€“ Medium

#artificialintelligence

These days the terms "AI", "Machine Learning", "Deep Learning" are thrown around by companies in every industry, they're the type of words that make any forward-looking executive salivate. You might think these are new concepts that seemed to have appeared overnight, but the reality is they've been around for a while and it's the hard work of many within the field that has really moved it into the spotlight as the latest tech trend. While these terms are sometimes used interchangeably by the media they certainly are not the same, but I'll leave that discussion for another time. It's surely an exciting time for the industry, from a slew of open source libraries (TenserFlow, PredictionIO, DeepLearning4J, or see github) coming into popularity and every cloud provider from Amazon, IBM, Microsoft (the list goes on) all offering their own tools to help get started in the AI/ML/DL field. If you've stumbled on this article, you're probably well aware of everything I've mentioned above, so now that we've gotten past the obligatory intro, let's get to what the title actually claims this article is about.


Development of ICA and IVA Algorithms with Application to Medical Image Analysis

arXiv.org Machine Learning

Independent component analysis (ICA) is a widely used BSS method that can uniquely achieve source recovery, subject to only scaling and permutation ambiguities, through the assumption of statistical independence on the part of the latent sources. Independent vector analysis (IVA) extends the applicability of ICA by jointly decomposing multiple datasets through the exploitation of the dependencies across datasets. Though both ICA and IVA algorithms cast in the maximum likelihood (ML) framework enable the use of all available statistical information in reality, they often deviate from their theoretical optimality properties due to improper estimation of the probability density function (PDF). This motivates the development of flexible ICA and IVA algorithms that closely adhere to the underlying statistical description of the data. Although it is attractive minimize the assumptions, important prior information about the data, such as sparsity, is usually available. If incorporated into the ICA model, use of this additional information can relax the independence assumption, resulting in an improvement in the overall separation performance. Therefore, the development of a unified mathematical framework that can take into account both statistical independence and sparsity is of great interest. In this work, we first introduce a flexible ICA algorithm that uses an effective PDF estimator to accurately capture the underlying statistical properties of the data. We then discuss several techniques to accurately estimate the parameters of the multivariate generalized Gaussian distribution, and how to integrate them into the IVA model. Finally, we provide a mathematical framework that enables direct control over the influence of statistical independence and sparsity, and use this framework to develop an effective ICA algorithm that can jointly exploit these two forms of diversity.


A Distributed Framework for the Construction of Transport Maps

arXiv.org Machine Learning

The need to reason about uncertainty in large, complex, and multi-modal datasets has become increasingly common across modern scientific environments. The ability to transform samples from one distribution $P$ to another distribution $Q$ enables the solution to many problems in machine learning (e.g. Bayesian inference, generative modeling) and has been actively pursued from theoretical, computational, and application perspectives across the fields of information theory, computer science, and biology. Performing such transformations , in general, still comprises computational difficulties, especially in high dimensions. Here, we consider the problem of computing such "measure transport maps" with efficient and parallelizable methods. Under the mild assumptions that $P$ need not be known but can be sampled from, that the density of $Q$ is known up to a proportionality constant, and that $Q$ is log-concave, we provide a convex optimization problem pertaining to relative entropy minimization. We show how an empirical minimization formulation and polynomial chaos map parameterization can allow for learning a transport map between $P$ and $Q$ with distributed and scalable methods. We also leverage findings from nonequilibrium thermodynamics to represent the transport map as a composition of simpler maps, each of which is learned sequentially with a transport cost regularized version of the aforementioned problem formulation. We provide examples of our framework within the context of Bayesian inference for the Boston housing dataset, active learning for optimizing human computer interfaces, density estimation for probabilistic sleep staging with EEG, and generative modeling for handwritten digit images from the MNIST dataset.


The landscape of the spiked tensor model

arXiv.org Machine Learning

We consider the problem of estimating a large rank-one tensor ${\boldsymbol u}^{\otimes k}\in({\mathbb R}^{n})^{\otimes k}$, $k\ge 3$ in Gaussian noise. Earlier work characterized a critical signal-to-noise ratio $\lambda_{Bayes}= O(1)$ above which an ideal estimator achieves strictly positive correlation with the unknown vector of interest. Remarkably no polynomial-time algorithm is known that achieved this goal unless $\lambda\ge C n^{(k-2)/4}$ and even powerful semidefinite programming relaxations appear to fail for $1\ll \lambda\ll n^{(k-2)/4}$. In order to elucidate this behavior, we consider the maximum likelihood estimator, which requires maximizing a degree-$k$ homogeneous polynomial over the unit sphere in $n$ dimensions. We compute the expected number of critical points and local maxima of this objective function and show that it is exponential in the dimensions $n$, and give exact formulas for the exponential growth rate. We show that (for $\lambda$ larger than a constant) critical points are either very close to the unknown vector ${\boldsymbol u}$, or are confined in a band of width $\Theta(\lambda^{-1/(k-1)})$ around the maximum circle that is orthogonal to ${\boldsymbol u}$. For local maxima, this band shrinks to be of size $\Theta(\lambda^{-1/(k-2)})$. These `uninformative' local maxima are likely to cause the failure of optimization algorithms.