AITopics

The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model skeptical. In this paper, we generalize the Additive Noise Model (ANM) to a mixture model, which consists of a finite number of ANMs, and provide the condition of its causal identifiability. To conduct model estimation, we propose Gaussian Process Partially Observable Model (GPPOM), and incorporate independence enforcement into it to learn latent parameter associated with each observation. Causal inference and clustering according to the underlying generating mechanisms of the mixture model are addressed in this work. Experiments on synthetic and real data demonstrate the effectiveness of our proposed approach.

artificial intelligence, data mining, machine learning, (17 more...)

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Supervising Unsupervised Learning

Garg, Vikas

We introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via rigorous agnostic bounds on a variety of unsupervised problems. In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove the outliers, and provably circumvent Kleinberg's impossibility result. Experiments across hundreds of problems demonstrate improvements in performance on unsupervised data with simple algorithms despite the fact our problems come from heterogeneous domains. Additionally, our framework lets us leverage deep networks to learn common features across many small datasets, and perform zero shot learning.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Bipartite Stochastic Block Models with Tiny Clusters

Neumann, Stefan

We study the problem of finding clusters in random bipartite graphs. We present a simple two-step algorithm which provably finds even tiny clusters of size $O(n^\epsilon)$, where $n$ is the number of vertices in the graph and $\epsilon > 0$. Previous algorithms were only able to identify clusters of size $\Omega(\sqrt{n})$. We evaluate the algorithm on synthetic and on real-world data; the experiments show that the algorithm can find extremely small clusters even in presence of high destructive noise.

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > Canada (0.28)
Europe > Austria > Vienna (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Semi-crowdsourced Clustering with Deep Generative Models

Luo, Yucen, TIAN, TIAN, Shi, Jiaxin, Zhu, Jun, Zhang, Bo

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.

annotation, artificial intelligence, machine learning, (16 more...)

Country: Asia > China (0.14)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Mixture Matrix Completion

Pimentel-Alarcon, Daniel

Completing a data matrix X has become an ubiquitous problem in modern data science, with motivations in recommender systems, computer vision, and networks inference, to name a few. One typical assumption is that X is low-rank. A more general model assumes that each column of X corresponds to one of several low-rank matrices. This paper generalizes these models to what we call mixture matrix completion (MMC): the case where each entry of X corresponds to one of several low-rank matrices. MMC is a more accurate model for recommender systems, and brings more flexibility to other completion and clustering problems. We make four fundamental contributions about this new model. First, we show that MMC is theoretically possible (well-posed). Second, we give its precise information-theoretic identifiability conditions. Third, we derive the sample complexity of MMC. Finally, we give a practical algorithm for MMC with performance comparable to the state-of-the-art for simpler related problems, both on synthetic and real data.

artificial intelligence, machine learning, matrix, (15 more...)

Country: North America (0.46)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Lindenbaum, Ofir, Stanley, Jay, Wolf, Guy, Krishnaswamy, Smita

Geometry Based Data Generation

We propose a new type of generative model for high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold. This is in contrast to existing generative models that represent data density, and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts, thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis.

artificial intelligence, machine learning, sugar, (15 more...)

Country: North America > United States (0.14)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.69)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Khosla, Meenakshi, Jamison, Keith, Ngo, Gia H., Kuceyeski, Amy, Sabuncu, Mert R.

Machine learning in resting-state fMRI analysis

arXiv.org Machine LearningDec-30-2018

Machine learning techniques have gained prominence for the analysis of resting-state functional Magnetic Resonance Imaging (rsfMRI) data. Here, we present an overview of various unsupervised and supervised machine learning applicationsto rsfMRI. We present a methodical taxonomy of machine learning methods in resting-state fMRI. We identify three major divisions of unsupervised learning methods with regard to their applications to rsfMRI, based on whether they discover principal modes of variation across space, time or population. Next, we survey the algorithms and rsfMRI feature representations that have driven the success of supervised subject-level predictions. Thegoal is to provide a high-level overview of the burgeoning field of rsfMRI from the perspective of machine learning applications. Keywords: Machine learning, resting-state, functional MRI, intrinsic networks, brain connectivity 1. Introduction Resting-state fMRI (rsfMRI) is a widely used neuroimaging tool that ...

connectivity, functional connectivity, rsfmri, (16 more...)

arXiv.org Machine Learning

1812.11477

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

#artificialintelligenceDec-29-2018, 19:55:21 GMT

Why Use K-Means for Time Series Data? (Part One) Blog InfluxData

As an only child, I spent a lot of time by myself. Oftentimes my only respite from the extreme boredom of being by myself was daydreaming. I would meditate on objects in my environment and rotate them around in my head. I now attribute my love of jigsaw puzzles, math, and art to all the time I dedicated to visualization practice. My love for those things inspired me to try and understand more about how statistical functions and K-Means Clustering are used in anomaly detection for time series data.

anomaly, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Industry: Health & Medicine > Consumer Health (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.39)

#artificialintelligenceDec-29-2018, 16:21:47 GMT

Meta-Learning For Better Machine Learning

In a related post we discussed the Cold Start Problem in Data Science -- how do you start to build a model when you have either no training data or no clear choice of model parameters. An example of a cold start problem is k-Means Clustering, where the number of clusters k in the data set is not known in advance, and the locations of those clusters in feature space (i.e., the cluster means) are not known either. So, you start by assuming a value for k and making random assumptions about the cluster means, and then iterate until you find the optimal set of clusters, based upon some evaluation metric. See the related post for more details about the cold start challenge. See the attached graphic below for a simple demonstration of a k-Means Clustering application.

artificial intelligence, machine learning, parameterization, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.79)