AITopics

1710.04073

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.45)

Genre: Research Report (0.49)

Industry:

Information Technology (0.68)
Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

#artificialintelligenceOct-9-2017, 20:46:25 GMT

Self Driven Data Science -- Issue #18 – Towards Data Science – Medium

A nearly exhaustive collection of all the different ways that we can visualize data, from bubble charts to histograms. You'll definitely want to bookmark this for future reference when deciding how to represent your insights. K-Means Clustering, one of the popular clustering algorithms is a type of unsupervised learning that is often used when you don't have labeled data. In this post, the author walks through implementing K-Means in Python from scratch. Food for thought when your preparing for your next presentation.

artificial intelligence, machine learning, self driven data science, (1 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.63)

Ferrarotti, Marco Jacopo, Decherchi, Sergio, Rocchia, Walter

Distributed Kernel K-Means for Large Scale Clustering

arXiv.org Machine LearningOct-9-2017

Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version have still a wide audience because of their conceptual simplicity and efficacy. However, the systematic application of the kernelized version of k-means is hampered by its inherent square scaling in memory with the number of samples. In this contribution, we devise an approximate strategy to minimize the kernel k-means cost function in which the trade-off between accuracy and velocity is automatically ruled by the available system memory. Moreover, we define an ad-hoc parallelization scheme well suited for hybrid cpu-gpu state-of-the-art parallel architectures. We proved the effectiveness both of the approximation scheme and of the parallelization method on standard UCI datasets and on molecular dynamics (MD) data in the realm of computational chemistry. In this applicative domain, clustering can play a key role for both quantitively estimating kinetics rates via Markov State Models or to give qualitatively a human compatible summarization of the underlying chemical phenomenon under study. For these reasons, we selected it as a valuable real-world application scenario.

algorithm, artificial intelligence, machine learning, (17 more...)

doi: 10.5121/csit.2017.71015

1710.03013

Country: Europe > Italy (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

#artificialintelligenceOct-5-2017, 06:10:15 GMT

9 Off-the-beaten-path Statistical Science Topics with Interesting Applications

You will find here nine interesting topics that you won't learn in college classes. Most have interesting applications in business and elsewhere. They are not especially difficult, and I explain them in simple English. Yet they are not part of the traditional statistical curriculum, and even many experienced data scientists with a PhD degree have not heard about some of these concepts. This is a well known model, used as a base stochastic process to model the logarithm of stock prices, yet it has interesting properties (depending on dimension) that few people know about.

algorithm, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Industry: Education > Educational Setting > Higher Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

Saas, Alain, Guitart, Anna, Periáñez, África

Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data

The classification of time series data is a challenge common to all data-driven fields. However, there is no agreement about which are the most efficient techniques to group unlabeled time-ordered data. This is because a successful classification of time series patterns depends on the goal and the domain of interest, i.e. it is application-dependent. In this article, we study free-to-play game data. In this domain, clustering similar time series information is increasingly important due to the large amount of data collected by current mobile and web applications. We evaluate which methods cluster accurately time series of mobile games, focusing on player behavior data. We identify and validate several aspects of the clustering: the similarity measures and the representation techniques to reduce the high dimensionality of time series. As a robustness test, we compare various temporal datasets of player activity from two free-to-play video-games. With these techniques we extract temporal patterns of player behavior relevant for the evaluation of game events and game-business diagnosis. Our experiments provide intuitive visualizations to validate the results of the clustering and to determine the optimal number of clusters. Additionally, we assess the common characteristics of the players belonging to the same group. This study allows us to improve the understanding of player dynamics and churn behavior.

artificial intelligence, data mining, machine learning, (15 more...)

doi: 10.1109/CIG.2016.7860442

1710.02268

Country:

Europe (0.46)
Asia > Japan (0.28)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Najafi, Amir, Motahari, Abolfazl, Rabiee, Hamid R.

Reliable Learning of Bernoulli Mixture Models

In this paper, we have derived a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMM), when the number of clusters is unknown. A BMM refers to a random binary vector whose components are independent Bernoulli trials with cluster-specific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics where researchers aim at inferring the population structure from multilocus genotype data. Our findings stipulate a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample, such that the existence of a clustering algorithm with a sufficient accuracy is guaranteed. Moreover, the mathematical intuitions and tools behind our work can help researchers in designing more effective and theoretically-plausible heuristic methods for similar problems.

artificial intelligence, bernoulli model, machine learning, (17 more...)

1710.02101

Country: Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Chen, Pin-Yu, Wu, Lingfei

Revisiting Spectral Graph Clustering with Generative Community Models

The methodology of community detection can be divided into two principles: imposing a network model on a given graph, or optimizing a designed objective function. The former provides guarantees on theoretical detectability but falls short when the graph is inconsistent with the underlying model. The latter is model-free but fails to provide quality assurance for the detected communities. In this paper, we propose a novel unified framework to combine the advantages of these two principles. The presented method, SGC-GEN, not only considers the detection error caused by the corresponding model mismatch to a given graph, but also yields a theoretical guarantee on community detectability by analyzing Spectral Graph Clustering (SGC) under GENerative community models (GCMs). SGC-GEN incorporates the predictability on correct community detection with a measure of community fitness to GCMs. It resembles the formulation of supervised learning problems by enabling various community detection loss functions and model mismatch metrics. We further establish a theoretical condition for correct community detection using the normalized graph Laplacian matrix under a GCM, which provides a novel data-driven loss function for SGC-GEN. In addition, we present an effective algorithm to implement SGC-GEN, and show that the computational complexity of SGC-GEN is comparable to the baseline methods. Our experiments on 18 real-world datasets demonstrate that SGC-GEN possesses superior and robust performance compared to 6 baseline methods under 7 representative clustering metrics.

artificial intelligence, data mining, machine learning, (17 more...)

1709.04594

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.46)
Energy (0.46)
Education (0.34)
Health & Medicine (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.85)

Rujeerapaiboon, Napat, Schindler, Kilian, Kuhn, Daniel, Wiesemann, Wolfram

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Plain vanilla K-means clustering is prone to produce unbalanced clusters and suffers from outlier sensitivity. To mitigate both shortcomings, we formulate a joint outlier detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to feasible solutions for the MILP. We also prove that these solutions are optimal in the MILP if a cluster separation condition holds.

artificial intelligence, data mining, machine learning, (16 more...)

1705.07837

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Yu, Ming, Thompson, Addie M., Ramamurthy, Karthikeyan Natesan, Yang, Eunho, Lozano, Aurélie C.

Multitask Learning using Task Clustering with Applications to Predictive Modeling and GWAS of Plant Varieties

arXiv.org Machine LearningOct-4-2017

Inferring predictive maps between multiple input and multiple output variables or tasks has innumerable applications in data science. Multi-task learning attempts to learn the maps to several output tasks simultaneously with information sharing between them. We propose a novel multi-task learning framework for sparse linear regression, where a full task hierarchy is automatically inferred from the data, with the assumption that the task parameters follow a hierarchical tree structure. The leaves of the tree are the parameters for individual tasks, and the root is the global model that approximates all the tasks. We apply the proposed approach to develop and evaluate: (a) predictive models of plant traits using large-scale and automated remote sensing data, and (b) GWAS methodologies mapping such derived phenotypes in lieu of hand-measured traits. We demonstrate the superior performance of our approach compared to other methods, as well as the usefulness of discovering hierarchical groupings between tasks. Our results suggest that richer genetic mapping can indeed be obtained from the remote sensing data. In addition, our discovered groupings reveal interesting insights from a plant science perspective.

application, artificial intelligence, machine learning, (15 more...)

1710.01788

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (1.00)
Health & Medicine (0.70)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

#artificialintelligenceOct-3-2017, 18:42:24 GMT

K-Means Clustering in Python

Clustering is a type of Unsupervised learning. This is very often used when you don't have labeled data. K-Means Clustering is one of the popular clustering algorithm. The goal of this algorithm is to find groups(clusters) in the given data. In this post we will implement K-Means algorithm using Python from scratch.

artificial intelligence, k-means clustering, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)