AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Practical Coreset Constructions for Machine Learning

Bachem, Olivier, Lucic, Mario, Krause, Andreas

arXiv.org Machine Learning | Jun-4-2017

Over the last few years, the world has witnessed the emergence of data sets of an unprecedented size across different scientific disciplines. The large volume of such data sets presents new challenges as gathering, storing, and analyzing them becomes expensive. In the context of millions or even billions of data points, existing proven algorithms "suddenly" become computationally infeasible, while data sets may no longer fit on single machines and must instead be stored on clusters of machines. As a consequence, new algorithms are required to scale to this massive data setting. While one could focus on single machine learning problems and come up with endless new algorithms, we focus on a more general approach: we investigate coresets -- succinct, small summaries of large data sets -- so that solutions found on the summary are provably competitive with solutions found on the full data set.
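To make the idea of a coreset concrete, here is a minimal sketch of one simple way to build a weighted summary for k-means: importance sampling that mixes uniform sampling with sampling proportional to squared distance from the data mean. This is an illustration only and is not claimed to be the construction studied in the paper; the function names `sample_coreset` and `kmeans_cost` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_coreset(points, m, seed=0):
    """Return m (point, weight) pairs drawn by importance sampling.

    Assumption for illustration: the sampling distribution mixes a uniform
    term with a term proportional to squared distance from the data mean.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    dists = [sq_dist(p, mean) for p in points]
    total = sum(dists)
    if total > 0:
        probs = [0.5 / n + 0.5 * d / total for d in dists]
    else:
        probs = [1.0 / n] * n  # all points coincide with the mean
    idx = rng.choices(range(n), weights=probs, k=m)
    # Weight 1 / (m * q) makes the weighted cost an unbiased estimate
    # of the cost on the full data set.
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]


def kmeans_cost(weighted_points, centers):
    """Weighted k-means cost: each point pays its squared distance to the nearest center."""
    return sum(w * min(sq_dist(p, c) for c in centers) for p, w in weighted_points)
```

A clustering algorithm can then be run on the `m` weighted points instead of all `n` points, and `kmeans_cost` evaluated on the summary serves as an estimate of the cost on the full data.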

artificial intelligence, coreset, machine learning, (14 more...)

arXiv: 1703.06476

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Anti-Money Laundering and AI at HSBC Ayasdi

#artificialintelligence | Jun-2-2017, 03:40:30 GMT

HSBC and Ayasdi used Topological Data Analysis (TDA) and machine learning (ML) to automatically assemble self-similar groups of customers and customers-of-customers. This exercise was done entirely unsupervised, with Ayasdi's software making the selection of the appropriate algorithms, creating candidate groups and tuning the scenario thresholds within those groups until the optimal ones were identified. In this case, the platform automatically normalized the data columns and combined multi-dimensional scaling and single linkage clustering algorithms to create the topological model. This was then passed through an agglomerative hierarchical clustering algorithm which was optimized to produce balanced segments.

anti-money laundering and ai, artificial intelligence, machine learning, (3 more...)

Industry:

Law Enforcement & Public Safety > Fraud (0.40)
Banking & Finance > Financial Services (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Rank-One NMF-Based Initialization for NMF and Relative Error Bounds under a Geometric Assumption

Liu, Zhaoqiang, Tan, Vincent Y. F.

arXiv.org Machine Learning | Jun-2-2017

We propose a geometric assumption on nonnegative data matrices such that under this assumption, we are able to provide upper bounds (both deterministic and probabilistic) on the relative error of nonnegative matrix factorization (NMF). The algorithm we propose first uses the geometric assumption to obtain an exact clustering of the columns of the data matrix; subsequently, it employs several rank-one NMFs to obtain the final decomposition. When applied to data matrices generated from our statistical model, we observe that our proposed algorithm produces factor matrices with comparable relative errors vis-à-vis classical NMF algorithms but with much faster speeds. On face image and hyperspectral imaging datasets, we demonstrate that our algorithm provides an excellent initialization for applying other NMF algorithms at a low computational cost. Finally, we show on face and text datasets that the combinations of our algorithm and several classical NMF algorithms outperform other algorithms in terms of clustering performance.
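As a rough illustration of the rank-one building block mentioned in the abstract, the sketch below approximates a nonnegative matrix by a single outer product using power iteration, and reports the relative error in the Frobenius norm. It is a generic sketch, not the authors' algorithm: the clustering step is omitted, and the names `rank_one_nmf` and `relative_error` are hypothetical.

```python
import math


def rank_one_nmf(V, iters=100):
    """Approximate a nonnegative matrix V (list of rows) by the outer product w h^T.

    Power iteration started from a positive vector keeps w and h nonnegative
    as long as every entry of V is nonnegative.
    """
    rows, cols = len(V), len(V[0])
    h = [1.0] * cols
    w = [0.0] * rows
    for _ in range(iters):
        # w <- V h, scaled to unit length
        w = [sum(V[i][j] * h[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * rows, [0.0] * cols  # V is the zero matrix
        w = [x / norm for x in w]
        # h <- V^T w (carries the scale of the approximation)
        h = [sum(V[i][j] * w[i] for i in range(rows)) for j in range(cols)]
    return w, h


def relative_error(V, w, h):
    """Frobenius-norm error of w h^T relative to the norm of V."""
    num = sum((V[i][j] - w[i] * h[j]) ** 2
              for i in range(len(V)) for j in range(len(V[0])))
    den = sum(v * v for row in V for v in row)
    return math.sqrt(num / den) if den > 0 else 0.0
```

For a matrix that is exactly rank one, such as `[[1, 2], [2, 4], [3, 6]]`, the relative error is zero up to floating-point rounding.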

algorithm, artificial intelligence, machine learning, (14 more...)

doi: 10.1109/TSP.2017.2713761

arXiv: 1612.08549

Country: North America > United States (0.92)

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Alternatives to algebraic modeling for complex data: topological modeling via Gunnar Carlsson

@machinelearnbot | Jun-1-2017, 22:11:03 GMT

For many, mathematical modeling is exclusively about algebraic models, based on one form or another of regression or on differential equation modeling in the case of dynamical systems. However, this is too restrictive a point of view. For example, a clustering algorithm can be regarded as a modeling mechanism applicable to data where linear regression simply isn't applicable. Hierarchical clustering can also be regarded as a modeling mechanism, where the output is a dendrogram and contains information about the behavior of clusters at different levels of resolution. Kohonen self-organizing maps can similarly be regarded in this way.
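The point about hierarchical clustering producing a dendrogram can be sketched in a few lines. The following is a naive single-linkage implementation that records every merge together with the distance at which it happens; that merge history is the information a dendrogram draws. It is written for clarity rather than speed, and the name `single_linkage` is a hypothetical choice for this illustration.

```python
def single_linkage(points):
    """Return the merge history [(members_a, members_b, distance), ...].

    Each entry records the two clusters (as sorted lists of point indices)
    that were merged and the distance at which the merge happened.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges
```

Cutting the merge history at a chosen distance threshold yields the clusters at that level of resolution.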

artificial intelligence, machine learning, modeling, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Direct Mapping Hidden Excited State Interaction Patterns from ab initio Dynamics and Its Implications on Force Field Development

Liu, Fang, Du, Likai, Zhang, Dongju, Gao, Jun

arXiv.org Machine Learning | May-28-2017

The excited states of polyatomic systems are rather complex, and often exhibit meta-stable dynamical behaviors. Static analysis of reaction pathways often fails to sufficiently characterize excited state motions due to their highly non-equilibrium nature. Here, we proposed a time series guided clustering algorithm to generate the most relevant meta-stable patterns directly from ab initio dynamic trajectories. Based on the knowledge of these meta-stable patterns, we suggested an interpolation scheme with only a concrete and finite set of known patterns to accurately predict the ground and excited state properties of the entire dynamics trajectories. As illustrated with the example of sinapic acids, the estimation errors for the ground and excited states are very close, which indicates that one could predict the ground and excited state molecular properties with similar accuracy. These results may provide some insight into constructing an excited state force field with energy terms compatible with traditional ones.

artificial intelligence, stable pattern, upstream oil & gas, (20 more...)

doi: 10.1038/s41598-017-09347-2

arXiv: 1705.09919

Country:

North America > United States (0.28)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Introduction to K-means Clustering: A Tutorial

@machinelearnbot | May-25-2017, 21:30:10 GMT

Dr. Andrea Trevino presents a beginner introduction to the widely-used K-means clustering algorithm in this tutorial. K-means clustering is a type of unsupervised learning, which is used when the resulting categories or groups in the data are unknown. This algorithm finds the groups that exist organically in the data and the results allow the user to label new data quickly. Clustering, in general, is a key tool for understanding your data. This algorithm can be used in a number of applications, including behavioral segmentation, inventory categorization, sorting sensor measurements, and detecting bots or anomalies, to name a few. This tutorial covers the iterative algorithm that determines the clusters and works through a delivery fleet data example in Python.
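The iterative algorithm the tutorial refers to alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. Below is a minimal standard-library sketch of that loop; it is not the tutorial's own code, and the function name `kmeans`, the random initialization, and the stopping rule are assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Basic k-means on a list of equal-length tuples.

    Returns (centroids, labels). Initial centroids are k distinct data
    points chosen at random (an assumption; other schemes exist).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Once fitted, a new data point can be labeled by finding its nearest centroid, which is what makes labeling new data quick.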

artificial intelligence, k-means clustering, machine learning, (2 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Real-Time Background Subtraction Using Adaptive Sampling and Cascade of Gaussians

Kiran, B Ravi, Yogamani, Senthil

arXiv.org Machine Learning | May-25-2017

Background-foreground classification is a fundamental, well-studied problem in computer vision. Due to the pixel-wise nature of modeling and processing in the algorithm, it is usually difficult to satisfy real-time constraints. There is a trade-off between speed (because of model complexity) and accuracy. Inspired by the rejection cascade of the Viola-Jones classifier, we decompose the Gaussian Mixture Model (GMM) into an adaptive cascade of classifiers. This way we achieve a good improvement in speed without compromising accuracy. In the training phase, we learn multiple KDEs for different durations to be used as a strong prior distribution and detect probable oscillating pixels, which usually result in misclassifications. We propose a confidence measure for the classifier based on temporal consistency and the prior distribution. The confidence measure thus derived is used to adapt the learning rate and the thresholds of the model, to improve accuracy. The confidence measure is also employed to perform temporal and spatial sampling in a principled way. We demonstrate a speed-up factor of 5x to 10x and a 17 percent average improvement in accuracy over several standard videos.
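For readers unfamiliar with pixel-wise background modeling, the sketch below shows a much simpler stand-in for the method in the abstract: a single running Gaussian per pixel with a fixed learning rate. It does not implement the GMM cascade, the KDE priors, or the confidence measure; the name `update_background` and the parameters `alpha`, `k`, and `min_var` are hypothetical choices for illustration.

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5, min_var=1.0):
    """One update of a per-pixel running-Gaussian background model.

    mean, var, and frame are flat lists of per-pixel values.
    Returns (new_mean, new_var, foreground_mask).
    """
    new_mean, new_var, mask = [], [], []
    for m, v, x in zip(mean, var, frame):
        diff = x - m
        # Foreground if the pixel is more than k standard deviations from the mean.
        is_fg = diff * diff > (k * k) * v
        mask.append(is_fg)
        if is_fg:
            # Do not absorb foreground pixels into the background model.
            new_mean.append(m)
            new_var.append(v)
        else:
            # Exponential moving average with a fixed learning rate alpha.
            new_mean.append(m + alpha * diff)
            new_var.append(max(min_var, (1 - alpha) * v + alpha * diff * diff))
    return new_mean, new_var, mask
```

In this simplified model `alpha` and `k` are constants; the abstract describes adapting the learning rate and thresholds using a confidence measure instead.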

machine learning, pixel, real time system, (16 more...)

arXiv: 1705.09339

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Fuzzy Approach Topic Discovery in Health and Medical Corpora

Karami, Amir, Gangopadhyay, Aryya, Zhou, Bin, Kharrazi, Hadi

arXiv.org Machine Learning | May-25-2017

The majority of medical documents and electronic health records (EHRs) are in text format, which poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to automate text processing. One of the popular approaches to retrieving information based on discovering the themes in health & medical corpora is topic modeling; however, this approach still needs new perspectives. In this research we describe fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling using a fuzzy perspective. FLSA can handle the redundancy issue in health & medical corpora and provides a new method to estimate the number of topics. The quantitative evaluations show that FLSA produces superior performance and features compared to latent Dirichlet allocation (LDA), the most popular topic model.

flsa, machine learning, natural language, (16 more...)

doi: 10.1007/s40815-017-0327-9

arXiv: 1705.00995

Country: North America > United States > Maryland (0.29)

Genre:

Overview (0.88)
Research Report (0.84)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Identifying the number of clusters: finally a solution

@machinelearnbot | May-24-2017, 00:37:52 GMT

It optimizes the number of clusters when the clustering method maximizes the variance among the clusters. If you are using, for example, K-means as the clustering algorithm, your method will fail for every number of clusters you try! As you can see, the right number of clusters doesn't exist for this problem using the "naive" kmeans. BTW, for kmeans and density-based clustering algorithms, I've seen methods based on EM (expectation-maximization) and the Bayesian information criterion (BIC) that are a little bit more robust than this method. Could you share the table of the points... just to play a little bit with them :)
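For reference, the Bayesian information criterion mentioned in the comment is usually written as follows. The symbols p, n, and \hat{L} are notation introduced here for illustration and do not come from the original post.

```latex
\mathrm{BIC} = p \ln n - 2 \ln \hat{L}
```

Here p is the number of free parameters of the fitted model, n is the number of data points, and \hat{L} is the maximized likelihood. When comparing models fitted with different numbers of clusters, the one with the lowest BIC is preferred, since the p ln n term penalizes the extra parameters that come with more clusters.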

artificial intelligence, identifying, machine learning, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Analytics training courses

#artificialintelligence | May-23-2017, 12:16:55 GMT

Includes key concepts of statistical analysis: probability theory, types of distribution, the central limit theorem, hypothesis testing, and statistical inference.

artificial intelligence, case study, machine learning, (16 more...)

Country: North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.09)

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)
