Data Science with Python & R: Dimensionality Reduction and Clustering

Mar-28-2016, 02:20:10 GMT–@machinelearnbot

An important step in data analysis is data exploration and representation. In this tutorial we will see how combining a technique called Principal Component Analysis (PCA) with cluster analysis lets us represent, in a two-dimensional space, data defined in a higher-dimensional one. At the same time, it lets us group the samples into clusters of similar items and uncover hidden relationships in the data. More concretely, PCA reduces data dimensionality by finding principal components: the directions of maximum variation in a dataset. By replacing a dataset's original features or variables with a smaller set of new variables based on the principal components, we retain as much of the variation (the information about how the data is distributed) as possible with as few variables as possible. If we keep just two of these new variables, we can represent each sample in our data in a two-dimensional chart (e.g. a scatterplot). Clustering, as an unsupervised data analysis technique, organises data samples by proximity based on their variables.
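The combination described in the paragraph above can be sketched in Python as follows, assuming NumPy is available. PCA is computed from the singular value decomposition of the centred data, and the samples are then clustered on their two principal-component scores. The source does not name a clustering algorithm; k-means with a deterministic farthest-point start is an assumption made here, and the helper names `pca_2d` and `kmeans` and the toy data are hypothetical illustrations, not the tutorial's own code.

```python
import numpy as np

# Hypothetical helpers: a sketch of PCA + clustering, not the tutorial's code.


def pca_2d(X):
    """Project X (n_samples x n_features) onto its first two principal components.

    Returns the 2-D scores and the fraction of total variance explained by
    each of the two components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre every feature at zero
    # Rows of Vt are the principal directions, sorted by decreasing singular
    # value, i.e. by decreasing variance along that direction.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:2].T  # coordinates of each sample along PC1 and PC2
    return scores, explained[:2]


def kmeans(points, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialisation (assumed choice)."""
    points = np.asarray(points, dtype=float)
    # Deterministic start: the first sample, then repeatedly the sample
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: label each sample with its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its samples
        # (leave it in place if it lost all of them).
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy data (hypothetical): three groups of 20 samples in 5 dimensions.
rng = np.random.default_rng(42)
centres = np.array([[0, 0, 0, 0, 0], [10, 10, 0, 0, 0], [0, 10, 10, 10, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 5)) for c in centres])

scores, explained = pca_2d(X)
labels, _ = kmeans(scores, k=3)
print(scores.shape)  # (60, 2)
print(explained.round(2))
print(labels)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and colouring each point by `labels` gives the kind of two-dimensional scatterplot the tutorial describes, while `explained` shows how much of the original variation those two axes retain. Clustering on the PCA scores rather than the raw features is one option; clustering in the original space and only using PCA for display is another.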

artificial intelligence, dataset, machine learning
