Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimiza…

Nov-19-2016, 22:00:17 GMT–#artificialintelligence

[Diagram: Intel contributes to Apache Spark*; Spark users]
*Other names and brands may be claimed as the property of others.

3. Sparse data is almost everywhere
• Data source:
  – Movie ratings
  – Purchase history
• Feature engineering:
  – NLP: CountVectorizer, HashingTF
  – Categorical: OneHotEncoder
  – Image, video
[Figure: Purchase History, customers vs. products]

4. Sparse data support in MLlib
new DenseVector(values = Array(1.0, …
• Sparse data has been supported since v1.0
  – Load / Save, Sparse Vector, LIBSVM
  – Sparse-vector support is one of the primary focuses during review.

KMeans
• Pick initial cluster centers
  – Random
  – KMeans||
• Iterative training
  – Points clustering: find the nearest center for each point
  – Re-compute the center of each cluster (avg.)

MLlib iteration
2. Compute a sum table for each partition of data

    val sum = new Array[Vector](k)
    for (each point in the partition) {
      val bestCenter = traverse()   // find the nearest center
      sum(bestCenter) += point
    }

[Diagram: the training dataset is split across Executor 1, Executor 2 and Executor 3; Sums: 16G, Centers: 16G]

14. Analysis: Data
• Are the cluster centers dense?
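
The per-partition sum table from the MLlib iteration step can be sketched in plain Scala, without a Spark dependency. This is a minimal illustration under stated assumptions, not MLlib's actual implementation: `SparsePoint`, `squaredDistance`, `nearestCenter` and `partitionSums` are hypothetical names, points are stored as index/value pairs, and centers and sums are kept as dense arrays for simplicity.

```scala
// Hypothetical sparse point: only the non-zero entries are stored.
// `size` is the full dimensionality, kept for reference only.
final case class SparsePoint(size: Int, indices: Array[Int], values: Array[Double])

object KMeansSumTableSketch {

  // Squared Euclidean distance between a dense center and a sparse point.
  def squaredDistance(center: Array[Double], p: SparsePoint): Double = {
    // Start from ||center||^2, then correct the coordinates where the point is non-zero.
    var dist = center.map(c => c * c).sum
    var i = 0
    while (i < p.indices.length) {
      val c = center(p.indices(i))
      val d = c - p.values(i)
      dist += d * d - c * c
      i += 1
    }
    dist
  }

  // Index of the nearest center (the role played by `traverse()` in the slide).
  def nearestCenter(centers: Array[Array[Double]], p: SparsePoint): Int =
    centers.indices.minBy(j => squaredDistance(centers(j), p))

  // Build the sum table and per-cluster counts for one partition of points.
  def partitionSums(
      centers: Array[Array[Double]],
      partition: Seq[SparsePoint]): (Array[Array[Double]], Array[Long]) = {
    val k = centers.length
    val dim = centers.head.length
    val sums = Array.fill(k)(new Array[Double](dim)) // dense: k x dim doubles
    val counts = new Array[Long](k)
    for (p <- partition) {
      val best = nearestCenter(centers, p)
      var i = 0
      while (i < p.indices.length) {
        sums(best)(p.indices(i)) += p.values(i) // touch non-zero entries only
        i += 1
      }
      counts(best) += 1
    }
    (sums, counts)
  }

  def main(args: Array[String]): Unit = {
    val centers = Array(Array(0.0, 0.0, 0.0, 0.0), Array(10.0, 0.0, 0.0, 10.0))
    val points = Seq(
      SparsePoint(4, Array(0), Array(1.0)),
      SparsePoint(4, Array(0, 3), Array(9.0, 11.0)),
      SparsePoint(4, Array(1), Array(2.0)))
    val (sums, counts) = partitionSums(centers, points)
    sums.zip(counts).foreach { case (s, n) => println(s"${s.mkString(", ")} (points: $n)") }
  }
}
```

Adding a point to its cluster's sum touches only the point's non-zero entries, yet every row of `sums` still occupies `dim` doubles, so in this sketch the table takes k × dim values however sparse the points are.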
