AITopics | mllib

mllib

Enriching the Machine Learning Workloads in BigBench

Polag, Matthias, Ivanov, Todor, Eichhorn, Timo

arXiv.org Artificial Intelligence, Jun-16-2024

In the era of Big Data and the growing support for Machine Learning, Deep Learning and Artificial Intelligence algorithms in current software systems, there is an urgent need for standardized application benchmarks that stress test and evaluate these new technologies. Relying on the standardized BigBench (TPCx-BB) benchmark, this work enriches the improved BigBench V2 with three new workloads and expands the coverage of machine learning algorithms. Our workloads utilize multiple algorithms and compare different implementations of the same algorithm across several popular libraries like MLlib, SystemML, Scikit-learn and Pandas, demonstrating the relevance and usability of our benchmark extension.

algorithm, implementation, library, (15 more...)

arXiv:2406.10843

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Hesse (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.86)

Industry: Information Technology (0.68)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Movie Recommendations with Spark Collaborative Filtering - KDnuggets

#artificialintelligence, Dec-1-2021, 16:17:22 GMT

Collaborative filtering (CF) based on the alternating least squares (ALS) technique is another algorithm used to generate recommendations. It produces automatic predictions (filtering) about the interests of a user by collecting preferences from many other users (collaborating). The underlying assumption of the CF approach is that if a person A has the same opinion as a person B on one issue, A is more likely to share B's opinion on a different issue than a randomly chosen person would be. This algorithm gained a lot of traction in the data science community after it was used by the winning team of the Netflix Prize. The CF algorithm has also been implemented in Spark MLlib with the aim of fast execution on very large datasets.
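
As a rough illustration of the alternating idea, the sketch below fits a rank-1 factorization to a handful of ratings in plain Python. It is not Spark MLlib's implementation: the function names (objective, als_rank1), the single latent factor, the regularization value and the toy ratings are all assumptions made for this example.

```python
def objective(ratings, x, y, reg):
    """Regularized squared error over the observed ratings only."""
    err = sum((r - x[u] * y[i]) ** 2 for u, i, r in ratings)
    return err + reg * (sum(v * v for v in x) + sum(v * v for v in y))


def als_rank1(ratings, n_users, n_items, reg=0.1, iters=10):
    """Rank-1 ALS on (user, item, rating) triples.

    Returns user factors, item factors, and the objective recorded
    before the first sweep and after every sweep.
    """
    x = [1.0] * n_users  # one latent factor per user
    y = [1.0] * n_items  # one latent factor per item
    history = [objective(ratings, x, y, reg)]
    for _ in range(iters):
        # Hold item factors fixed: each user factor then has a
        # closed-form ridge-regression solution.
        for u in range(n_users):
            seen = [(i, r) for uu, i, r in ratings if uu == u]
            num = sum(r * y[i] for i, r in seen)
            den = sum(y[i] ** 2 for i, _ in seen) + reg
            x[u] = num / den
        # Hold user factors fixed: solve each item factor the same way.
        for i in range(n_items):
            seen = [(u, r) for u, ii, r in ratings if ii == i]
            num = sum(r * x[u] for u, r in seen)
            den = sum(x[u] ** 2 for u, _ in seen) + reg
            y[i] = num / den
        history.append(objective(ratings, x, y, reg))
    return x, y, history


# Hypothetical toy data: (user, item, rating). User 2 has not rated item 0.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 1, 3.0)]
x, y, history = als_rank1(ratings, n_users=3, n_items=2)
predicted = x[2] * y[0]  # estimated rating of item 0 by user 2
```

Because each half-step solves its least-squares subproblem exactly, the recorded objective never increases from one sweep to the next. A production system would use more latent factors and a distributed solver.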

movie, node, workflow, (10 more...)

Country:

North America > United States > California > San Mateo County > Menlo Park (0.05)
North America > United States > California > Alameda County > Berkeley (0.05)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Italy > Tuscany > Florence (0.05)

Industry:

Media > Film (0.91)
Leisure & Entertainment (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

PySpark Tutorial

#artificialintelligence, Oct-13-2021, 13:27:27 GMT

PySpark is the Python API for Apache Spark, an open-source cluster-computing framework for large-scale data processing written in Scala.

dataframe, pyspark, sparksession, (16 more...)

Industry: Information Technology (0.49)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(4 more...)

Spark MLlib on AWS Glue

#artificialintelligence, Jun-29-2021, 01:16:15 GMT

AWS pushes SageMaker as its machine learning platform. However, Spark's MLlib is a comprehensive library that runs distributed ML natively on AWS Glue -- and provides a viable alternative to AWS's primary ML platform. One of the big benefits of SageMaker is that it easily supports experimentation via its Jupyter Notebooks. But operationalising your SageMaker ML can be difficult, particularly if you need to include ETL processing at the start of your pipeline. In this situation, Apache Spark's MLlib running on AWS Glue can be a good option -- by its very nature, it is immediately operationalised, integrated with ETL pre-processing and ready to be used in production for an end-to-end machine learning pipeline.

aws glue, custom transform, glue, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)

Machine learning with PySpark

#artificialintelligence, Oct-30-2020, 23:45:24 GMT

In this article, I am going to share some machine learning work I have done in Spark using PySpark. Machine learning is one of the hot applications of artificial intelligence (AI). AI is a much bigger ecosystem with many amazing applications. Machine learning, in simple terms, is the ability of a machine to learn automatically and improve from experience without being explicitly programmed. The learning process starts with observation of data, then finds patterns in the data and makes better decisions based on what it has learned.

artificial intelligence, machine learning, regression model, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.39)

14 open source tools to make the most of machine learning

#artificialintelligence, Sep-29-2020, 18:25:36 GMT

Spam filtering, face recognition, recommendation engines -- when you have a large data set on which you'd like to perform predictive analysis or pattern recognition, machine learning is the way to go. The proliferation of free open source software has made machine learning easier to implement both on single machines and at scale, and in most popular programming languages. These open source tools include libraries for the likes of Python, R, C, Java, Scala, Clojure, JavaScript, and Go. Apache Mahout provides a way to build environments for hosting machine learning applications that can be scaled quickly and efficiently to meet demand. Mahout works mainly with another well-known Apache project, Spark, and was originally devised to work with Hadoop for the sake of running distributed applications, but has been extended to work with other distributed back ends like Flink and H2O. Mahout uses a domain specific language in Scala.

data mining, machine learning, programming language, (20 more...)

Country: Oceania > New Zealand > North Island > Waikato (0.05)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

14 open source tools to make the most of machine learning

#artificialintelligence, Sep-26-2020, 01:40:34 GMT

data mining, machine learning, programming language, (20 more...)

Country: Oceania > New Zealand > North Island > Waikato (0.05)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Machine_Learning_with_Spark

#artificialintelligence, Sep-17-2020, 03:36:26 GMT

This is a comprehensive tutorial on using the Spark distributed machine learning framework to build a scalable ML data pipeline. I will cover the basic machine learning algorithms implemented in the Spark MLlib library, and throughout this tutorial I will use PySpark in a Python environment. Machine learning is becoming popular for solving real-world problems in almost every business domain. It helps solve problems using data that is often unstructured, noisy, and huge in size. With the increase in data sizes and the variety of data sources, solving machine learning problems using standard techniques poses a big challenge.

algorithm, artificial intelligence, machine learning, (16 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Distributing the Singular Value Decomposition with Apache Spark

#artificialintelligence, Sep-4-2019, 06:28:50 GMT

The Singular Value Decomposition (SVD) is one of the cornerstones of linear algebra and has widespread application in many real-world modeling situations. Problems such as recommender systems, linear systems, least squares, and many others can be solved using the SVD. It is frequently used in statistics where it is related to principal component analysis (PCA) and to correspondence analysis, and in signal processing and pattern recognition. Another usage is latent semantic indexing in natural language processing. Decades ago, before the rise of distributed computing, computer scientists developed the single-core ARPACK package for computing the eigenvalue decomposition of a matrix.
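
The link between the SVD and an eigenvalue problem can be sketched in a few lines: the singular values of a matrix A are the square roots of the eigenvalues of A^T A. The example below estimates the largest singular value by plain power iteration. This is a deliberately simple stand-in, not the method ARPACK or Spark uses, and the function name top_singular_value and the test matrix are assumptions for illustration.

```python
import math


def top_singular_value(a, iters=100):
    """Estimate the largest singular value of `a` (a list of rows)
    by power iteration on A^T A."""
    n_rows, n_cols = len(a), len(a[0])
    # Starting vector; it must not be orthogonal to the top right
    # singular vector, which holds for the example matrix below.
    v = [1.0] * n_cols
    for _ in range(iters):
        # w = A v
        w = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # z = A^T w, i.e. one multiplication by A^T A
        z = [sum(a[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in z))
        if norm == 0.0:
            return 0.0  # A v vanished, so no direction to follow
        v = [c / norm for c in z]
    # With v close to the top right singular vector, ||A v|| is the
    # corresponding singular value.
    av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(c * c for c in av))


# Hypothetical 2x2 example whose singular values are 2 and 1.
sigma = top_singular_value([[0.0, 2.0], [1.0, 0.0]])
```

Only matrix-vector products with A and A^T are needed, which is why this family of methods suits large matrices whose rows are spread across a cluster.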

artificial intelligence, natural language, singular value decomposition, (11 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

The 3 Biggest Mistakes on Learning Data Science

#artificialintelligence, Jun-6-2019, 02:30:32 GMT

I've discussed parts of what I'm going to mention here in other articles, but now I want to give a few directions on what's not data science and how not to learn it. So let's start with the basics. Data science is not just knowing some programming languages, math, and statistics and having "domain knowledge". We've created a new field, or something like that. There are a lot of things to say and study in this field.

artificial intelligence, machine learning, social media, (14 more...)

Genre: Instructional Material (0.49)

Industry: Education (0.32)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.34)
