Spark MLlib
First Steps in Machine Learning with Apache Spark
Apache Spark is one of the main tools for data processing and analysis in the Big Data context. It's a very complete (and complex) data processing framework, with functionality that can be roughly divided into four groups: Spark SQL & DataFrames, for all-purpose data processing; Spark Structured Streaming, for handling data streams; Spark MLlib, for machine learning and data science; and GraphX, the graph processing API. I've already featured the first two in other posts: creating an ETL process for a Data Warehouse and integrating Spark and Kafka for stream processing. Today it's time for the third one -- let's play with machine learning using Spark MLlib. Machine learning has a special place in my heart, because it was my entrance door into the data science field and, like probably many of you, I started with the classic Scikit-Learn library.
Differential testing for machine learning: an analysis for classification algorithms beyond deep learning
Steffen Herbold, Steffen Tunkel
Context: Differential testing is a useful approach for software testing that runs different implementations of the same algorithm and compares their results. In recent years, this approach was successfully used in test campaigns for deep learning frameworks. Objective: There is little knowledge about the application of differential testing beyond deep learning. In this article, we want to close this gap for classification algorithms. Method: We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret, in which we identify the potential of differential testing by considering which algorithms are available in multiple frameworks, its feasibility by identifying pairs of algorithms that should exhibit the same behavior, and its effectiveness by executing tests for the identified pairs and analyzing the deviations. Results: While we found large potential for popular algorithms, feasibility seems limited because it is often not possible to determine configurations that are the same across frameworks. The execution of the feasible tests revealed a large number of deviations in both the scores and the predicted classes. Only a lenient oracle based on the statistical significance of class differences avoids a huge number of test failures. Conclusions: The potential of differential testing beyond deep learning seems limited for research into the quality of machine learning libraries. Practitioners may still use the approach if they have deep knowledge of the implementations, especially if a coarse oracle that only considers significant differences between classes is sufficient.
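To make the "lenient oracle" idea concrete, here is a minimal, framework-free sketch of a differential-testing oracle that compares the class predictions of two implementations and fails only when the disagreement rate is statistically significant. The function name `lenient_oracle`, the 5% tolerated deviation rate, and the exact binomial test are illustrative assumptions, not the paper's actual procedure:

```python
import math

def lenient_oracle(classes_a, classes_b, alpha=0.05, tolerated=0.05):
    """Differential-testing oracle over two implementations' predicted classes.

    Passes (returns True) unless the observed disagreement rate is
    significantly higher than an assumed benign deviation rate `tolerated`,
    judged with a one-sided exact binomial test at significance level `alpha`.
    """
    assert len(classes_a) == len(classes_b), "prediction lists must align"
    n = len(classes_a)
    disagreements = sum(a != b for a, b in zip(classes_a, classes_b))
    # p-value: P(X >= disagreements) for X ~ Binomial(n, tolerated)
    p_value = sum(
        math.comb(n, k) * tolerated**k * (1 - tolerated) ** (n - k)
        for k in range(disagreements, n + 1)
    )
    return p_value >= alpha  # True = no significant deviation, test passes

# identical predictions pass; systematically different ones fail
print(lenient_oracle([0, 1, 1, 0] * 25, [0, 1, 1, 0] * 25))  # True
print(lenient_oracle([0] * 100, [1] * 100))                  # False
```

A strict oracle (any single mismatch fails) would, per the abstract's results, flag most cross-framework pairs; the significance-based check above is what tolerates small, benign deviations.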
Scalable Machine Learning with Spark
Since the early 2000s, the amount of data collected has increased enormously due to the advent of internet giants such as Google, Netflix, YouTube, Amazon, and Facebook. Around 2010, another "data wave" arrived when mobile phones became hugely popular, and in the 2020s we anticipate yet another exponential rise in data as IoT devices become all-pervasive. Given this backdrop, building scalable systems becomes a sine qua non for machine learning solutions. Pre-2005, parallel processing libraries like MPI and PVM were popular for compute-heavy tasks; modern distributed frameworks such as Apache Spark later brought similar scale-out ideas to everyday data processing and machine learning.
Scalable Machine Learning on Spark
Here, we're observing the mean and variance of each feature. This is helpful in determining whether we need to normalize the features, since it's useful to have all features on a similar scale. We also take note of the number of non-zero values per feature, as highly sparse features can adversely impact model performance. Another important metric to analyze is the correlation between the features in the input data: Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
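As a small, framework-free sketch of what MLlib's Statistics.colStats and Statistics.corr compute, here is the same per-feature mean, variance, non-zero count, and Pearson correlation matrix with NumPy. The toy matrix `X` is made up for illustration; MLlib performs the equivalent computation distributed over an RDD of vectors:

```python
import numpy as np

# toy feature matrix: rows = samples, columns = features (illustrative data)
X = np.array([
    [1.0, 2.0, 10.0],
    [2.0, 4.0,  9.0],
    [3.0, 6.0,  8.0],
    [4.0, 8.0,  7.0],
])

# per-feature summary statistics, as Statistics.colStats would report
means = X.mean(axis=0)                 # 2.5, 5.0, 8.5
variances = X.var(axis=0, ddof=1)      # unbiased (sample) variance per feature
num_nonzeros = np.count_nonzero(X, axis=0)

# Pearson correlation matrix, as Statistics.corr(..., "pearson") computes
correl_matrix = np.corrcoef(X, rowvar=False)

print(means)
print(correl_matrix)  # features 1 and 2 perfectly correlated (+1), feature 3 anti-correlated (-1)
```

Features with very different variances (like the toy columns above) are exactly the case where scaling helps, and near-perfectly correlated columns are candidates for removal before training.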
Machine_Learning_with_Spark
This is a comprehensive tutorial on using the Spark distributed machine learning framework to build a scalable ML data pipeline. I will cover the basic machine learning algorithms implemented in the Spark MLlib library, and throughout this tutorial I will use PySpark in a Python environment. Machine learning is getting popular for solving real-world problems in almost every business domain. It helps solve problems using data that is often unstructured, noisy, and huge in size. With the increase in data sizes and the variety of data sources, solving machine learning problems with standard single-machine techniques poses a big challenge.
Best Python Libraries for Machine Learning and Deep Learning
To understand how to accomplish a specific task in TensorFlow, you can refer to the TensorFlow tutorials. Keras is one of the most popular open-source neural network libraries for Python. Initially designed by a Google engineer for ONEIROS (short for Open-Ended Neuro-Electronic Intelligent Robot Operating System), Keras was soon supported in TensorFlow's core library, making it accessible on top of TensorFlow.
Scaling Machine Learning from 0 to millions of users -- part 2
In part 1, we broke out of the laptop and decided to deploy our prediction service on a virtual machine. Along the way, we discussed a few simple techniques that helped with initial scalability… and hopefully with reducing manual ops. Since then, despite a few production hiccups due to the lack of high availability, life has been pretty good. However, traffic soon starts to increase, data piles up, and more models need to be trained. The technical and business stakes are getting higher, and let's face it, the current architecture will soon go underwater. Yes, using a large server for both training and prediction can be a short-term solution.
5 Open Source Libraries to Aid in Your Machine Learning Endeavors
Machine learning is changing the way we do things, and it's becoming mainstream very quickly. While many factors have contributed to this rise, one reason is that it's becoming easier for developers to apply it, thanks to open source frameworks. If you're not familiar with this technology and feel confused about some of the terms used, such as "framework" and "library," here are the definitions. "Framework" is a vague term, to be sure; even those who regularly use it can't agree on its exact definition. In most cases, however, it refers to a bundle of programs, libraries, and languages built to be used together in application development. Think of a framework as a base for getting started.
How to train and deploy deep learning at scale
In five lines, you can describe what your architecture looks like, and then you can also specify which algorithms you want to use for training. There are a lot of other systems challenges associated with actually going end to end, from data to a deployed model, and existing software solutions don't really tackle a big set of them. For example, regardless of the software you're using, it takes days to weeks to train a deep learning model. There are real open challenges in how to best use parallel and distributed computing, both to train a particular model and in the context of tuning the hyperparameters of different models. We also found that the vast majority of organizations we've spoken to in the last year or so who are using deep learning for what I'd call mission-critical problems are actually doing it with on-premise hardware.