pyspark
Run secure processing jobs using PySpark in Amazon SageMaker Pipelines
Amazon SageMaker Studio can help you build, train, debug, deploy, and monitor your models and manage your machine learning (ML) workflows. Amazon SageMaker Pipelines enables you to build a secure, scalable, and flexible MLOps platform within Studio. In this post, we explain how to run PySpark processing jobs within a pipeline. This enables anyone who wants to train a model using Pipelines to also preprocess training data, postprocess inference data, or evaluate models using PySpark. This capability is especially relevant when you need to process large-scale data.
PySpark for Data Science. From definition, the differences with…
1. The Differences Between PySpark and Pandas 2. What is PySpark? 3. Why PySpark and What is PySpark Used For? Pandas is one of the Python libraries that we often hear about and use. It is commonly used for data manipulation and analysis, and it is also used in machine learning and data science projects. It is a fast and efficient library that allows you to work with data in a variety of formats, such as CSV, JSON, Excel, SQL databases, and more. Pandas is designed for working with small to medium-sized datasets that can fit into memory.
- Information Technology > Data Science (0.64)
- Information Technology > Software (0.42)
- Information Technology > Artificial Intelligence > Machine Learning (0.40)
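The in-memory limitation described above is the core of the PySpark-vs-pandas distinction, and it can be sketched in plain Python without either library installed. The function names below are illustrative: one approach materializes the whole dataset at once (the pandas model), while the other folds over the data one chunk at a time, merging partial aggregates (the model Spark generalizes across partitions on a cluster).

```python
def mean_in_memory(rows):
    """Pandas-style: load the entire dataset, then compute."""
    data = list(rows)            # whole dataset held in memory at once
    return sum(data) / len(data)

def mean_streaming(row_iter, chunk_size=1000):
    """Spark-style: accumulate partial (sum, count) per chunk and merge."""
    total, count = 0.0, 0
    chunk = []
    for value in row_iter:
        chunk.append(value)
        if len(chunk) == chunk_size:
            total += sum(chunk)  # partial aggregate for this "partition"
            count += len(chunk)
            chunk = []
    if chunk:                    # flush the final, partially filled chunk
        total += sum(chunk)
        count += len(chunk)
    return total / count

# The iterator is consumed lazily, so memory use stays bounded by chunk_size.
values = iter(range(1, 1_000_001))
print(mean_streaming(values))  # 500000.5
```

Only the streaming version keeps working when the data no longer fits in RAM; Spark's contribution is running many such partial aggregations in parallel and merging the results.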
Front-End Big Data Engineer - PySpark at Logic20/20 Inc. - Seattle, WA, United States
We're a seven-time "Best Company to Work For," where intelligent, talented people come together to do outstanding work, and have a lot of fun while they're at it. Because we're a full-service consulting firm with a diverse client base, you can count on a steady stream of opportunities to work with cutting-edge technologies on projects that make a real difference. Logic20/20's Global Delivery Model creates a connected experience for Logicians across geographies. You'll have access to projects in different locations, the technology to support Connected Teams, and in-person and online culture events in our Connected Hub cities. Bring your skill set to an exciting and meaningful initiative where we are leveraging data science, artificial intelligence, and machine learning to mitigate wildfires. This is a highly visible, highly impactful project with implications for millions of customers.
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.53)
NLP and Customer Funnel: Using PySpark to Weight Events
The customer funnel, also known as the marketing funnel or sales funnel, is a conceptual model that represents the journey a customer goes through as they move from awareness of a product or service to the point of purchase. The funnel is usually depicted as a wide top that narrows as it progresses downward, with each stage representing a different phase in the customer's journey. Understanding the customer funnel can help businesses understand how to effectively market and sell their products or services and identify areas where they can improve the customer experience. TF-IDF, which stands for "term frequency-inverse document frequency," is a statistical measure that can be used to assign weights to words or phrases in a document. It is commonly used in information retrieval and natural language processing tasks, including text classification, clustering, and search. In the context of the customer funnel, TF-IDF could be used to weigh different events or actions that a customer takes as they move through the funnel.
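The event-weighting idea above is easy to sketch in plain Python: treat each customer's journey as a "document" whose "terms" are funnel events, then apply the classic TF-IDF formula. The event names and journeys below are hypothetical, and this uses the textbook idf = log(N/df); note that Spark MLlib's `IDF` applies a smoothed variant, log((N+1)/(df+1)).

```python
import math
from collections import Counter

# Hypothetical journeys: lists of funnel events per customer.
journeys = [
    ["page_view", "page_view", "search"],
    ["page_view", "add_to_cart"],
    ["page_view", "search", "add_to_cart", "purchase"],
]

def tf_idf(journeys):
    n_docs = len(journeys)
    # Document frequency: in how many journeys does each event appear?
    df = Counter()
    for events in journeys:
        df.update(set(events))
    weights = []
    for events in journeys:
        tf = Counter(events)
        total = len(events)
        weights.append({
            e: (tf[e] / total) * math.log(n_docs / df[e])
            for e in tf
        })
    return weights

w = tf_idf(journeys)
# "page_view" occurs in every journey, so idf = log(3/3) = 0:
print(w[0]["page_view"])  # 0.0
```

The effect is exactly what the funnel analysis wants: ubiquitous, low-intent events (every journey has a page view) get weight zero, while rare, high-intent events such as "purchase" receive positive weight.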
Building A Machine Learning Pipeline Using Pyspark - Analytics Vidhya
This article was published as a part of the Data Science Blogathon. Spark is an open-source framework for big data processing. It was originally written in Scala; later, due to increasing demand for machine learning on big data, a Python API was released. So, PySpark is the Python API for Spark. PySpark works effectively with Spark components such as Spark SQL, MLlib, and Spark Streaming, which lets us leverage the true potential of big data and machine learning.
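The core abstraction behind an MLlib machine learning pipeline (`pyspark.ml.Pipeline`) is an ordered list of stages, each fit on the data and then used to transform it before the next stage runs. A minimal plain-Python sketch of that idea, with illustrative stage names that are not the MLlib API:

```python
class Scale:
    """Stage with state: learns the maximum, then rescales to [0, 1]."""
    def fit(self, data):
        self.max = max(data)        # parameter learned from the data
        return self
    def transform(self, data):
        return [x / self.max for x in data]

class Shift:
    """Stateless stage: recenters values around zero."""
    def fit(self, data):
        return self
    def transform(self, data):
        return [x - 0.5 for x in data]

class Pipeline:
    """Runs each stage's fit, then feeds its transform output onward."""
    def __init__(self, stages):
        self.stages = stages
    def fit_transform(self, data):
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return data

pipe = Pipeline([Scale(), Shift()])
print(pipe.fit_transform([0, 5, 10]))  # [-0.5, 0.0, 0.5]
```

MLlib's real `Pipeline` adds what this sketch omits: stages operate on distributed DataFrames, and `fit` returns a separate fitted model so the same pipeline can be reapplied to new data.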
Data Science & Deep Learning for Business 20 Case Studies
Welcome to the course on Data Science & Deep Learning for Business 20 Case Studies! This course teaches you how Data Science & Deep Learning can be used to solve real-world business problems and how you can apply these techniques to 20 real-world case studies. Traditional businesses are hiring Data Scientists in droves, and knowledge of how to apply these techniques to solving their problems will prove to be one of the most valuable skills in the next decade! "I'm only halfway through this course, but I have to say WOW. It's so far a lot better than my Business Analytics MSc I took at UCL. The content is explained better, and it's broken down so simply. Some of the Statistical Theory and ML theory lessons are perhaps the best on the internet!" "It is pretty different in format from others."
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)
- Education > Educational Setting > Online (0.40)
Programming
Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. The creators of the Julia language claim Julia to be very fast, performance-wise, because it does not suffer from the two-language problem like Python: Julia is a compiled language, whereas Python is an amalgamation of both compilation and interpretation.
Data Science & Deep Learning for Business 20 Case Studies
Data Science & Deep Learning for Business 20 Case Studies - Use Python to solve problems in Retail, Marketing, Product Recommendation, Customer Clustering, NLP, Forecasting & more!
- Machine Learning from Linear Regressions (polynomial & multivariate), K-NNs, Logistic Regressions, SVMs, Decision Trees & Random Forests
- Unsupervised Machine Learning with K-Means, Mean-Shift, DBSCAN, EM with GMMs, PCA and t-SNE
- Build a Product Recommendation Tool using collaborative & item/content-based filtering
- Hypothesis Testing and A/B Testing - understand t-tests and p-values
- Natural Language Processing - summarize reviews, sentiment analysis on airline tweets & spam detection
- Use Google Colab's iPython notebooks for fast, reliable cloud-based data science work
- Deploy your Machine Learning models on the cloud using AWS
- Advanced Pandas techniques from vectorizing to parallel processing
- Statistical Theory, Probability Theory, Distributions, Exploratory Data Analysis
- Predict employee churn, insurance premiums, Airbnb prices, credit card fraud, and who to target for donations
- Big Data skills using PySpark for data manipulation and Machine Learning
- Cluster customers based on Exploratory Data Analysis, then use K-Means to detect customer segments
- Build a Stock Trading Bot using reinforcement learning
- Apply Data Science & Analytics to Retail, performing segmentation, analyzing trends, determining valuable customers, and more!
- Banking & Finance (1.00)
- Information Technology > Services (0.55)
PySpark for Data Science - Advanced ($89.99 to FREE)
This module in the PySpark tutorials section will help you learn certain advanced concepts of PySpark. In the first section of these advanced tutorials, we will perform a Recency Frequency Monetary (RFM) segmentation. RFM analysis is typically used to identify outstanding customer groups; further, we shall also look at K-means clustering. Next up in these PySpark tutorials is learning text mining and using Monte Carlo simulation from scratch. PySpark is a big data solution for real-time streaming with the Python programming language, and it provides an efficient way to do all kinds of calculations and computations.
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)
- Education > Educational Setting > Online (0.40)
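The RFM segmentation the tutorial describes computes three per-customer aggregates: days since the most recent order (Recency), order count (Frequency), and total spend (Monetary). A plain-Python sketch of that first aggregation step, with hypothetical field names and data (the tutorial would do this at scale with PySpark groupBy/agg):

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, amount)
orders = [
    ("a", date(2024, 6, 1), 120.0),
    ("a", date(2024, 6, 20), 80.0),
    ("b", date(2024, 1, 5), 30.0),
]

def rfm(orders, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    per_customer = {}
    for cust, d, amount in orders:
        rec, freq, money = per_customer.get(cust, (None, 0, 0.0))
        days = (today - d).days
        rec = days if rec is None else min(rec, days)  # most recent order
        per_customer[cust] = (rec, freq + 1, money + amount)
    return per_customer

print(rfm(orders, date(2024, 7, 1)))
# {'a': (11, 2, 200.0), 'b': (178, 1, 30.0)}
```

A real RFM analysis would then bucket each of the three values into quantile scores (for example 1-5) and feed the scores, or the raw triples, into K-means to detect customer segments.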