Statistical Learning


Support Vector Machines -- the basics

#artificialintelligence

The important job that SVMs perform is to find a decision boundary that classifies our data. This decision boundary is also called the hyperplane. Let's start with an example to explain it. Visually, if you look at figure 1, you will see that it makes sense for the purple line to be a better hyperplane than the black line. The black line would also do the job, but it skates a little too close to one of the red points to make it a good decision line.
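
The intuition above can be reproduced in a few lines; the snippet below is a minimal sketch using scikit-learn's linear SVC on made-up points (not the article's figure 1 data), showing how the fitted hyperplane and the support vectors fall out of the model.

    # Minimal sketch: fit a linear SVM and read off the separating hyperplane.
    # The toy data below is illustrative only; it is not the article's dataset.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5],   # class 0 ("red") points
                  [6.0, 1.0], [7.0, 2.5], [8.0, 2.0]])  # class 1 ("blue") points
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # For a linear kernel the hyperplane is w . x + b = 0; maximising the margin
    # is what keeps the boundary away from the closest points of either class.
    w, b = clf.coef_[0], clf.intercept_[0]
    print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
    print("support vectors:", clf.support_vectors_)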


Full cross-validation and generating learning curves for time-series models - KDnuggets

#artificialintelligence

Time series analysis is needed in almost any quantitative field and in real-life systems that collect data over time, i.e., temporal datasets. Building predictive models on temporal datasets to project the future evolution of the system under consideration is usually called forecasting. The validation of such models deviates from the standard holdout method of random disjoint train, test, and validation splits used in supervised learning. This stems from the fact that time series are ordered, and order induces all sorts of statistical properties that should be retained. For this reason, standard cross-validation cannot be applied directly to time-series model building, and validation is often restricted to out-of-sample (OOS) validation, using the end tail of the temporal set as a single test set.
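
One common order-preserving alternative to a single OOS split (a generic sketch, not the article's exact recipe) is expanding-window cross-validation, e.g. scikit-learn's TimeSeriesSplit, where every test fold lies strictly after its training fold:

    # Expanding-window CV sketch: each test fold comes after its train fold,
    # so temporal order is preserved (illustrative series, not the article's data).
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(24).reshape(-1, 1)      # 24 time steps of a toy series
    tscv = TimeSeriesSplit(n_splits=4)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
        print(f"fold {fold}: train t={train_idx[0]}..{train_idx[-1]}, "
              f"test t={test_idx[0]}..{test_idx[-1]}")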


Skip-Gram Neural Network for Graphs

#artificialintelligence

This article will go into more detail on node embeddings. If you lack intuition and an understanding of node embeddings, check out the previous article that covered the intuition behind them; if you are ready, read on. In the level-1 explanation of node embeddings I motivated why we need embeddings, so that we have a vector form of graph data. Embeddings should capture the graph topology, the relationships between nodes, and further information.
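
As a rough illustration of the skip-gram idea on graphs (a DeepWalk-style sketch under my own assumptions, not code from the article): generate random walks over the graph and feed them to a skip-gram Word2Vec model, treating nodes as words and walks as sentences.

    # DeepWalk-style sketch: random walks over a toy graph + skip-gram Word2Vec.
    # The graph, walk lengths and embedding size are illustrative assumptions.
    import random
    import networkx as nx
    from gensim.models import Word2Vec

    G = nx.karate_club_graph()                       # small built-in example graph

    def random_walk(graph, start, length=10):
        walk = [start]
        for _ in range(length - 1):
            neighbors = list(graph.neighbors(walk[-1]))
            if not neighbors:
                break
            walk.append(random.choice(neighbors))
        return [str(node) for node in walk]          # Word2Vec expects string tokens

    walks = [random_walk(G, node) for node in G.nodes() for _ in range(10)]

    # sg=1 selects the skip-gram architecture; each node gets a 64-d embedding.
    model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1,
                     min_count=1, workers=2)
    print(model.wv["0"][:5])                         # first 5 dims of node 0's embedding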


What are Transformers models- part 3

#artificialintelligence

In the previous stories we discussed Transformer models and their applications, and took a detailed look at the architecture of the Encoder block. In this article we are going to look more closely at the Decoder block, the other main building block of the Transformer. The architecture of the Decoder is similar to that of the Encoder we discussed previously: it consists of a stack of decoders that are identical in structure. The output of the encoder is passed to the decoder as an input sequence, and the process continues until a specific symbol is reached that indicates the output is complete. For example, when we decode the sentence "Welcome to NYC.", each word produced by the decoder has a numerical representation (feature vector) as output, and when the "." symbol is produced the decoder knows that the output is complete.
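
The "generate until the end symbol appears" loop can be sketched as below. This is a generic greedy-decoding illustration under my own assumptions; decoder_step is a hypothetical stand-in for a Transformer decoder, not an API from the article.

    # Sketch of autoregressive greedy decoding with a stop symbol.
    # `decoder_step` is hypothetical: given the encoder output and the tokens
    # generated so far, it returns scores over the vocabulary.
    import numpy as np

    def greedy_decode(decoder_step, encoder_output, start_id, end_id, max_len=50):
        generated = [start_id]
        for _ in range(max_len):
            scores = decoder_step(encoder_output, generated)  # shape: (vocab_size,)
            next_id = int(np.argmax(scores))                  # pick the most likely token
            generated.append(next_id)
            if next_id == end_id:                             # e.g. the "." / end symbol
                break                                         # output is complete
        return generated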


Mastering Machine Learning Explainability in Python

#artificialintelligence

For data scientists, a key part of interpreting machine learning models is understanding which factors impact predictions. In order to effectively use machine learning in their decision-making processes, companies need to know which factors are most important. For example, if a company wants to predict the likelihood of customer churn, it might also want to know what exactly drives a customer to leave a company. In this example, the model might indicate that customers who purchase products that rarely go on sale are much more likely to stop purchasing. Armed with this knowledge, a company can make smarter pricing decisions in the future.
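
One common way to surface such drivers (a generic sketch, not the article's walkthrough) is permutation importance in scikit-learn, applied here to a made-up churn-style dataset:

    # Permutation-importance sketch on synthetic "churn-like" data (made-up
    # features, not the article's dataset): which columns move predictions most?
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = pd.DataFrame({
        "discount_rate": rng.uniform(0, 0.5, 1000),   # how often purchases are on sale
        "monthly_spend": rng.uniform(10, 200, 1000),
        "tenure_months": rng.integers(1, 60, 1000),
    })
    # Toy rule: customers who rarely see discounts churn more often.
    y = (X["discount_rate"] < 0.1).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
        print(f"{name:15s} {imp:.3f}")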


Regression analysis using Python

#artificialintelligence

Purchasing a new or used automobile can be quite a tough process if you do not know what you are doing. By educating yourself about car shopping before you visit the dealership, you can make things easier for yourself. The following advice can help make your next shopping trip more enjoyable. Always bring a mechanic along when shopping for a new vehicle. Car dealers are notorious for selling lemons, and you do not want to be their next victim.


Time series forecasting with random forest

#artificialintelligence

Benjamin Franklin said that only two things are certain in life: death and taxes. That explains why my colleagues at STATWORX were less than excited when they told me about their plans for the weekend a few weeks back: doing their income tax declaration. Man, I thought, that sucks, I'd rather spend this time outdoors. And then an idea was born. What could taxes and the outdoors possibly have in common?


Training data with Machine Learning and How it impacts Artificial Intelligence

#artificialintelligence

Machine Learning algorithms learn from data. They find relationships, develop understanding, make decisions, and evaluate their confidence from the training data they're given, and the better the training data is, the better the model performs. The quality and quantity of your machine learning training data have as much to do with the success of your data project as the algorithms themselves. Firstly, it's important to have a common understanding of what we mean by the term dataset. A dataset consists of rows and columns, with each row containing one observation.
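
As a concrete (made-up) illustration of that definition, a minimal pandas sketch:

    # Tiny illustration of "rows = observations, columns = features" (toy data).
    import pandas as pd

    dataset = pd.DataFrame({
        "age":     [34, 29, 45],            # one column per feature
        "income":  [52000, 48000, 61000],
        "churned": [0, 1, 0],               # label column
    })
    print(dataset.shape)    # (3, 3): three observations (rows), three columns
    print(dataset.iloc[0])  # the first observation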


PySpark for Data Science - Advanced ($89.99 to FREE)

#artificialintelligence

This module in the PySpark tutorials section will help you learn certain advanced concepts of PySpark. In the first section of these advanced tutorials, we will be performing a Recency, Frequency, Monetary (RFM) segmentation. RFM analysis is typically used to identify outstanding customer groups; further on, we shall also look at K-means clustering. Next up in these PySpark tutorials is learning Text Mining and using Monte Carlo Simulation from scratch. PySpark is a big data solution that is applicable for real-time streaming using the Python programming language and provides a better and more efficient way to do all kinds of calculations and computations.
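
A rough sketch of the RFM idea in PySpark is shown below; the column names (customer_id, order_date, amount) and the input file are assumptions for the example, not the course's code.

    # RFM sketch in PySpark: recency, frequency and monetary value per customer.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rfm-sketch").getOrCreate()
    orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

    rfm = (orders.groupBy("customer_id")
                 .agg(F.datediff(F.current_date(), F.max("order_date")).alias("recency_days"),
                      F.count("*").alias("frequency"),
                      F.sum("amount").alias("monetary")))
    rfm.show(5)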


PySpark for Data Science - Intermediate ($89.99 to FREE)

#artificialintelligence

This module of the PySpark tutorials aims to explain intermediate concepts such as the use of SparkSession in later versions and the use of SparkConf and SparkContext in earlier versions. It will also help you understand how the Spark-related environment is set up, the concepts of broadcast variables and accumulators, and other optimization techniques such as parallelism, Tungsten, and the Catalyst optimizer. You will also be taught about various compression techniques such as Snappy and Zlib. We will also cover Big Data ecosystem concepts such as HDFS and block storage, the various components of Spark such as Spark Core, MLlib, GraphX, SparkR, Streaming, and SQL, and the basics of the Python language that are relevant when used along with Apache Spark, thereby making it PySpark. We will learn the following in this course:
- Regression
- Linear Regression
- Output Column
- Test Data
- Prediction
- Generalized Linear Regression
- Forest Regression
- Classification
- Binomial Logistic Regression
- Multinomial Logistic Regression
- Decision Tree
- Random Forest
- Clustering
- K-Means Model
PySpark is a big data solution that is applicable for real-time streaming using the Python programming language and provides a better and more efficient way to do all kinds of calculations and computations.
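
As a quick illustration of the entry-point difference mentioned above (a minimal sketch, not course material): newer Spark versions use a single SparkSession, while older code builds a SparkConf and SparkContext directly.

    # Minimal sketch of the two Spark entry points mentioned above.

    # Spark 2.x+ style: a single SparkSession wraps the context and SQL support.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("intermediate-demo").getOrCreate()

    # Spark 1.x style: configure explicitly via SparkConf and SparkContext.
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAppName("intermediate-demo-legacy").setMaster("local[*]")
    # sc = SparkContext(conf=conf)   # only one SparkContext may be active per JVM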