Creating a Fraud Risk Scoring Model Leveraging Data Pipelines and Machine Learning with Splunk
We will now build a fraud risk scoring model based on anomaly detection across the KPIs calculated in the previous section. To do that, we will train the anomaly detection model on 11 months of data using Splunk's Machine Learning Toolkit. A separate anomaly detector is created for each KPI and each league, based on its probability density function. The probability density function estimates the probability of a value falling in a given range based on past data; in effect, it generates a baseline for your data. This makes it a good tool for finding anomalies, because it lets you quickly determine whether a value sits in the expected range. You can find out more about this algorithm in this blog about finding anomalies with Splunk.
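To make the baseline idea concrete, here is a minimal Python sketch of density-based anomaly detection per league and KPI. It is not the Machine Learning Toolkit's implementation: it assumes each KPI is roughly normally distributed, and the function names (`fit_baselines`, `is_anomaly`), the `(league, kpi, value)` row layout, the sample values, and the 0.01 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import NormalDist


def fit_baselines(rows):
    """Fit one normal distribution per (league, kpi) group.

    rows: iterable of (league, kpi, value) tuples (hypothetical layout).
    Groups with fewer than two values, or with zero spread, are skipped
    because no usable density can be fitted to them.
    """
    groups = defaultdict(list)
    for league, kpi, value in rows:
        groups[(league, kpi)].append(value)

    baselines = {}
    for key, values in groups.items():
        if len(values) < 2:
            continue
        dist = NormalDist.from_samples(values)
        if dist.stdev > 0:
            baselines[key] = dist
    return baselines


def is_anomaly(baselines, league, kpi, value, threshold=0.01):
    """Flag a value that falls in the low-probability tails of its baseline.

    threshold is the assumed share of probability mass treated as anomalous,
    split evenly between the lower and upper tail.
    """
    dist = baselines[(league, kpi)]
    cdf = dist.cdf(value)
    # Two-sided tail probability: how extreme the value is in either direction.
    tail_probability = 2 * min(cdf, 1 - cdf)
    return tail_probability < threshold


# Eleven monthly values for one hypothetical league and KPI.
history = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99]
training_rows = [("gold", "avg_amount", v) for v in history]
baselines = fit_baselines(training_rows)

typical = is_anomaly(baselines, "gold", "avg_amount", 101)
extreme = is_anomaly(baselines, "gold", "avg_amount", 130)
```

With this training data, a value of 101 sits well inside the fitted range while 130 lands far out in the upper tail. A normal fit is only one possible choice of density; data that is skewed or multi-modal would need a different distribution for the baseline to be meaningful.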