Credit Card Fraud Detection

#artificialintelligence

Fraud detection is the most important step in a risk management process for preventing a recurrence. High volumes of fraud can damage revenue and reputation. Fortunately, it is possible to deal with fraud before it happens. I would therefore like to investigate the performance of machine learning algorithms on a credit card fraud data set. The dataset contains transactions made with credit cards in September 2013 by European cardholders.


Modelling Credit Card Fraud Detection

#artificialintelligence

Credit card fraud is a "still growing" problem worldwide. Fraud losses were estimated at more than US$27 billion in 2018 and are projected to keep growing significantly over the next years, as this article shows. As more and more people use credit cards in their daily routine, criminals' interest in opportunities to make money from them has also increased. The development of new technologies puts both criminals and credit card companies in a constant race to improve their systems and techniques. With that amount of money at stake, Machine Learning is surely not a new term for credit card companies, which were investing in it long before it became a trend, to create and optimize models of risk and fraud management.


Fraud detection: the problem, solutions and tools

#artificialintelligence

Fraud is a billion-dollar business. There are many formal definitions, but essentially fraud is the "art" and crime of deceiving and scamming people in their financial transactions. Fraud has always existed throughout human history, but in this age of digital technology the strategy, extent and magnitude of financial fraud are becoming wide-ranging, from credit card transactions to health benefits to insurance claims. Fraudsters are also getting super creative. Who has never received an email from a Nigerian royal widow looking for someone trusted to hand over large sums of her inheritance? No wonder fraud is a big deal.


Self-paced Ensemble for Highly Imbalanced Massive Data Classification

arXiv.org Artificial Intelligence

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalanced and low-quality datasets. Most existing learning methods suffer from poor performance or low computation efficiency under such a scenario. To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveal that not only the disproportion between classes but also other difficulties embedded in the nature of data, especially noises and class overlapping, prevent us from learning effective classifiers. Taking those factors into consideration, we propose a novel framework for imbalance classification that aims to generate a strong ensemble by self-paced harmonizing data hardness via under-sampling. Extensive experiments have shown that this new framework, while being very computationally efficient, can lead to robust performance even under highly overlapping classes and extremely skewed distribution. Note that our methods can be easily adapted to most existing learning methods (e.g., C4.5, SVM, GBDT and Neural Network) to boost their performance on imbalanced data.

INTRODUCTION

The development of information technology brings the explosion of massive data in our daily life. However, many real applications usually generate very imbalanced datasets for corresponding key classification tasks. For instance, online advertising services can give rise to a high amount of datasets, consisting of user views or clicks on ads, for the task of click-through rate prediction [1]. Commonly, user clicks only constitute a small rate of user behaviors. For another example, credit fraud detection [2] relies on the dataset containing massive real credit card transactions where only a small proportion are frauds. Similar situations also exist in the tasks of medical diagnosis, record linkage and network intrusion detection, etc. [3]-[5].
In addition, real-world datasets are likely to contain other difficulty factors, including noises and missing values.
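
As a rough illustration of the under-sampling step that the abstract above builds on, the sketch below balances a skewed label set by randomly discarding majority-class rows. It is plain random under-sampling, not the self-paced, hardness-aware procedure the paper proposes; the function name `undersample`, the convention that label 1 marks the rare class, and the toy data are all assumptions made for this example.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes label 1 is the minority (e.g. fraud) and label 0 the majority,
    and that the majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Keep every minority row and an equally sized random subset of the majority.
    kept = rng.sample(majority, len(minority))
    index = sorted(minority + kept)
    return [samples[i] for i in index], [labels[i] for i in index]


# Toy data (made up): 5 positive rows among 1000.
toy_labels = [1] * 5 + [0] * 995
toy_samples = list(range(1000))
balanced_x, balanced_y = undersample(toy_samples, toy_labels)
```

An ensemble in this spirit would repeat the call with different seeds and train one base classifier per balanced subset; how the subsets are chosen is exactly where the paper's method differs from this baseline.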


Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

arXiv.org Machine Learning

Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely (i) the sequence contains or doesn't contain a fraud, (ii) the sequence is obtained by fixing the card-holder or the payment terminal, and (iii) it is a sequence of spent amounts or of elapsed times between the current and previous transactions. Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sequences is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. Our multiple-perspective HMM-based approach offers automated feature engineering to model temporal correlations so as to improve the effectiveness of the classification task, and allows for an increase in the detection of fraudulent transactions when combined with the state-of-the-art expert-based feature engineering strategy for credit card fraud detection. In extension to previous works, we show that this approach goes beyond e-commerce transactions and provides robust feature engineering over different datasets, hyperparameters and classifiers. Moreover, we compare strategies to deal with structural missing values.
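
To make the idea of an HMM likelihood used as a feature more concrete, here is a minimal sketch of the forward algorithm for a discrete HMM. The two-state model, its parameters (`START`, `TRANS`, `EMIT`) and the encoding of amounts as "low"/"high" symbols are hypothetical values chosen for illustration; the abstract does not give the authors' actual models or discretisation.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM, computed with the forward algorithm.

    obs   : list of symbol indices
    start : start[s]    = P(first hidden state is s)
    trans : trans[p][s] = P(next state is s | current state is p)
    emit  : emit[s][o]  = P(symbol o is observed | state is s)
    """
    n_states = len(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state HMM over discretised amounts: symbol 0 = "low", 1 = "high".
START = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
EMIT = [[0.85, 0.15], [0.25, 0.75]]

# Likelihood of a card's recent amount sequence, usable as one extra feature.
history = [0, 0, 1, 1]
hmm_feature = forward_likelihood(history, START, TRANS, EMIT)
```

In the setting described above there would be one such model per combination of perspectives, giving eight likelihood features per transaction that are appended to the classifier's input.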


Detecting Credit Card Fraud Using Machine Learning

#artificialintelligence

This article describes my machine learning project on credit card fraud. If you are interested in the code, you can find my notebook here. Ever since starting my journey into data science, I have been thinking about ways to use data science for good while generating value at the same time. Thus, when I came across this data set on Kaggle dealing with credit card fraud detection, I was immediately hooked. The data set has 31 features, 28 of which have been anonymized and are labeled V1 through V28.


Dataset shift quantification for credit card fraud detection

arXiv.org Artificial Intelligence

Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify, day by day, the dataset shift in our face-to-face credit card transactions dataset (card holder located in the shop). In practice, we classify the days against each other and measure the efficiency of the classification. The more efficient the classification, the more different the buying behaviour between two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, weekends, etc.). We then incorporate this dataset shift knowledge in the credit card fraud detection task as a new feature. This leads to a small improvement of the detection.
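
The "classify the days against each other" step can be sketched as follows. This is only an illustration under strong assumptions: each transaction is reduced to a single amount, the classifier is a nearest-centroid rule, and hold-out accuracy stands in for the paper's efficiency measure, which the abstract does not specify. The function names and the synthetic days are made up for the example.

```python
import random


def day_distance(day_a, day_b):
    """Score how well a simple classifier can tell two days apart.

    Returns hold-out accuracy rescaled to [0, 1]:
    0 means the days are indistinguishable, 1 means fully separable.
    """
    # Even-indexed samples train the classifier, odd-indexed samples test it.
    train_a, test_a = day_a[0::2], day_a[1::2]
    train_b, test_b = day_b[0::2], day_b[1::2]
    centroid_a = sum(train_a) / len(train_a)
    centroid_b = sum(train_b) / len(train_b)

    def predicted_a(x):
        return abs(x - centroid_a) <= abs(x - centroid_b)

    correct = sum(predicted_a(x) for x in test_a)
    correct += sum(not predicted_a(x) for x in test_b)
    accuracy = correct / (len(test_a) + len(test_b))
    # Chance level (0.5) maps to distance 0; below-chance results are clipped.
    return max(0.0, 2.0 * accuracy - 1.0)


def distance_matrix(days):
    """Symmetric matrix of pairwise day distances with a zero diagonal."""
    n = len(days)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = day_distance(days[i], days[j])
    return matrix


# Synthetic amounts: two similar weekdays and one holiday with higher spending.
rng = random.Random(0)
weekday_1 = [rng.gauss(50, 10) for _ in range(200)]
weekday_2 = [rng.gauss(50, 10) for _ in range(200)]
holiday = [rng.gauss(80, 10) for _ in range(200)]
shift_matrix = distance_matrix([weekday_1, weekday_2, holiday])
```

On data like this the holiday sits far from both weekdays while the weekdays sit close to each other, which is the kind of structure an agglomerative clustering of the matrix would pick up.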


Detecting Credit Card Fraud Using Machine Learning – Towards Data Science

#artificialintelligence

This article describes my machine learning project on credit card fraud. If you are interested in the code, you can find my notebook here. Ever since starting my journey into data science, I have been thinking about ways to use data science for good while generating value at the same time. Thus, when I came across this data set on Kaggle dealing with credit card fraud detection, I was immediately hooked. The data set has 31 features, 28 of which have been anonymized and are labeled V1 through V28.


Adapting to Concept Drift in Credit Card Transaction Data Streams Using Contextual Bandits and Decision Trees

AAAI Conferences

Credit card transactions predicted to be fraudulent by automated detection systems are typically handed over to human experts for verification. To limit costs, it is standard practice to select only the most suspicious transactions for investigation. We claim that a trade-off between exploration and exploitation is imperative to enable adaptation to changes in behavior (concept drift). Exploration consists of the selection and investigation of transactions with the purpose of improving predictive models, and exploitation consists of investigating transactions detected to be suspicious. Modeling the detection of fraudulent transactions as rewarding, we use an incremental Regression Tree learner to create clusters of transactions with similar expected rewards. This enables the use of a Contextual Multi-Armed Bandit (CMAB) algorithm to provide the exploration/exploitation trade-off. We introduce a novel variant of a CMAB algorithm that makes use of the structure of this tree, and use Semi-Supervised Learning to grow the tree using unlabeled data. The approach is evaluated on a real dataset and data generated by a simulator that adds concept drift by adapting the behavior of fraudsters to avoid detection. It outperforms frequently used offline models in terms of cumulative rewards, in particular in the presence of concept drift.
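
The exploration/exploitation trade-off described above can be illustrated with a much simpler rule than the paper's tree-based contextual bandit: an epsilon-greedy selection that spends most of the investigation budget on the highest-scoring transactions and a small share on randomly chosen others. The function `select_for_review`, its parameters and the scores are assumptions for this sketch, not the authors' algorithm.

```python
import random


def select_for_review(scores, budget, epsilon, seed=0):
    """Pick `budget` transaction indices to hand to human investigators.

    A share `epsilon` of the budget is spent on exploration (random picks
    among the remaining transactions); the rest goes to exploitation
    (the most suspicious transactions by model score).
    """
    rng = random.Random(seed)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_explore = round(epsilon * budget)
    n_exploit = budget - n_explore
    exploit = ranked[:n_exploit]
    remaining = ranked[n_exploit:]
    explore = rng.sample(remaining, min(n_explore, len(remaining)))
    return exploit + explore


# Hypothetical fraud scores for five transactions.
fraud_scores = [0.1, 0.9, 0.5, 0.7, 0.2]
greedy_pick = select_for_review(fraud_scores, budget=2, epsilon=0.0)
mixed_pick = select_for_review(fraud_scores, budget=2, epsilon=0.5)
```

With `epsilon=0.0` the selection is purely greedy; raising it trades some immediate detections for labels on transactions the current model would otherwise never see, which is what allows adaptation under concept drift.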


Data Science with Python: Exploratory Analysis with Movie-Ratings and Fraud Detection with Credit-Card Transactions

@machinelearnbot

The following problems are taken from the projects and assignments in the edX course Python for Data Science and the Coursera course Applied Machine Learning in Python (UMich).