Machine learning and data mining techniques have been used extensively to detect credit card fraud. However, most studies treat credit card transactions as isolated events rather than as sequences of transactions. In this framework, we model a sequence of credit card transactions from three binary perspectives: (i) whether the sequence contains a fraud; (ii) whether the sequence is obtained by fixing the card-holder or the payment terminal; (iii) whether it is a sequence of amounts spent or of times elapsed between consecutive transactions. Combining the three binary perspectives yields eight sets of sequences from the (training) set of transactions, each of which is modelled with a Hidden Markov Model (HMM). Each HMM assigns a likelihood to a transaction given its sequence of previous transactions, and these likelihoods are used as additional features in a Random Forest classifier for fraud detection. Our multiple-perspective HMM-based approach provides automated feature engineering that models temporal correlations, improving the effectiveness of the classification task and increasing the detection of fraudulent transactions when combined with the state-of-the-art expert-based feature engineering strategy for credit card fraud detection. Extending previous work, we show that this approach goes beyond e-commerce transactions and provides robust feature engineering across different datasets, hyperparameters and classifiers. Moreover, we compare strategies for dealing with structural missing values.
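The pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the HMM parameters are hand-picked toy values rather than fitted models, the two-symbol "amount bins" and the `hmm_loglik` helper are assumptions for the example, and only two of the eight perspectives are shown. The key idea survives: each HMM scores a transaction sequence with a log-likelihood, and those scores become extra features for a Random Forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hmm_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete observation
    sequence under an HMM with start probs pi, transitions A, emissions B."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate and re-weight by emission
        c = alpha.sum()
        loglik += np.log(c)             # accumulate log of scaling factors
        alpha = alpha / c
    return loglik

# Toy, hand-picked parameters for two perspectives ("genuine" vs. "fraudulent"
# spending behaviour) over binned amounts {0: low, 1: high}.
pi = np.array([0.6, 0.4])
A_genuine = np.array([[0.9, 0.1], [0.2, 0.8]])
B_genuine = np.array([[0.8, 0.2], [0.3, 0.7]])
A_fraud = np.array([[0.5, 0.5], [0.5, 0.5]])
B_fraud = np.array([[0.2, 0.8], [0.6, 0.4]])

sequences = [[0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1]]
labels = [0, 1, 0, 1]  # toy ground truth: 1 = sequence ends in a fraud

# One likelihood feature per HMM; these would be appended to expert features.
features = np.array([
    [hmm_loglik(s, pi, A_genuine, B_genuine),
     hmm_loglik(s, pi, A_fraud, B_fraud)] for s in sequences
])
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(features, labels)
```

In practice each of the eight HMMs would be trained (e.g. by Baum-Welch) on its own set of sequences, and the likelihood features computed for every incoming transaction's history.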
Most speech recognition systems today use Hidden Markov Models (HMMs) as acoustic models, since HMMs can be trained effectively to map a speech utterance into a sequence of units. Such systems perform even better when the units are context-dependent. Analogously, when HMM techniques are applied to articulatory feature extraction, context-dependent articulatory features should yield better results. This paper presents a strategy for extending a typical HMM-based articulatory feature extraction system into a context-dependent version that exhibits higher accuracy.
In the previous article in this series, we covered feature engineering strategies for structured continuous numeric data. In this article, we look at another type of structured data, one that is discrete in nature and commonly termed categorical data. Numeric data is often easier to handle than categorical data, because categorical attributes bring the additional complexity of the semantics attached to each category value. We take a hands-on approach to several encoding schemes for categorical data, as well as a couple of popular techniques for handling the resulting large-scale feature explosion, often known as the "curse of dimensionality".
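Two of the simplest encoding schemes can be sketched in a few lines. This is a minimal example on a made-up `color` column: one-hot encoding via pandas `get_dummies` (one binary column per category, which is where the feature explosion comes from) and label encoding via scikit-learn's `LabelEncoder` (one integer per category).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot: one 0/1 column per distinct category value.
onehot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: map each category to an integer (classes are sorted).
le = LabelEncoder()
df["color_label"] = le.fit_transform(df["color"])
```

Note that label encoding imposes an arbitrary ordering on the categories, so it suits tree-based models better than linear ones; one-hot avoids the ordering but multiplies the column count, motivating the dimensionality-reduction techniques discussed later.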
Lots of people have different definitions for feature engineering and preprocessing, so how does HyperparameterHunter define it? A fair question, since feature engineering is rarely a topic in hyperparameter optimization. We work with a very broad definition of "feature engineering", hence the blurred line between it and "preprocessing": we consider "feature engineering" to be any modification applied to data before model fitting -- whether performed once on Experiment start, or repeated for every fold in cross-validation. Technically, though, HyperparameterHunter lets you define the particulars of "feature engineering" for yourself, as we'll see soon. Here are a few things that fall under our umbrella of "feature engineering":
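The "repeated for every fold" case matters because fitting a transformer on the full dataset leaks test-fold statistics into training. A minimal sketch of per-fold feature engineering with plain scikit-learn (synthetic data; this illustrates the pattern, not HyperparameterHunter's own API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = (X[:, 0] > 0).astype(int)  # toy target: separable on the first feature

scores = []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    # Fit the "feature engineering" step on the training split only,
    # then apply it to both splits -- repeated fresh for every fold.
    scaler = StandardScaler().fit(X[train_idx])
    model = LogisticRegression().fit(scaler.transform(X[train_idx]), y[train_idx])
    scores.append(model.score(scaler.transform(X[test_idx]), y[test_idx]))
```

A step performed "once on Experiment start" would instead sit above the loop, applied to `X` before splitting.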
A predictive model is a formula that transforms a list of input fields, or variables, into some output of interest. Feature engineering is simply the thoughtful creation of new input fields from existing ones, either in an automated fashion or manually, drawing on domain expertise, logical reasoning, or intuition. The new input fields can lead to better inferences and insights from data and substantially improve the performance of predictive models. Feature engineering is one of the most important parts of the data preparation process, where new and meaningful variables are derived. It enhances and enriches the ingredients needed for building a robust model.
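Creating new input fields from existing ones can be as simple as a ratio or a date-derived flag. A small illustrative sketch (the `orders` table and its columns are made up for this example):

```python
import pandas as pd

orders = pd.DataFrame({
    "total_price": [120.0, 80.0, 200.0],
    "n_items": [4, 2, 5],
    "order_date": pd.to_datetime(["2023-01-07", "2023-01-09", "2023-02-14"]),
})

# Ratio feature from two existing fields (logical reasoning).
orders["price_per_item"] = orders["total_price"] / orders["n_items"]

# Flag derived from a date field (domain intuition: weekend behaviour differs).
orders["is_weekend"] = orders["order_date"].dt.dayofweek >= 5
```

Neither derived field adds new information in a strict sense, but both expose relationships a model would otherwise have to discover on its own.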