One week into my Research Science role at Lyft, I merged my first pull request into the Fraud team's code repository and deployed our fraud decision service. No, it wasn't to launch a groundbreaking user behavior activity-based convolutional recurrent neural network trained in a semi-supervised, adversarial fashion that challenges a user to prove her identity -- it would be a couple of years before that. Embarrassingly, it was to remove a duplicate line of feature coefficients in a hand-coded logistic regression model rolled out a little less than a year before. This small bug exposed a number of limitations of a system built primarily for a different type of usage -- that of business rules that encapsulate simple, human-readable handcrafted logic. In our old worldview, models were simply extensions of business rules.
Last week I mentioned I would be working through the new IEEE-CIS Fraud Detection Kaggle Competition as a mechanism for exploring some Data Science concepts. The first video walked you through setting up a Python environment and using version control for an ML project. This week, I have a new video that dives into the data and builds the first model using the gradient boosting library CatBoost.
Machine learning has become an invaluable tool in the fight against fraud. It combines computational statistics, artificial intelligence, signal processing, optimisation, and other methods to identify patterns. Machine learning has been a significant breakthrough in helping companies move from reactive to predictive, highlighting suspicious attributes or relationships that may be invisible to the naked eye but indicate a larger pattern of fraud. Its great value lies in the sheer volume of data that computers can analyse and humans cannot, thanks to a variety of pattern-recognition algorithms. With it you can bring exponentially more data into your analysis -- but selecting the right data, and the right approach to modelling the problem, is critical.
Not really, but ML and AI have been around longer than you think. Any look back at analytics in 2017 makes it clear that machine learning and artificial intelligence appear to be the 'next big things' that can solve just about any problem, from writing new hit songs to curing disease. Not one to buy into the hype, I became curious as to why these topics have become the new darlings of the analytics world. Perhaps it's because everything has analytics nowadays: from niche solutions to management consultants and cloud service providers, analytics seem to be available everywhere, in everything. What isn't so obvious, however, is how not-new these types of analytics actually are.
The credit risk associated with loan repayment is one of the most significant risks commercial banks face. It is all the more significant because almost 40% of commercial banks' total revenue is generated from their credit-related assets. The value of credit analytics therefore lies in building statistical models on available past data to forecast the key parameters on which business decisions are based. A significant amount of time is spent collecting this data, which is then used to refine statistical models that predict the outcomes of various business decisions under different scenarios.
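A minimal sketch of the idea above, under assumptions: real credit models draw on borrower repayment histories, while here two synthetic features (debt-to-income and credit utilization, both invented for illustration) stand in to show how a logistic regression fitted on past data turns an applicant's attributes into a default probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic borrower attributes (stand-ins for real credit features).
debt_to_income = rng.uniform(0.0, 1.0, n)
utilization = rng.uniform(0.0, 1.0, n)

# Synthetic ground truth: higher leverage means higher odds of default.
logit = -4.0 + 3.0 * debt_to_income + 2.5 * utilization
defaulted = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Fit on "past data", as a credit-analytics team would.
X = np.column_stack([debt_to_income, utilization])
clf = LogisticRegression().fit(X, defaulted)

# Score a hypothetical applicant under two leverage scenarios.
low_risk = clf.predict_proba([[0.1, 0.2]])[0, 1]
high_risk = clf.predict_proba([[0.9, 0.9]])[0, 1]
print(f"P(default | low leverage)  = {low_risk:.3f}")
print(f"P(default | high leverage) = {high_risk:.3f}")
```

Running different applicant profiles through `predict_proba` is the "different scenarios" step: the same fitted model prices the risk of each hypothetical decision.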