One week into my Research Science role at Lyft, I merged my first pull request into the Fraud team's code repository and deployed our fraud decision service. No, it wasn't to launch a groundbreaking user behavior activity-based convolutional recurrent neural network trained in a semi-supervised, adversarial fashion that challenges a user to prove her identity -- it would be a couple of years before that. Embarrassingly, it was to remove a duplicate line of feature coefficients in a hand-coded logistic regression model rolled out a little less than a year before. This small bug exposed a number of limitations of a system built primarily for a different type of usage -- that of business rules that encapsulate simple, human-readable handcrafted logic. In our old worldview, models were simply extensions of business rules.
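That duplicated-coefficient bug is easy to picture in a hand-coded logistic regression, where the score is just a weighted sum of features pushed through a sigmoid. Here is a minimal sketch (feature names and weights are invented for illustration, not Lyft's actual model) of how a coefficient line pasted twice silently doubles one feature's contribution:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(features, coefficients, intercept=0.0):
    """Hand-coded logistic regression: weighted sum of features, then sigmoid."""
    z = intercept + sum(w * features.get(name, 0.0) for name, w in coefficients)
    return sigmoid(z)

# Hypothetical coefficient table, one line per feature.
correct = [
    ("num_failed_charges", 1.5),
    ("account_age_days", -0.02),
]

# The bug: the same coefficient line appears twice, so the feature's
# weight is effectively doubled when the terms are summed.
buggy = correct + [("num_failed_charges", 1.5)]

user = {"num_failed_charges": 3, "account_age_days": 30}
print(score(user, correct))  # failed charges counted once
print(score(user, buggy))    # counted twice -> inflated fraud score
```

Because the duplicate only inflates one term of the sum, the model still runs and still emits plausible-looking probabilities, which is exactly how such a bug can sit unnoticed in production for months.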
Last week I mentioned I would be working through the new IEEE-CIS Fraud Detection Kaggle Competition as a mechanism for exploring some Data Science concepts. The first video walked you through setting up a Python environment and using version control for an ML project. This week, I have a new video that dives into the data and builds the first model using the gradient boosting library CatBoost.
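If you want a feel for that first model without watching the video, the core loop is only a few lines. Since CatBoost may not be installed everywhere, this sketch substitutes scikit-learn's GradientBoostingClassifier and synthetic data in place of the competition's transaction files; CatBoost's `CatBoostClassifier` follows the same fit/predict_proba shape:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the IEEE-CIS transaction table (the real data
# has hundreds of columns and a heavily imbalanced fraud label).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
# Fraud-like label driven by a feature interaction plus noise.
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)

# AUC is the usual metric for imbalanced fraud labels (and the
# competition's scoring metric).
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation AUC: {auc:.3f}")
```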
Not really, but ML and AI have been around longer than you think. Any look back at analytics in 2017 makes it clear that machine learning and artificial intelligence appear to be the 'next big things' that can solve just about any problem, from writing new hit songs to curing disease. Not one to buy into the hype, I became curious as to why these topics have become the new darlings of the analytics world. Perhaps it's because analytics is now everywhere: from niche solutions to management consultants and cloud service providers, it seems to be available in everything. What isn't so obvious, however, is how not-new these types of analytics actually are.
We at Sift Science provide fraud detection for hundreds of customers spanning many industries and use cases. To do this, we have devised a specialized modeling stack that adapts to individual customers while simultaneously delivering a great out-of-the-box experience for new ones: we mix the output of a "global" model, trained on our entire network of data, with the output of each customer's individualized model. Prior to decision forests, we used a custom-built logistic regression classifier combined with highly specialized feature engineering for our global model. While logistic regression has many great attributes, it is fundamentally limited by its inability to model non-linear interactions between features. At Sift, we tend to think of our modeling stack primarily as an enabler of our feature engineering; more powerful modeling allows us to extract the most insight from our features and can even lead to new classes of features.
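That non-linearity limitation is concrete: a plain logistic regression cannot represent an XOR-style pattern, where each signal is benign on its own but indicates fraud in combination, unless the interaction is handed to it as an engineered feature. A self-contained sketch (toy data and a from-scratch gradient-descent fit, purely to illustrate the point, not Sift's actual stack):

```python
import numpy as np

def fit_logreg(X, y, lr=1.0, steps=20000):
    """Plain logistic regression fit by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(np.mean((p > 0.5) == y))

# XOR pattern: the label is 1 when exactly one of the two signals fires.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# No linear boundary separates XOR, so the fit stays at chance level.
acc_linear = accuracy(X, y, *fit_logreg(X, y))

# Add the hand-engineered interaction x1*x2 and the data becomes
# linearly separable in the lifted feature space.
X_int = np.column_stack([X, X[:, 0] * X[:, 1]])
acc_interaction = accuracy(X_int, y, *fit_logreg(X_int, y))

print(acc_linear, acc_interaction)
```

Swapping the linear model for decision forests removes the need to enumerate such interactions by hand, which is what motivates the move away from the logistic regression stack described above.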
When I first spotted Shift Technology, with their focus on fraud detection for insurance, I assumed I would find a venture in Israel (which is known for smarts in finding the bad guys in cyberspace, as we outlined when we went to Israel on our Fintech global tour). So I was surprised to find that Shift Technology is a Paris-based venture. There is a lot more tech innovation in France than the image of economic sclerosis would lead you to assume. The next thing that jumps out at you is that they recently closed a 10m Series A round in a tough market from a top-tier VC (Accel Partners). So they must be doing something right.