single prediction
Reviews: Explaining Deep Learning Models -- A Bayesian Non-parametric Approach
I think the rebuttal is prepared very well. Although the assumption of a single component approximating the local decision boundary is quite strong, the paper nonetheless offers a good, systematic approach to interpreting black-box ML systems. It is an important topic and I don't see a lot of studies in this area. Overview: In an effort to improve the scrutability (the ability to extract generalizable insight) and explainability of a black-box target learning algorithm, the current paper proposes to use infinite Dirichlet mixture models with multiple elastic nets (DMM-MEN) to map the inputs to the predicted outputs. Any target model can be approximated by a non-parametric Bayesian regression mixture model.
Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost
Gradient boosting is a powerful ensemble machine learning algorithm. It's popular for structured predictive modeling problems, such as classification and regression on tabular data, and is often the main algorithm or one of the main algorithms used in winning solutions to machine learning competitions, like those on Kaggle. There are many implementations of gradient boosting available, including standard implementations in scikit-learn and efficient third-party libraries. Each uses a different interface and even different names for the algorithm. In this tutorial, you will discover how to use gradient boosting models for classification and regression in Python. Standardized code examples are provided for the four major implementations of gradient boosting in Python, ready for you to copy-paste and use in your own predictive modeling project.
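As a rough illustration of the idea behind the algorithm (not the code from the tutorial, and not any library's implementation), here is a minimal pure-Python sketch of gradient boosting for regression with squared-error loss. It assumes a single numeric feature and uses one-split decision stumps as base learners; every function name and the toy data are made up for this example.

```python
# Minimal sketch: gradient boosting for regression with squared-error loss.
# Assumptions: one numeric feature, decision stumps as base learners,
# hypothetical helper names, toy data invented for illustration.

def fit_stump(xs, targets):
    """Fit a one-split regressor; return (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data: a step from roughly 1 to roughly 3 between x = 3 and x = 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline = fit_gradient_boosting(xs, ys, n_rounds=0)  # predicts the mean only
model = fit_gradient_boosting(xs, ys)
print(mean_squared_error(baseline, xs, ys), mean_squared_error(model, xs, ys))
```

The libraries named in the title add much more on top of this loop (deeper trees, other loss functions, subsampling, regularization), so treat the sketch only as a picture of the core fit-to-residuals step.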
Building a Serverless Machine Learning API using ML.NET and Azure Functions
With the release of ML.NET, an API that C# developers can use to infuse their applications with machine learning capability, I've been keen to combine my knowledge of Azure Functions with the API to build some wacky serverless machine learning applications that would allow me to enhance my GitHub profile and cater to all the buzzword enthusiasts out there! This post won't be a tutorial. I'm writing this more as a retrospective of the design decisions I took while building the application and the things I learnt about how different components work. Should you read this and decide to build upon it for your real-world applications, hopefully you can apply what I've learnt in your projects or, better yet, expand on the ideas and scenarios I was working with. I'll be focusing more on what I learnt about the ML.NET API itself rather than spending too much time on how Azure Functions work.
Leveraging Clinical Time-Series Data for Prediction: A Cautionary Tale
Sherman, Eli, Gurm, Hitinder, Balis, Ulysses, Owens, Scott, Wiens, Jenna
In healthcare, patient risk stratification models are often learned using time-series data extracted from electronic health records. When extracting data for a clinical prediction task, several formulations exist, depending on how one chooses the time of prediction and the prediction horizon. In this paper, we show how the formulation can greatly impact both model performance and clinical utility. Leveraging a publicly available ICU dataset, we consider two clinical prediction tasks: in-hospital mortality, and hypokalemia. Through these case studies, we demonstrate the necessity of evaluating models using an outcome-independent reference point, since choosing the time of prediction relative to the event can result in unrealistic performance. Further, an outcome-independent scheme outperforms an outcome-dependent scheme on both tasks (In-Hospital Mortality: AUROC .882 vs. .831; Serum Potassium: AUROC .829 vs. .740) when evaluated on test sets that mimic real-world use.
Automated Topic Modeling Workflows Done Right
In the previous blog posts of this series, we introduced Topic Models, BigML's latest resource that helps you find thematically related terms in your unstructured text data, explained how to use it through the BigML Dashboard and the API, and lastly showed how to apply Topic Models in a real-life use case. This post will focus on automating LDA workflows by using WhizzML, a DSL for Machine Learning that provides programmatic support for all the resources you work with in our platform. Let's dive in by creating a Topic Model and making a prediction with it. In BigML, you can make a prediction for a single instance (referred to as a Topic Distribution) or for many instances in batch mode (referred to as a Batch Topic Distribution). First, we will create a Topic Model without specifying any particular configuration option, that is, relying on default settings.
Random Forest – The Bayesian Quest
In the first part of this series, we set the context for the Random Forest algorithm by introducing the tree-based algorithm for classification problems. In this post, we will look at some of the limitations of the tree-based model and how they were overcome, paving the way to a powerful model – Random Forest. Two major methods that were employed to overcome those pitfalls are Bootstrapping and Bagging. We will discuss them first before delving into Random Forest. When we discussed the tree-based model, we saw that such models are very intuitive, i.e., they are easy to interpret.
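To make the two methods named above concrete, here is a minimal pure-Python sketch of bootstrapping (resampling the training rows with replacement) and bagging (training one model per resample and taking a majority vote). It is an illustration under simplifying assumptions, not the post's own code: the base model is a one-split decision stump on a single feature rather than a full tree, and all names and data are hypothetical.

```python
import random
from collections import Counter

# Minimal sketch of bootstrapping and bagging.
# Assumptions: one numeric feature, decision stumps instead of full trees,
# hypothetical helper names, toy data invented for illustration.

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the training set."""
    return [rng.choice(rows) for _ in rows]

def majority_label(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows):
    """Fit a one-split classifier on (x, label) rows with the fewest errors."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_label = majority_label(left)
        right_label = majority_label(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # every x in this sample is identical
        label = majority_label([label for _, label in rows])
        return (rows[0][0], label, label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def bagging_fit(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return majority_label([predict_stump(model, x) for model in models])

rows = [(0.5, "no"), (1.0, "no"), (1.5, "no"), (2.0, "no"),
        (3.0, "yes"), (3.5, "yes"), (4.0, "yes"), (4.5, "yes")]

models = bagging_fit(rows)
print(bagging_predict(models, 0.5), bagging_predict(models, 4.5))
```

Random Forest builds on this recipe by also choosing a random subset of features at each split; this sketch has only one feature, so that step is left out.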