We can also see this visually. We can check the convergence of the chains more formally using the Gelman-Rubin statistic: values close to 1.0 suggest that the chains have converged. We can also check for correlation between successive samples within each chain. Ideally the auto-correlation is close to zero at all non-zero lags, so that the draws behave like independent ("random") samples from the posterior distribution. From these plots we see that the auto-correlation is not problematic.
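To make the two diagnostics concrete, here is a minimal NumPy sketch of how they can be computed by hand. It is an illustration under assumptions, not the library call that produced the plots: the function names `gelman_rubin` and `autocorrelation`, the simulated chains, and the basic (non-split) form of the Gelman-Rubin statistic are all choices made for this example.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic Gelman-Rubin statistic for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()           # W: mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)  # B: scaled variance of chain means
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))

def autocorrelation(draws, max_lag):
    """Sample autocorrelation of a single chain at lags 0..max_lag."""
    centered = np.asarray(draws, dtype=float) - np.mean(draws)
    denom = np.dot(centered, centered)
    return np.array([
        np.dot(centered[: centered.size - lag], centered[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

rng = np.random.default_rng(1)

# Hypothetical chains: four well-mixed chains, and four chains stuck at different levels.
mixed = rng.normal(size=(4, 2000))
stuck = mixed + np.arange(4)[:, None]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print("R-hat, well-mixed chains:", round(rhat_mixed, 3))
print("R-hat, stuck chains:     ", round(rhat_stuck, 3))

# Hypothetical strongly autocorrelated chain (AR(1) with coefficient 0.9).
noise = rng.normal(size=2000)
sticky = np.zeros(2000)
for t in range(1, sticky.size):
    sticky[t] = 0.9 * sticky[t - 1] + noise[t]

acf_iid = autocorrelation(mixed[0], max_lag=5)
acf_sticky = autocorrelation(sticky, max_lag=5)
print("lag-1 autocorrelation, independent draws:", round(acf_iid[1], 3))
print("lag-1 autocorrelation, sticky chain:     ", round(acf_sticky[1], 3))
```

The well-mixed chains should give a statistic near 1.0 and a lag-1 auto-correlation near zero, while the shifted chains and the AR(1) chain show what a problematic result looks like.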

In a supervised learning setting, we have a yardstick or plumbline to judge how well we are doing: the response itself. A frequent question in biological and biomedical applications is whether a property of interest (say, disease type, cell type, or the prognosis of a patient) can be "predicted", given one or more other properties, called the predictors. Often we are motivated by a situation in which the property to be predicted is unknown (it lies in the future, or is hard to measure), while the predictors are known. The crucial point is that we learn the prediction rule from a set of training data in which the property of interest is also known. Once we have the rule, we can either apply it to new data and make actual predictions of unknown outcomes, or we can dissect the rule with the aim of better understanding the underlying biology.

Compared to unsupervised learning and what we have seen in Chapters 5, 7 and 9, where we do not know what we are looking for or how to decide whether our result is "right", we are on much more solid ground with supervised learning: the objective is clearly stated, and there are straightforward criteria to measure how well we are doing. The central issues in supervised learning (sometimes the term statistical learning is used, more or less exchangeably) are overfitting and generalizability: did our rule merely fit the peculiarities of the training data, so that it would perform poorly on new data? Or did our rule indeed pick up some of the pertinent patterns in the system being studied, which will also apply to yet unseen new data?

An example of overfitting: two regression lines are fit to data in the \((x, y)\)-plane (black points). We can think of such a line as a rule that predicts the \(y\)-value, given an \(x\)-value. Both lines are smooth, but the fits differ in what is called their bandwidth, which intuitively can be interpreted as their stiffness. The blue line seems overly keen to follow minor wiggles in the data, while the orange line captures the general trend but is less detailed. The effective number of parameters needed to describe the blue line is much higher than for the orange line. Also, if we were to obtain additional data, it is likely that the blue line would do a worse job than the orange line in modeling the new data. We'll formalize these concepts (training error and test set error) later in this chapter. Although exemplified here with line fitting, the concept applies more generally to prediction models.

Let's look at exemplary applications that motivate the use of supervised learning methods.
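The contrast between a wiggly and a stiff fit can be reproduced in a small simulation. The following is a minimal sketch under stated assumptions, not the code behind the figure: the sine-shaped truth, the noise level, the Gaussian kernel smoother, the two bandwidth values, and the names `kernel_smooth` and `mse` are all illustrative choices.

```python
import numpy as np

def kernel_smooth(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-weighted average of y_train at each x_new."""
    # Scaled pairwise distances, shape (len(x_new), len(x_train)).
    z = (x_new[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * z**2)
    return weights @ y_train / weights.sum(axis=1)

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y - y_hat) ** 2))

def truth(x):
    """Hypothetical true relationship used to simulate the data."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 40)
y_train = truth(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = rng.uniform(0.0, 1.0, size=200)
y_test = truth(x_test) + rng.normal(scale=0.3, size=x_test.size)

errors = {}
# A small bandwidth gives a wiggly fit, a large bandwidth a stiff one.
for label, bandwidth in [("wiggly", 0.005), ("stiff", 0.1)]:
    train_error = mse(y_train, kernel_smooth(x_train, y_train, x_train, bandwidth))
    test_error = mse(y_test, kernel_smooth(x_train, y_train, x_test, bandwidth))
    errors[label] = (train_error, test_error)
    print(f"{label:>6}: training MSE = {train_error:.4f}, test MSE = {test_error:.4f}")
```

With such a small bandwidth the wiggly fit nearly interpolates the training points, so its training error is close to zero. Its error on fresh data is much larger than that, which is the gap between training error and test set error described above.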

Last night on the train I read this nice paper by David Duvenaud and colleagues. So I thought it's time for a David Duvenaud birthday special (don't get too excited David, I won't make it an annual tradition...). I recently covered iMAML: the meta-learning algorithm that makes use of implicit gradients to sidestep backpropagating through the inner loop optimization in meta-learning/hyperparameter tuning. The method presented in (Lorraine et al., 2019) uses the same high-level idea, but introduces a different - on the surface less fiddly - approximation to the crucial inverse Hessian. I won't spend a lot of time introducing the whole meta-learning setup from scratch; you can use the previous post as a starting point. Many - though not all - meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.
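To make "nested optimization" and "implicit gradient" concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and does not come from the paper or the earlier post: the inner problem is ridge regression with a single hyperparameter `lam`, the outer loss is a validation squared error, and the data are simulated. The Hessian system is small enough to solve exactly, so the sketch does not use the inverse-Hessian approximation that the paper is about; it only shows the quantity being approximated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training and validation data for a 5-dimensional linear model.
X_train, X_val = rng.normal(size=(50, 5)), rng.normal(size=(30, 5))
w_true = rng.normal(size=5)
y_train = X_train @ w_true + 0.5 * rng.normal(size=50)
y_val = X_val @ w_true + 0.5 * rng.normal(size=30)

def inner_solution(lam):
    """Inner problem: w*(lam) = argmin_w 0.5*||X_train w - y_train||^2 + 0.5*lam*||w||^2."""
    hessian = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    return np.linalg.solve(hessian, X_train.T @ y_train), hessian

def outer_loss(lam):
    """Outer problem: validation loss evaluated at the inner optimum."""
    w_star, _ = inner_solution(lam)
    return 0.5 * np.sum((X_val @ w_star - y_val) ** 2)

def implicit_hypergradient(lam):
    """d outer_loss / d lam via the implicit function theorem."""
    w_star, hessian = inner_solution(lam)
    grad_w_outer = X_val.T @ (X_val @ w_star - y_val)   # gradient of outer loss w.r.t. w
    mixed_partial = w_star                              # d/dlam of the inner gradient is w
    dw_dlam = -np.linalg.solve(hessian, mixed_partial)  # exact inverse-Hessian product
    return float(grad_w_outer @ dw_dlam)

lam = 2.0
eps = 1e-5
implicit = implicit_hypergradient(lam)
finite_diff = (outer_loss(lam + eps) - outer_loss(lam - eps)) / (2 * eps)
print("implicit gradient:  ", implicit)
print("finite differences: ", finite_diff)
```

The two numbers should agree closely. In larger problems the Hessian cannot be solved against directly like this, which is where an approximation to the inverse Hessian becomes necessary.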

To understand regularization and its link with machine learning, we first need to understand why we need regularization. Machine learning is about training a model on relevant data and then using the model to make predictions on unknown data, that is, data the model has not seen yet. Suppose we have trained a model and it scores well on the training data, but at prediction time it performs noticeably worse than it did during training. This may be a case of over-fitting (which I will explain below), which causes the model to predict incorrectly.

Lee, Dae Hyun (University of Washington) | Horvitz, Eric (Microsoft Research)

Patients in intensive care units (ICUs) are acutely ill and have the highest mortality rates among hospitalized patients. Predictive models and planning systems could forecast and guide interventions to prevent the hazardous deterioration of patients’ physiologies, creating an opportunity to employ machine learning and inference to assist with the care of ICU patients. We report on the construction of a prediction pipeline that estimates the probability of death by inferring rates of hazard over time, based on patients’ physiological measurements. The inferred model provided the contribution of each variable and information about the influence of sets of observations on the overall risks and expected trajectories of patients.