to

### Making Sense of Random Forest Probabilities: a Kernel Perspective

A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.

### What Is Probability?

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled manner. In this post, you will discover a gentle introduction to probability. Photo by Emma Jane Hogbin Westby, some rights reserved.

### Clustering methods for unsupervised machine learning

Now we have the probability that each data point belongs to each cluster. If we need hard cluster assignments, we can just choose for each data point to belong to the cluster with the highest probability. But the nice thing about EM is that we can embrace the fuzziness of the cluster membership. We can look at a data point and consider the fact that while it most likely belongs to Cluster B, it's also quite likely to belong to Cluster D. This also takes into account the fact that there may not be clear cut boundaries between our clusters. These groups consist of overlapping multi-dimensional distributions, so drawing clear cut lines might not always be the best solution.

### MQLV: Modified Q-Learning for Vasicek Model

In a reinforcement learning approach, an optimal value function is learned across a set of actions, or decisions, that leads to a set of states giving different rewards, with the objective to maximize the overall reward. A policy assigns to each state-action pairs an expected return. We call an optimal policy a policy for which the value function is optimal. QLBS, Q-Learner in the Black-Scholes(-Merton) Worlds, applies the reinforcement learning concepts, and noticeably, the popular Q-learning algorithm, to the financial stochastic model described by Black, Scholes and Merton. However, QLBS is specifically optimized for the geometric Brownian motion and the pricing of vanilla options. Consequently, it suffers from the traditional over-estimation of the Q-values reflected by an over-estimation of the vanilla option prices. Furthermore, its range of application is limited to vanilla option pricing within the financial markets. We propose MQLV, Modified Q-Learner for the Vasicek model, a new reinforcement learning approach that limits the Q-values over-estimation observed in QLBS and extends the simulation to mean reverting stochastic diffusion processes. Additionally, MQLV uses a digital function to estimate the future probability of an event, thus widening the scope of the financial application to any other domain involving time series. Our experiments underline the potential of MQLV on generated Monte Carlo simulations, particularly representative of the retail banking time series. In particular, MQLV is able to determine the optimal policy of money management based on the aggregated financial transactions of the clients, unlocking new frontiers to establish personalized credit card limits or loans. Finally, MQLV is the first methodology compatible with the Vasicek model capable of an event probability estimation targeting simulation of event probabilities in retail banking.

### Direct Uncertainty Estimation in Reinforcement Learning

Optimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification consisting in neglecting difference between true environment and its model estimated using limited number of observations causes exploration vs exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertainty can be propagated to the action-value function via Bellman iterations, which are computationally insufficiently efficient though. We consider possibility of directly measuring uncertainty of the action-value function, and analyze sufficiency of this facilitated approach.