We provide a brief tutorial on the use of concentration inequalities as they apply to system identification of state-space parameters of linear time-invariant systems, with a focus on the fully observed setting. We draw upon tools from the theories of large deviations and self-normalized martingales, and provide both data-dependent and data-independent bounds on the learning rate.

I. INTRODUCTION

A key feature of modern reinforcement learning is the ability to provide high-probability guarantees on the finite-data/finite-time behavior of an algorithm acting on a system. The enabling technical tools for such guarantees are concentration of measure results, which can be interpreted as quantitative versions of the strong law of large numbers. This paper provides a brief introduction to such tools, motivated by the identification of linear time-invariant (LTI) systems.
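The fully observed identification problem can be made concrete with a short sketch. It assumes the model x_{t+1} = A x_t + w_t with i.i.d. Gaussian noise w_t and estimates A by ordinary least squares from a single trajectory. The estimator choice, the example matrix `A_true`, the noise level, and the helper names `simulate_lti` and `least_squares_A` are illustrative assumptions, not taken from the paper. The estimation error printed for increasing T is the kind of finite-sample quantity that concentration bounds are meant to control.

```python
import numpy as np


def simulate_lti(A, T, noise_std=1.0, seed=0):
    """Roll out x_{t+1} = A x_t + w_t for T steps from x_0 = 0.

    w_t is assumed i.i.d. Gaussian with covariance noise_std**2 * I.
    Returns an array of shape (T + 1, n) whose rows are x_0, ..., x_T.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.zeros((T + 1, n))
    for t in range(T):
        X[t + 1] = A @ X[t] + noise_std * rng.standard_normal(n)
    return X


def least_squares_A(X):
    """Ordinary least-squares estimate of A from one fully observed trajectory.

    Minimizes sum_t ||x_{t+1} - A x_t||^2, whose solution is
    A_hat = (sum_t x_{t+1} x_t^T) (sum_t x_t x_t^T)^{-1}.
    """
    past, future = X[:-1], X[1:]
    # Row-wise the model reads future ~ past @ A^T, so lstsq returns A^T.
    A_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_T.T


if __name__ == "__main__":
    # Hypothetical stable system (eigenvalues 0.9 and 0.7).
    A_true = np.array([[0.9, 0.2], [0.0, 0.7]])
    for T in (100, 1_000, 10_000):
        A_hat = least_squares_A(simulate_lti(A_true, T))
        err = np.linalg.norm(A_hat - A_true, 2)  # spectral-norm error
        print(f"T = {T:>6}: ||A_hat - A||_2 = {err:.4f}")
```

The error typically shrinks as T grows. A data-dependent bound would be evaluated from the observed trajectory (for example, through the empirical covariance sum_t x_t x_t^T), while a data-independent bound would be stated in terms of A and the noise level alone.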

Follow the steps below to understand the algorithm:
- Create duplicate copies of all independent variables. When the original data has fewer than 5 independent variables, create at least 5 copies by reusing the existing variables.
- Shuffle the values of the added duplicate copies to remove their correlation with the target variable. These shuffled copies are called shadow features or permuted copies.
- Combine the original variables with the shuffled copies.
- Run a random forest classifier on the combined dataset and compute a variable importance measure (by default, Mean Decrease Accuracy) for each variable, where a higher value means a more important variable.
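The steps above can be sketched in Python, assuming NumPy and scikit-learn are available. The helper names `add_shadow_features` and `shadow_importances` are hypothetical. scikit-learn's `permutation_importance` (the mean drop in accuracy when a column is shuffled) stands in for Mean Decrease Accuracy and is not necessarily the exact measure any particular package computes by default. The synthetic data in the usage lines are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance


def add_shadow_features(X, rng, min_shadows=5):
    """Append shuffled copies ("shadow features") of the columns of X.

    If X has fewer than `min_shadows` columns, the existing columns are
    reused cyclically until there are `min_shadows` shadow columns.
    """
    n_features = X.shape[1]
    n_shadow = max(n_features, min_shadows)
    # Duplicate copies, cycling through the original columns if needed.
    shadow = X[:, [j % n_features for j in range(n_shadow)]].copy()
    # Shuffle each copy independently to break its link with the target.
    for j in range(n_shadow):
        shadow[:, j] = rng.permutation(shadow[:, j])
    return np.hstack([X, shadow])


def shadow_importances(X, y, seed=0):
    """Fit a random forest on [originals | shadows] and score every column.

    Returns (original_scores, shadow_scores). Scores are permutation
    importances: the mean drop in accuracy when a column is shuffled,
    computed here on the training data for simplicity.
    """
    rng = np.random.default_rng(seed)
    X_all = add_shadow_features(X, rng)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_all, y)
    result = permutation_importance(forest, X_all, y, n_repeats=5, random_state=seed)
    scores = result.importances_mean
    return scores[: X.shape[1]], scores[X.shape[1]:]


if __name__ == "__main__":
    # Synthetic data: only column 0 carries information about y.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 3))
    y = (X[:, 0] > 0).astype(int)
    original, shadow = shadow_importances(X, y)
    print("original:", np.round(original, 3))
    print("shadow:  ", np.round(shadow, 3))
```

The originals and the shadows are returned separately so that each original variable's score can be compared against scores of variables known to be pure noise.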

A probability mass function (PMF) is a probability distribution over discrete variables. First, a probability mass function is typically denoted with a capital P. Second, each random variable often has its own probability mass function, and which one is meant is identified by the random variable in the argument rather than by the name of the function: P(x) is not the same as P(y). Third, P(X = x) is the same as P(x). Fourth, a probability mass function can act on many variables at the same time. This is called a joint probability distribution: P(X = x, Y = y) denotes the probability that X = x and Y = y simultaneously.
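The notation can be made concrete by storing PMFs as Python dictionaries of exact fractions. The coin and the three-sided die are hypothetical examples. The joint PMF assumes X and Y are independent, so each joint probability is a product. In general a joint PMF need not factor this way.

```python
from fractions import Fraction

# Separate PMFs for two different random variables (hypothetical examples):
# X is a fair coin, Y is a biased three-sided die.
P_X = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
P_Y = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def is_valid_pmf(pmf):
    """Every probability lies in [0, 1] and the probabilities sum to exactly 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1


# Joint PMF P(X = x, Y = y). X and Y are *assumed* independent here,
# so each joint probability is the product of the individual ones.
P_XY = {(x, y): px * py for x, px in P_X.items() for y, py in P_Y.items()}

print(P_X["H"])           # P(X = H)         -> 1/2
print(P_Y[2])             # P(Y = 2)         -> 1/3
print(P_XY[("H", 2)])     # P(X = H, Y = 2)  -> 1/6
print(is_valid_pmf(P_XY)) # -> True
```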

It is well known that bagging (an ensemble technique) works well with unstable algorithms such as decision trees and artificial neural networks, and helps little with stable algorithms such as Naive Bayes. The well-known random forest algorithm builds on bagging, leveraging the 'instability' of decision trees to build a better classifier. Although random forest attempts to reduce the correlation between trees (by considering only a random subset of features at each split), does it completely solve the problem? Can the decision trees be made even more unstable than random forest makes them, so that the ensemble becomes more accurate still? If trees are grown sufficiently deep, they have very low bias but high variance, and averaging over many trees reduces that variance.
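The question of whether averaging removes the effect of correlated trees can be made quantitative with a standard result. If each of B trees has prediction variance sigma2 and any two trees have correlation rho, then the variance of their average is rho * sigma2 + (1 - rho) * sigma2 / B, so adding more trees shrinks only the second term. The sketch below uses hypothetical helper names, evaluates this formula, and checks it by Monte Carlo. It models correlated "tree predictions" as a shared Gaussian component plus independent noise, which is an idealization rather than real trees.

```python
import numpy as np


def averaged_variance(rho, sigma2, B):
    """Variance of the mean of B identically distributed predictions, each
    with variance sigma2 and pairwise correlation rho (0 <= rho <= 1)."""
    return rho * sigma2 + (1 - rho) * sigma2 / B


def simulated_variance(rho, sigma2, B, n_trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity.

    Each 'tree prediction' is a shared Gaussian component (variance
    rho * sigma2) plus its own independent Gaussian noise (variance
    (1 - rho) * sigma2), which gives variance sigma2 and correlation rho.
    """
    rng = np.random.default_rng(seed)
    shared = np.sqrt(rho * sigma2) * rng.standard_normal((n_trials, 1))
    own = np.sqrt((1 - rho) * sigma2) * rng.standard_normal((n_trials, B))
    # Broadcasting adds the shared column to every tree's own noise.
    return (shared + own).mean(axis=1).var()


if __name__ == "__main__":
    rho, sigma2 = 0.3, 1.0
    for B in (1, 10, 100, 1000):
        print(B, averaged_variance(rho, sigma2, B))
    # The Monte Carlo estimate should agree with the formula up to sampling error.
    print(simulated_variance(rho, sigma2, 20), averaged_variance(rho, sigma2, 20))
```

Under this idealized model, with rho = 0.3 the variance of the average approaches 0.3 * sigma2 rather than 0 as B grows. This is why making the trees less correlated (smaller rho) can still lower the error of an ensemble that already contains many trees.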