Nethra Sambamoorthi, PhD on LinkedIn: #datasciences #machinelearning #artificialintelligence
Productive analytics managers are investigative analytics managers. The top 5 models one should try are (1) XGBoost, (2) L1, L2, and glmnet regressions, (3) random forest, (4) SVM, and (5) regularized NN, if time permits. Usually XGBoost, glmnet (a hybrid of L1 and L2), and SVM should be good for most purposes and also provide interpretable models, while NN is good for voice/text/images/videos.
- Ask to verify: "Be very clear as to which value is '1' and which is '0' in a binary target variable: is it the target or the reference class? Also check whether the confusion matrix takes care of that correctly, and hence whether the 'key' ratios are row percentages or column percentages. Depending on the internal programming setup, different software or modeling-specific APIs may deal with this differently. Top models will use the leaky data rather than be a good general model of the underlying problem. It is a problem when you are a company providing your data: reversing an anonymization and obfuscation can result in a privacy breach that you did not expect. It is a problem when you are developing your own predictive models: you may be creating overly optimistic models that are practically useless and cannot be used in production." The solution is to properly verify the actual distributions of values in the validation data set and the inherent data distributions of the features.
- For more, see: https://lnkd.in/gyWhs2sg
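To make the point about the target class and the row versus column percentages concrete, here is a minimal sketch in plain Python. It is an illustration only, not taken from the linked article: the function names `confusion_counts` and `key_ratios`, the toy labels, and the convention that rows are actual classes and columns are predicted classes are all assumptions. Under that convention the row percentage for the target class is recall and the column percentage is precision.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with an explicitly named target (positive) class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def key_ratios(y_true, y_pred, positive):
    """Row and column percentages for the target class.

    Assumed convention: rows = actual class, columns = predicted class.
    Row percentage    = TP / (TP + FN) = recall.
    Column percentage = TP / (TP + FP) = precision.
    """
    c = confusion_counts(y_true, y_pred, positive)
    actual_pos = c["tp"] + c["fn"]
    predicted_pos = c["tp"] + c["fp"]
    return {
        "recall": c["tp"] / actual_pos if actual_pos else float("nan"),
        "precision": c["tp"] / predicted_pos if predicted_pos else float("nan"),
    }


# Hypothetical toy data: 3 actual "1" cases and 7 actual "0" cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Same predictions, but the "key" ratios change with the choice of target class.
ratios_target_1 = key_ratios(y_true, y_pred, positive=1)
ratios_target_0 = key_ratios(y_true, y_pred, positive=0)

print("target = 1:", ratios_target_1)
print("target = 0:", ratios_target_0)
```

With the same predictions, treating "1" as the target gives a very different recall and precision than treating "0" as the target, which is why it is worth confirming how a given tool encodes the target class and orients its confusion matrix before quoting either ratio.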
Sep-1-2022, 16:31:39 GMT