How to balance transformation decisions, feature selection, and model tuning vs time in text analytics?
Being new to text analytics, I haven't gotten the hang of adapting my typical ML workflow, given how long processes take to run in the large feature spaces that are common in text analytics. I would like to know what the typical strategy is for balancing effort and time across three decision points: transformation decisions, feature down-selection, and model tuning. To get a sense of which of these decision points deserves further tuning, I ran untuned RF, logistic regression, Naive Bayes, SGD, and KNN models with cross-validation. No choice at any decision point was consistently "better" in the resulting F1 scores, yet the differences between runs are often noteworthy. As I have no bias towards a particular algorithm type (I only care about the best F1 score), I'm stuck in a quandary: I have not successfully narrowed my decision space enough.
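One way to turn the cross-validated F1 scores described above into a tuning priority is to compare how much the average score moves along each decision point. The following is a minimal sketch, not a definitive method: the `scores` table, its level names (`counts`, `tfidf`, `chi2_top_k`, and so on), and the helper `axis_spread` are all hypothetical, and the numbers are invented placeholders to be replaced with real results. It also looks only at per-axis averages, so it can miss interactions between decision points.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder results, NOT real measurements:
# (transformation, feature selection, model) -> mean cross-validated F1.
scores = {
    ("counts", "all", "logistic"): 0.70,
    ("counts", "all", "naive_bayes"): 0.68,
    ("counts", "chi2_top_k", "logistic"): 0.71,
    ("counts", "chi2_top_k", "naive_bayes"): 0.70,
    ("tfidf", "all", "logistic"): 0.78,
    ("tfidf", "all", "naive_bayes"): 0.74,
    ("tfidf", "chi2_top_k", "logistic"): 0.79,
    ("tfidf", "chi2_top_k", "naive_bayes"): 0.76,
}


def axis_spread(scores, axis):
    """Average F1 per level of one decision axis, plus the best-minus-worst gap."""
    by_level = defaultdict(list)
    for combo, f1 in scores.items():
        by_level[combo[axis]].append(f1)
    level_means = {level: mean(values) for level, values in by_level.items()}
    spread = max(level_means.values()) - min(level_means.values())
    return spread, level_means


axis_names = ["transformation", "feature_selection", "model"]
spreads = {name: axis_spread(scores, i)[0] for i, name in enumerate(axis_names)}

# Decision points with the largest spread are the first candidates for tuning time.
ranked = sorted(spreads, key=spreads.get, reverse=True)
for name in ranked:
    print(f"{name}: spread in mean F1 = {spreads[name]:.3f}")
```

With the placeholder numbers, the transformation axis shows the widest gap, so under this heuristic it would get tuning time first. If the spreads on real results are all similar, that suggests the choices interact, and a small joint search on a subsample of the data may be more informative than tuning one axis at a time.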
Oct-27-2020, 19:10:04 GMT