How Concerned Should You Be About Predictor Collinearity? It Depends…


This past Northern Hemisphere summer, I gave several talks (some of them in the Southern Hemisphere) in which one of the Q&A topics was collinearity between predictor variables, also known as multicollinearity. My stock response to a question on this topic was (and is) a clarifying question: "How many rows do you have to develop the model?" If the answer was in the tens of thousands, my reply was "Don't worry about collinearity." If instead the answer was a few hundred rows or fewer, my reply to "How concerned should I be?" was "Very!" These two responses may seem contradictory, but they are not.
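One way to see why the two answers can be consistent is to look at the textbook standard error of a slope in ordinary least squares, which grows with the variance inflation factor 1 / (1 - r²) but shrinks as the row count grows. The sketch below is only an illustration under stated assumptions, not necessarily the argument made in the talks: it assumes a linear regression with two predictors whose correlation is `r`, and the function name `coef_std_error`, the noise level `sigma`, and the predictor standard deviation `predictor_sd` are all hypothetical choices.

```python
import math


def coef_std_error(n_rows, r, sigma=1.0, predictor_sd=1.0):
    """Approximate standard error of one OLS slope with two predictors.

    Uses the standard result Var(b) = sigma^2 * VIF / ((n - 1) * s^2),
    where VIF = 1 / (1 - r^2) and r is the correlation between the
    two predictors. sigma and predictor_sd are illustrative values.
    """
    vif = 1.0 / (1.0 - r ** 2)
    return sigma * math.sqrt(vif / ((n_rows - 1) * predictor_sd ** 2))


# Compare a small and a large data set, with and without strong collinearity.
for n_rows in (200, 20_000):
    for r in (0.0, 0.95):
        print(f"rows={n_rows:>6}  r={r:.2f}  std_error={coef_std_error(n_rows, r):.4f}")
```

Under these assumptions, a correlation of 0.95 inflates the standard error by the same factor (roughly 3.2) at any sample size, yet the collinear estimate at 20,000 rows is still tighter than the uncorrelated estimate at 200 rows.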