My Favorite Ten Dollar Statistics Word

©2022, George J. Irwin. All rights reserved.

Way back when, a particularly long and not very well known word was called a “ten dollar word.” I’m not sure where this phrase originated, nor whether it’s complimentary to the person who is said to be using “ten dollar words.”

I have a favorite of these from the statistics world: multicollinearity.

I could give the textbook definition, which is easy enough to look up, although not as easy to understand. I know what it is, and I don’t think it’s obvious from the definitions I’ve seen online. So let’s try an example from an actual Statistics class assignment, a course about which I remember absolutely nothing else, by the way.

You’re tasked with assisting in salary arbitration for a star athlete, who insists that when he plays, which is not in every game, there is a corresponding increase in attendance, hot dog sales, beer consumption, and souvenir revenue. Said player has conveniently provided you with all of the data that, according to him, clearly shows in a statistically valid manner that his being there made all the difference and makes him worth a seven-figure salary offer.

However, you are opposed by the owners of the team for which he competes. They have their own data, including weather, day of the week, the game results, and a proxy for the popularity (or lack thereof) of the opposing team, and they say that these are the true factors behind the same results and that this player’s incremental contribution is therefore approximately nil.

Both you and your opponent crunch all the available data and... it’s basically useless.

By looking at the types of data involved, you might have already intuited this. What control does the star player have over the weather? Isn’t there a rather large relationship between how hot it is and how much beer is consumed? And isn’t there also a relationship between beer and hot dogs?
What if the weather is terrible? Doesn’t that have an impact on attendance, which then affects beer and hot dog consumption? What if the game is a blowout and people leave early without bothering to pick up souvenirs on the way out? What if the opponent was mathematically eliminated on the first day of the season? (OK, I’m stretching it.) And does who’s playing in the game really matter in that situation?

The net of it is that extricating all of these factors and their effects on each other in order to isolate the contribution of just one “independent variable,” as we in Black Belt Land would call it, is really, really difficult, or it leads to an answer that is of no practical value.

With respect to the “statistically valid” amount to which the fictional player was entitled after doing the math: it was between three hundred thousand and six million dollars. I remember the opponents being on board: “Well, we’re ninety-five percent confident of that too!”

And thus, you have multicollinearity in action. Just another pothole to look out for on the road to becoming an effective Six Sigma Practitioner.

That’ll be ten dollars, please.
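
For anyone who would rather see the pothole than take my word for it, here is a minimal Python sketch. It is not the original class assignment, and every number in it is made up: the `simulate` function, its `overlap` parameter, the standardized "star playing time" and "warmth" variables, and the effect sizes of 2.0 are all hypothetical. It fits the same least-squares model twice, once where the star's playing time is unrelated to the weather and once where the two move almost in lockstep, and reports an approximate 95% interval for the star's effect along with the variance inflation factor (VIF).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = X @ beta; returns (beta, standard errors)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)        # estimated noise variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))


def simulate(overlap, n=80, seed=0):
    """Hypothetical season of n games.

    overlap = 0.0  -> star's playing time is unrelated to the weather
    overlap ~ 1.0  -> the star plays almost exclusively on warm days
    """
    rng = np.random.default_rng(seed)
    warmth = rng.normal(size=n)  # standardized temperature (invented)
    star = overlap * warmth + np.sqrt(1.0 - overlap**2) * rng.normal(size=n)
    # Invented "truth": both factors add the same amount to attendance.
    attendance = 2.0 * star + 2.0 * warmth + rng.normal(size=n)

    X = np.column_stack([np.ones(n), star, warmth])
    beta, se = fit_ols(X, attendance)

    # With only two predictors, VIF reduces to 1 / (1 - r^2).
    r = np.corrcoef(star, warmth)[0, 1]
    return {
        "estimate": beta[1],
        "low": beta[1] - 1.96 * se[1],   # normal-approximation 95% interval
        "high": beta[1] + 1.96 * se[1],
        "vif": 1.0 / (1.0 - r**2),
    }


for label, overlap in [("independent", 0.0), ("entangled", 0.99)]:
    result = simulate(overlap)
    width = result["high"] - result["low"]
    print(f"{label:12s} VIF={result['vif']:6.1f}  "
          f"star effect={result['estimate']:5.2f}  interval width={width:5.2f}")
```

In the entangled case the interval for the star's effect should come out several times wider than in the independent case, even though the invented effect is identical in both. That is the same flavor of answer as the three-hundred-thousand-to-six-million range above.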