I just finished a fabulous book, Everybody Lies, written by Seth Stephens-Davidowitz. From the Amazon description of the book:
Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school effect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?
I particularly liked the metaphors that Stephens-Davidowitz uses to describe his results. For example, in describing why it is easy to come up with variables that correlate with the stock market, but hard to find ones that can make accurate predictions, he uses the metaphor of coin flipping:
Suppose your strategy for predicting the stock market is to find a lucky coin -- but one that will be found through careful testing. Here's your methodology: You label one thousand coins - 1 to 1,000. Every morning, for two years, you flip each coin, record whether it came up heads or tails, and then note whether the Standard & Poor's Index went up or down that day. You pore through all your data. And voila! You've found something. It turns out that 70.3 percent of the time when Coin 391 came up heads the S&P Index rose. The relationship is statistically significant! Highly so! You have found your lucky coin!
Just flip Coin 391 every morning and buy stocks whenever it comes up heads. Your days of Target T-shirts and ramen noodle dinners are over. Coin 391 is your ticket to the good life!
Every statistics user should know that when you run 1,000 hypothesis tests at the 5% significance level, about 50 of them will, on average, show statistically significant results even when no relationship exists. That 5% is the Type I error rate (the probability of a false positive) in classical hypothesis testing.
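To see the false positives pile up, here is a minimal simulation sketch of the coin experiment. It is my own illustration, not anything from the book: the function name count_lucky_coins, the 504 trading days (roughly two years), and the 53% chance of an up day are all assumptions, and for simplicity it tests how often each coin "agrees" with the market (heads on up days, tails on down days) rather than the exact statistic in the quote.

```python
import math
import random


def count_lucky_coins(n_coins=1000, n_days=504, p_up=0.53, seed=0):
    """Count coins whose agreement with the market looks significant at 5%."""
    rng = random.Random(seed)
    # Simulated market: True means the index rose that day (p_up is an assumption).
    market_up = [rng.random() < p_up for _ in range(n_days)]
    lucky = 0
    for _ in range(n_coins):
        # A fair coin, flipped independently of the market.
        heads = [rng.random() < 0.5 for _ in range(n_days)]
        # Days on which the coin "called" the market: heads/up or tails/down.
        agree = sum(h == m for h, m in zip(heads, market_up))
        # With no real relationship, agree ~ Binomial(n_days, 0.5);
        # use the normal approximation to get a z-score.
        z = (agree - n_days / 2) / math.sqrt(n_days / 4)
        if abs(z) > 1.96:  # two-sided test at the 5% level
            lucky += 1
    return lucky


print(count_lucky_coins())
```

None of these coins has any connection to the market, yet the count that comes back should land somewhere in the neighborhood of 50, give or take, depending on the seed.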
Instead of trusting a coin picked this way, split your sample in two: use half the data to "find" (estimate) the lucky coin, and the other half to test it.
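The split-sample idea can be sketched the same way. Again, this is only an illustration under my own assumptions (the name split_sample_check, 504 days, a 53% chance of an up day): pick the coin that agrees with the market most often in the first half of the data, then see how that same coin does in the second half.

```python
import random


def split_sample_check(n_coins=1000, n_days=504, p_up=0.53, seed=0):
    """Return (first-half, second-half) agreement rates for the 'luckiest' coin."""
    rng = random.Random(seed)
    # Simulated market and fair coins, all independent of one another.
    market_up = [rng.random() < p_up for _ in range(n_days)]
    coins = [[rng.random() < 0.5 for _ in range(n_days)] for _ in range(n_coins)]
    half = n_days // 2

    def agreement_rate(flips, start, stop):
        # Share of days in [start, stop) on which the coin matched the market.
        hits = sum(flips[d] == market_up[d] for d in range(start, stop))
        return hits / (stop - start)

    # "Find" the lucky coin using only the first half of the sample.
    best = max(range(n_coins), key=lambda c: agreement_rate(coins[c], 0, half))
    # Test that one coin on the half it was not selected on.
    return (agreement_rate(coins[best], 0, half),
            agreement_rate(coins[best], half, n_days))


in_rate, out_rate = split_sample_check()
print(f"first half: {in_rate:.3f}, second half: {out_rate:.3f}")
```

The first-half rate is the best of 1,000 tries, so it will look impressive; the second-half rate is a single fresh test, so it should hover around one half.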
BOTTOM LINE: the more tests you run, the more likely it is that at least one of them will show a statistically significant relationship, even if there is none. This is likely behind what has become known as the replication crisis, which has hit the field of psychology particularly hard: only one third of the results from the most cited articles could be replicated. It is likely that academics are testing lots of hypotheses and publishing the few that turn out to be statistically significant. This is analogous to finding a lucky coin that only appears to be lucky. Once you test it outside the sample, the luck disappears.
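How fast does "at least one" become a near certainty? If the tests are independent (an assumption; real hypotheses tested on the same data usually are not), the chance of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m. A small sketch, with a function name of my own choosing:

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** n_tests.
    return 1 - (1 - alpha) ** n_tests


for m in (1, 14, 100, 1000):
    print(m, round(prob_any_false_positive(m), 4))
```

Under that independence assumption, it takes only 14 tests before a false positive is more likely than not.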
TRUTH IN BLOGGING: the field of economics has its own replication crisis, in which only two thirds of top results could be replicated.