The Economist highlights the problems with peer review:
“Consider 1,000 hypotheses being tested of which just 100 are true (see chart). Studies with a power of 0.8 will find 80 of them, missing 20 because of false negatives. Of the 900 hypotheses that are wrong, 5%—that is, 45 of them—will look right because of type I errors. Add the false positives to the 80 true positives and you have 125 positive results, fully a third of which are specious. If you dropped the statistical power from 0.8 to 0.4, which would seem realistic for many fields, you would still have 45 false positives but only 40 true positives. More than half your positive results would be wrong.”
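The Economist's arithmetic is easy to check for yourself. Here's a minimal sketch (function name and parameters are mine, not from the article) that reproduces the numbers in the quote:

```python
def positive_results(n_hypotheses=1000, true_rate=0.1, power=0.8, alpha=0.05):
    """Return (true positives, false positives, share of positives that are wrong)."""
    n_true = n_hypotheses * true_rate    # 100 hypotheses are actually true
    n_false = n_hypotheses - n_true      # 900 are false
    tp = power * n_true                  # true effects the studies detect
    fp = alpha * n_false                 # type I errors that look like real effects
    return tp, fp, fp / (tp + fp)

tp, fp, wrong = positive_results(power=0.8)
# -> 80 true positives, 45 false positives: about a third of positives are specious
tp, fp, wrong = positive_results(power=0.4)
# -> 40 true positives, still 45 false positives: most positives are now wrong
```

Note that dropping power only shrinks the true positives; the false positives depend on alpha and the base rate, so they don't budge.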
Kahneman’s predicted crisis pertains to priming studies that purport to show that a variety of small, almost unnoticeable environmental factors can affect people’s behavior. One of the best-known examples is the research showing that subjects exposed to lots of words pertaining to old age (e.g., grey, slow, addled) walked more slowly afterward, presumably because they were primed to act older. (Hasn’t been replicated, btw.)
Russ Roberts also takes on the difficulties of deriving life tips from the peer-reviewed literature in his discussion with Emily Oster about her book debunking many pregnancy myths. As Russ points out, a p-value < .05 says nothing about the size of the effect that was found. (An extremely low p-value might attach to an observed effect so small that it has no implications for anything.)
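Russ's point is easy to demonstrate numerically. Here's a toy sketch (my own example, not from the podcast) using a one-sample z-test: with a big enough sample, a trivially small effect produces a p-value far below .05.

```python
import math

def two_sided_p(effect_size, n):
    """Two-sided p-value from a z-test, given a standardized effect size and sample size."""
    z = effect_size * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # survival probability in both tails

# An effect of 0.01 standard deviations -- practically meaningless -- with a
# sample of one million is "highly significant":
p = two_sided_p(effect_size=0.01, n=1_000_000)
# p is astronomically small, far below the .05 threshold
```

The significance threshold tells you an effect is probably not zero; it tells you nothing about whether the effect is big enough to matter.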
And that’s on top of all the mathematical problems with the .05 threshold.