Back when I was doing the science research thing, the false allure of multiple comparisons testing was a well-known snare. We all knew that if you search through your data enough, you will always find statistically significant correlations, but those relationships are quite possibly a one-time fluke; that is, they just happened by chance. The only way to correct for that (and it's still not that good) is to substantially increase the stringency of the significance test for all comparisons you do, to make sure no spurious conclusions are drawn. That means some truly significant results may be missed, but that's your own damn fault for going on a significance fishing expedition, rather than constructing a proper hypothesis and testing it in the first place.
The large majority of the medical research results (especially anything nutrition-related) that I see reported in the news seem to have this problem. At best, they really mean something may be worthy of closer, controlled study.
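The fishing-expedition problem described above can be sketched with a small simulation. This is only an illustration under stated assumptions, not anyone's actual analysis: every "study" below is pure noise, the test is a simple two-sided z-test with known standard deviation, and the stricter threshold is the Bonferroni correction (alpha divided by the number of comparisons), which is one common way of raising the stringency. The function names, sample sizes, and seed are all made up for the example. For scale, with 20 independent tests at the 0.05 level, the chance of at least one false positive is 1 - 0.95^20, or about 0.64.

```python
import random
from statistics import NormalDist, fmean


def null_p_value(rng, n=30):
    """Two-sided z-test p-value for a sample of pure noise (true mean 0, sd 1).

    Hypothetical helper: the null hypothesis is true by construction,
    so any 'significant' result it produces is a fluke.
    """
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * n ** 0.5  # standard error is 1/sqrt(n) when sd = 1
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_hits(p_values, alpha=0.05):
    """Count 'significant' results with and without a Bonferroni correction."""
    naive = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of comparisons made.
    bonferroni = sum(p < alpha / len(p_values) for p in p_values)
    return naive, bonferroni


rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = [null_p_value(rng) for _ in range(1000)]  # 1000 comparisons, no real effects
naive, bonferroni = count_hits(p_values)
print(f"uncorrected 'discoveries': {naive} of {len(p_values)}")
print(f"after Bonferroni correction: {bonferroni}")
```

On average about 5% of the uncorrected comparisons come up "significant" even though nothing is there. The corrected threshold removes nearly all of them, at the cost of also being much harder for a real effect to pass.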
David, David, David. Surely you know that statistical significance is a function of the use to which information will be put. For example, if a politician does not receive any phone calls from constituents or donations from lobbyists concerned about an issue, then that issue is statistically insignificant. Similarly, most news organizations will not report the findings of scientific studies that are uncertain or not entertaining or are presented in a measured way that is not readily spun into hysterical black/white conclusions about the manufactured controversy of the day, because such findings are statistically insignificant. And, of course, in a legal setting scientific information is only statistically significant if an attorney can convince twelve ignorant people to believe it in their hearts. It is long past time to update our understanding of statistical significance.
Posted by: Jonathan on March 4, 2007 07:44 AM
Yes, I suppose I should get with the program. A year or two ago, I engaged a Semi-Famous Internet Personality in a polite yet fruitless exchange on the proper meaning of significance and confidence intervals, and he seemed quite sure that all such trifling concerns could be simply and magically dealt with by liberal use of the term "Bayesian", and if I didn't see that, it only meant he hadn't explained himself smugly and condescendingly enough.
Posted by: David Fleck on March 4, 2007 07:18 PM
In the first Lancet study of Iraqi civilian war deaths, the result given in the report's summary (and of course the only result quoted in the press) was 100k. This value turned out to be the midpoint of a 95% confidence interval whose range was something like 8k-210k. (I may be off on the actual numbers but the point is still valid.) As I recall, many people defended the validity of the 100k figure on the grounds that the investigators had used the proper procedures, never mind that the data, methodology and results were all highly questionable.
The second Lancet study, building on the PR success of the first, reported something like 600k civilian deaths, but this figure was so obviously wacko (where did all the bodies go?) that it achieved relatively little traction.
Posted by: Jonathan on March 4, 2007 10:14 PM
Not just medical researchers. You find the same kinds of errors in many other fields. (And I once saw a horrifying report on research in psychology that found very high rates of fraud.)
BTW, though you are right to say that there are ways to cope with this problem statistically, I think that -- in the long run -- a broader attack is the better approach. That is, it is better to construct a theory that can be tested in a number of ways, rather than just relying on significance tests.
Granted, this is not always possible in some fields, at least for now, but that is what we should aim for.
Posted by: Jim Miller on March 5, 2007 07:49 AM
JM-
"it is better to construct a theory that can be tested in a number of ways, rather than just relying on significance tests."
Absolutely. For the same reason, it also is important that results, even if significant, make sense – that is, that they can correspond in some way with other, independent, results. Otherwise, they just float out there, weird and anomalous.
Posted by: David Fleck on March 6, 2007 06:32 AM