Math Mutation 89: Improbable Probabilities

You may recall that way back in Episode 7, I talked about the fact that homeopathy, a strange European form of medicine that seems to be making a comeback in the U.S., violates basic laws of chemistry and mathematics. Yet I continue to hear otherwise educated people make statements like, "I read about a study that showed statistically significant benefits from homeopathy, so there must be something to it." But what does statistical significance mean? Can a form of medical treatment that is completely ridiculous still manage to get statistically significant results, publishable in peer-reviewed studies? Here we're ignoring other well-known factors, such as researchers unconsciously influencing their data collection in the direction they want. You can check out the link to homeowatch.org in the show notes for many detailed scientific critiques of homeopathy. For this podcast, I'm just looking at the mathematical issue of statistical significance.

Let's start by taking a step back and looking at what the phrase "statistically significant" means. Basically, it means you have calculated the probability that the results of your study would occur purely by chance, and that probability is small. For example, let's say you believe you have discovered that listening to Math Mutation grants you amazing mental powers, and you now believe you have the telekinetic ability to make all coins you flip land on heads. You flip four coins to test this, and indeed they are all heads! Does that prove your point? You might say yes, since you have only a 1/2 * 1/2 * 1/2 * 1/2, or 1 in 16, chance of getting four heads in a row purely by luck. Perhaps you will publish a paper on this amazing experiment and use it on your website to sell magical Math Mutation CDs.

But what if you have been absent-mindedly flipping sets of four coins in your living room all day?
You're pretty sure your brain-improvement method works, but you think your cat staring at you can throw off your mental powers, so sometimes it doesn't work, for reasons totally beyond your control. In fact, on sixteen separate occasions you have tried this four-coin experiment. The fifteen times it didn't work, you blamed your cat. But the one time it did work, it supposedly "proved" your powers. That one time, you wrote down the results and published a paper on it. Now is your proof really valid? Surely it isn't, because with all those attempts, you were likely to get lucky at some point. But you probably won't go around telling everyone about all the failed attempts, because that was your cat's fault, so they really shouldn't count.

Medical experiments can work basically the same way. A bunch of trials are run, trying to cure people with some new treatment or a placebo. Using some standard statistical formulas, described in more detail at links in the show notes, the probability of the results occurring by chance can be calculated, and typically a researcher checks that the results had at most a 5% or 1% chance of occurring randomly.

I think now you can see the problem. Suppose you have a crazy but emotionally satisfying therapy like homeopathy, and advocates all over the world are testing it, just like you with the many coin-flipping trials in your living room. If a 5% chance of random results is the threshold for significance, then even for a treatment that does nothing you expect on average one in twenty studies to show good results, purely by luck. If *all* studies are actually published, that might not be an issue. In general, though, studies are much more likely to be published if they show positive, rather than negative, results. Often the negative studies might be blamed on external factors or sloppy methodology, especially if organized by advocates of the treatment being tested.
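
For anyone who wants to check the arithmetic so far, here is a minimal Python sketch. It assumes fair coins, independent attempts, and a treatment with no real effect; the function names and the figure of twenty studies are illustrative choices, not anything taken from a real study.

```python
from fractions import Fraction


def prob_all_heads(num_coins: int) -> Fraction:
    """Chance that num_coins fair coin flips all land on heads."""
    return Fraction(1, 2) ** num_coins


def prob_at_least_one_success(p_single: Fraction, attempts: int) -> Fraction:
    """Chance that at least one of several independent attempts succeeds."""
    return 1 - (1 - p_single) ** attempts


# The four-coin experiment: a single attempt succeeds 1 time in 16.
p_four_heads = prob_all_heads(4)

# Sixteen separate attempts (assumed independent): chance of at least one
# all-heads result somewhere in the day.
p_lucky_day = prob_at_least_one_success(p_four_heads, 16)

# Hypothetical: twenty studies of a treatment that does nothing, each
# judged at the 5% significance threshold.
alpha = Fraction(5, 100)
num_studies = 20
expected_false_positives = alpha * num_studies
p_some_study_significant = prob_at_least_one_success(alpha, num_studies)

print(f"One attempt, four heads: {p_four_heads} ({float(p_four_heads):.4f})")
print(f"At least one success in 16 attempts: {float(p_lucky_day):.4f}")
print(f"Expected 'significant' studies out of {num_studies}: {expected_false_positives}")
print(f"At least one 'significant' study: {float(p_some_study_significant):.4f}")
```

Under these assumptions, both the coin flipper and the twenty study teams have roughly a 64% chance of producing at least one "success" by luck alone, which is why counting only the hits is so misleading.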
So you might never find out about the 19 negative studies that were done for every study with "statistically significant" results!

How do we guard against this issue in general when doing some kind of statistical test? There are a few important things to look for. One is that the effect size should be large, reducing the chance that you are observing random fluctuations. Another is that the studies should have large sample sizes, again to substantially reduce the chance of pure luck. The experiments should be repeatable: other institutions should be able to repeat the same experiments with similar results. And probably most importantly, you should look for studies done by neutral, reputable institutions that would be likely to report negative as well as positive results.

You must also keep in mind that statistical significance alone is rarely enough to confirm a phenomenon, especially if it contradicts known scientific laws. Think about it: to become a "known scientific law", something must usually have been confirmed in hundreds or thousands of statistically significant experiments all over the world. This is certainly true of chemistry's molecular theory of matter, which directly contradicts the basic principles of homeopathy. So in the case of theories which violate known scientific laws, you need to weigh a small set of supposedly significant experiments on a new phenomenon against the full weight of existing knowledge. Skeptics often like to summarize this principle as "extraordinary claims require extraordinary proof".

One final thought on this: how sure can we be that conventional medicine is not contaminated by this same methodology issue? With increasing relationships between researchers and pharmaceutical companies these days, it's hard to always be sure.
When I see TV commercials talking about how I should ask my doctor about using some hemorrhoid pill to treat the newly discovered Wiggly Nose Syndrome, I do have to wonder whether they just ran lots and lots of studies on vaguely defined diseases, and latched on to the occasional statistically significant results they got by luck.

And this has been your Math Mutation for today.

References: