Statistical failure of A Population-Based Case–Control Study of Extreme Summer Temperature and Birth Defects
Guest Post by Willis Eschenbach
The story of how global warming causes congenital cataracts in newborn babies has been getting wide media attention. So I thought I’d take a look at the study itself. It’s called A Population-Based Case–Control Study of Extreme Summer Temperature and Birth Defects, and it is available from the usually-scientific National Institutes of Health here.
Figure 1. Dice with various numbers of sides. SOURCE
I have to confess, I laughed out loud when I read the study. Here’s what I found so funny.
When doing statistics, one thing you have to be careful about is whether your result happened by pure random chance. Maybe you just got lucky. Or maybe that result you got happens by chance a lot.
Statisticians use the “p-value” to estimate how likely it is that a result this strong would turn up by random chance alone. A small p-value means that is unlikely. More precisely, the p-value is the probability of getting a result at least as extreme as yours if there were no real effect at all. So a p-value less than say 0.05 means that there is less than a 5% chance of that occurring by random chance.
This 5% level is commonly taken to be a level indicating what is called “statistical significance”. If the p-value is below 0.05, the result is deemed to be statistically significant. However, there’s nothing magical about 5%. Some scientific fields more commonly use a stricter criterion of 1% for statistical significance. But in this study, the significance level was chosen as a p-value less than 0.05.
Another way of stating this same thing is that a p-value of 0.05 means that one time in twenty (1.0 / 0.05), the result you are looking for will occur by random chance. Once in twenty you’ll get what is called a “false positive”—the bell rings, but it is not actually significant, it occurred randomly.
Here’s the problem. If I have a one in twenty chance of a false positive when looking at one single association (say heat with cataracts), what are my odds of finding a false positive if I look at say five associations (heat with spina bifida, heat with hypoplasia, heat with cataracts, etc.)? Because obviously, the more cases I look at, the greater my chances are of hitting a false positive.
To calculate that, the formula that gives the odds of finding at least one false positive is
FP = 1 - (1 - p)^N
where FP is the odds of finding at least one false positive, p is the p-value threshold (in this case 0.05), and N is the number of trials, assumed to be independent of each other. For my example of five trials, that gives us
FP = 1 - (1 - 0.05)^5 = 0.23
So about one time in five (23%) you’ll find a false positive using a p-value of 0.05 and five trials.
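For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. The function name is made up for illustration, and the calculation assumes the trials are independent of each other and that there is no real effect in any of them.

```python
def prob_at_least_one_false_positive(p: float, n_trials: int) -> float:
    """Chance of at least one false positive in n_trials independent trials.

    p is the significance threshold (0.05 in the post). Independence of the
    trials is an assumption of the formula, not something the data guarantee.
    """
    # (1 - p) is the chance a single trial gives no false positive;
    # raising it to n_trials gives the chance that none of them do.
    return 1.0 - (1.0 - p) ** n_trials


for n in (1, 5):
    fp = prob_at_least_one_false_positive(0.05, n)
    print(f"N = {n}: FP = {fp:.3f}")
```

With one trial this gives back the plain 0.05, and with five trials it gives about 0.226.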
How does this apply to the cataract study?
Well, to find the one correlation that was significant at the 0.05 level, they compared temperature to no fewer than 28 different variables. As they describe it (emphasis mine):
Outcome assessment. Using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM; Centers for Disease Control and Prevention 2011a) diagnoses codes from the CMR records, birth defect cases were classified into the 45 birth defects categories that meet the reporting standards of the National Birth Defects Prevention Network (NBDPN 2010). Of these, we selected the 28 groups of major birth defects within the six body systems with prior animal or human studies suggesting an association with heat: central nervous system (e.g., neural-tube defects, microcephaly), eye (e.g., microphthalmia, congenital cataracts), cardiovascular, craniofacial, renal, and musculoskeletal defects (e.g., abdominal wall defects, limb defects).
So they are looking at the relationship between temperature and no fewer than 28 different birth defect groups.
Using the formula above, if we look at the case of N = 28 different variables, we will get a false positive about three times out of four (76%).
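The 76% figure can also be checked by brute force. The sketch below is my own illustration, not anything from the study: it pretends there is no real effect anywhere, so each of the 28 p-values is just a uniform random number between 0 and 1, and it counts how often at least one of them lands below 0.05. It assumes the 28 tests are independent, which real birth defect categories may not be. The function name, number of simulated studies, and random seed are arbitrary choices.

```python
import random


def simulate_any_false_positive(n_tests: int = 28,
                                alpha: float = 0.05,
                                n_studies: int = 20_000,
                                seed: int = 0) -> float:
    """Fraction of simulated null studies with at least one p-value < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # With no real effect, a p-value is uniformly distributed on [0, 1),
        # so each test "rings the bell" with probability alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


analytic = 1.0 - (1.0 - 0.05) ** 28
simulated = simulate_any_false_positive()
print(f"formula:    {analytic:.3f}")
print(f"simulation: {simulated:.3f}")
```

The formula line prints 0.762, and the simulated fraction should land close to it, differing only by sampling noise.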
So it is absolutely unsurprising, and totally lacking in statistical significance, that in a comparison with 28 variables, someone would find that temperature is correlated with one of them at a p-value of 0.05. In fact, it is more likely than not that they would find at least one with a p-value below 0.05.
They thought they found something rare, something to beat skeptics over the head with, but it happens three times out of four. That’s what I found so funny.
Next, a simple reality check. The authors say:
Among 6,422 cases and 59,328 controls that shared at least 1 week of the critical period in summer, a 5-degree [F] increase in mean daily minimum UAT was significantly associated with congenital cataracts (aOR = 1.51; 95% CI: 1.14, 1.99).
A 5°F (2.78°C) increase in summer temperature is significantly associated with congenital cataracts? Really? Now, think about that for a minute.
This study was done in New York. There’s about a 20°F difference in summer temperature between New York and Phoenix. That’s four times the 5°F increase they associate with cataracts in the study group. So by their claim that if you heat up, your kids will be born blind, we should be seeing lots of congenital cataracts, not only in Phoenix, but in Florida and in Cairo and in tropical areas, deserts, and hot zones all around the world … not happening, as far as I can tell.
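To put a rough number on that, here is a back-of-the-envelope sketch. It takes the study's reported adjusted odds ratio of 1.51 per 5°F and naively compounds it across the roughly 20°F gap mentioned above. The compounding (a log-linear extrapolation) is purely my assumption for illustration. The study does not claim its result holds outside the temperature range it observed, and the function name is hypothetical.

```python
def extrapolated_odds_ratio(or_per_step: float,
                            step_f: float,
                            delta_f: float) -> float:
    """Naively compound an odds ratio reported per step_f degrees over delta_f.

    ASSUMPTION: the effect is log-linear in temperature, so each additional
    step_f multiplies the odds by the same factor. This is an illustration,
    not a result from the study.
    """
    return or_per_step ** (delta_f / step_f)


# 1.51 per 5°F is from the study; 20°F is the rough gap cited in the post.
implied = extrapolated_odds_ratio(1.51, 5.0, 20.0)
print(f"implied odds ratio over 20°F: {implied:.1f}")
```

Under that assumption the implied odds ratio comes out at about 5.2, meaning roughly five times the odds of congenital cataracts in the hotter location, which is the kind of difference that ought to be hard to miss.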
Like I said, reality check. Sadly, this is another case where the Venn diagram of the intersection of the climate science fraternity and the statistical fraternity gives us the empty set …
UPDATE: Statistician William Briggs weighs in on this train wreck of a paper.