Keep doing that and you'll go blind

Statistical failure of A Population-Based Case–Control Study of Extreme Summer Temperature and Birth Defects

Guest Post by Willis Eschenbach

The story of how global warming causes congenital cataracts in newborn babies has been getting wide media attention. So I thought I’d take a look at the study itself. It’s called A Population-Based Case–Control Study of Extreme Summer Temperature and Birth Defects, and it is available from the usually-scientific National Institutes of Health here.

Figure 1. Dice with various numbers of sides. SOURCE

I have to confess, I laughed out loud when I read the study. Here’s what I found so funny.

When doing statistics, one thing you have to be careful about is whether your result happened by pure random chance. Maybe you just got lucky. Or maybe that result you got happens by chance a lot.

Statisticians use the “p-value” to estimate how likely it is that the result occurred by random chance. A small p-value means it is unlikely that it occurred by chance. The p-value is the odds (as a percentage) that your result occurred by random chance. So a p-value less than say 0.05 means that there is less than 5% odds of that occurring by random chance.

This 5% level is commonly taken to be a level indicating what is called “statistical significance”. If the p-value is below 0.05, the result is deemed to be statistically significant. However, there’s nothing magical about 5%; some scientific fields more commonly use a stricter criterion of 1% for statistical significance. But in this study, the significance level was chosen as a p-value less than 0.05.

Another way of stating this same thing is that a p-value of 0.05 means that one time in twenty (1.0 / 0.05), the result you are looking for will occur by random chance. Once in twenty you’ll get what is called a “false positive”—the bell rings, but it is not actually significant, it occurred randomly.

Here’s the problem. If I have a one in twenty chance of a false positive when looking at one single association (say heat with cataracts), what are my odds of finding a false positive if I look at say five associations (heat with spina bifida, heat with hypoplasia, heat with cataracts, etc.)? Because obviously, the more cases I look at, the greater my chances are of hitting a false positive.

To calculate that, the formula that gives the odds of finding at least one false positive is

FP = 1 – (1 – p)^N

where FP is the odds of finding a false positive, p is the p-value (in this case 0.05), and N is the number of trials. For my example of five trials, that gives us

FP = 1 – (1 – 0.05)^5 = 0.23

So about one time in five (23%) you’ll find a false positive using a p-value of 0.05 and five trials.
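For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula. The function name is my own label, not anything from the study, and the calculation assumes the N tests are independent of each other.

```python
def familywise_false_positive(p: float, n: int) -> float:
    """Chance of at least one false positive in n independent tests,
    each run at significance level p: FP = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Five associations tested at the 0.05 level, as in the example above.
    print(f"{familywise_false_positive(0.05, 5):.3f}")
```

With n = 1 the function simply returns p, which is a handy sanity check.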

How does this apply to the cataract study?

Well, to find the one correlation that was significant at the 0.05 level, they compared temperature to no less than 28 different variables. As they describe it (emphasis mine):

Outcome assessment. Using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM; Centers for Disease Control and Prevention 2011a) diagnoses codes from the CMR records, birth defect cases were classified into the 45 birth defects categories that meet the reporting standards of the National Birth Defects Prevention Network (NBDPN 2010). Of these, we selected the 28 groups of major birth defects within the six body systems with prior animal or human studies suggesting an association with heat: central nervous system (e.g., neural-tube defects, microcephaly), eye (e.g., microphthalmia, congenital cataracts), cardiovascular, craniofacial, renal, and musculoskeletal defects (e.g., abdominal wall defects, limb defects).

So they are looking at the relationship between temperature and no less than 28 independent variables.

Using the formula above, if we look at the case of N = 28 different variables, we will get a false positive about three times out of four (76%).
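The same figure can be checked by brute force rather than by formula. The sketch below is a simulation under stated assumptions: it treats the 28 tests as independent and models each one as a single uniform random draw, because a p-value is uniformly distributed between 0 and 1 when the null hypothesis is true. The function name and defaults are illustrative only.

```python
import random


def simulate_any_false_positive(n_tests: int = 28, alpha: float = 0.05,
                                n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate how often at least one of n_tests null tests comes out
    'significant'. Each test is modelled as one uniform draw on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # A trial counts as a hit if any one of the tests falls below alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(f"simulated: {simulate_any_false_positive():.3f}")
    print(f"formula:   {1 - (1 - 0.05) ** 28:.3f}")
```

The two printed numbers should agree to within simulation noise.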

So it is absolutely unsurprising, and totally lacking in statistical significance, that in a comparison with 28 variables, someone would find that temperature is correlated with one of them at a p-value of 0.05. In fact, it is more likely than not that they would find one with a p-value equal to 0.05.

They thought they found something rare, something to beat skeptics over the head with, but it happens three times out of four. That’s what I found so funny.

Next, a simple reality check. The authors say:

Among 6,422 cases and 59,328 controls that shared at least 1 week of the critical period in summer, a 5-degree [F] increase in mean daily minimum UAT was significantly associated with congenital cataracts (aOR = 1.51; 95% CI: 1.14, 1.99).

A 5°F (2.8°C) increase in summer temperature is significantly associated with congenital cataracts? Really? Now, think about that for a minute.

This study was done in New York. There’s about a 20°F difference in summer temperature between New York and Phoenix. That’s four times the 5°F they claim causes cataracts in the study group. So by their claim that if you heat up, your kids will be born blind, we should be seeing lots of congenital cataracts, not only in Phoenix, but in Florida and in Cairo and in tropical areas, deserts, and hot zones all around the world … not happening, as far as I can tell.

Like I said, reality check. Sadly, this is another case where the Venn diagram of the intersection of the climate science fraternity and the statistical fraternity gives us the empty set …

w.

UPDATE: Statistician William Briggs weighs in on this train wreck of a paper.


Scientists find a way to distinguish the aerosol particle signal from the weather noise
“(Phys.org)—Scientists developed a modeling shortcut to dial in a clearer atmospheric particle signal. A research team from the Scripps Institute of Oceanography, the University of Washington, and Pacific Northwest National Laboratory fine-tuned the winds simulated in a global climate model to better represent the winds measured in the atmosphere. Their technique increased the signal’s clarity by greatly reducing the signal noise. Their work produced shorter, more efficient simulations of the global aerosol particle effects on clouds and a better reception of the atmospheric particle signal.”
http://phys.org/news/2012-12-scientists-distinguish-aerosol-particle-weather.html

No wonder the peer reviewers want to be anonymous.

Just Amazing…
Another case of “Climate Science” done by folks who took one Stats class, then forgot most of it.
I grew up in an area where summer temps typically were over 95 F to 100 F in all of July and August. Sometimes we’d say (accurately) “It’s 110 in the shade, and there aint no shade”.
We did not have air conditioning when I was a kid, nor did much of anyone else.
That said: I’ve never heard of “congenital cataracts”. Heard of a whole lot of other diseases in the area. Polio (a sister’s friend walked with a gimpy gait from it). Even malaria (only one case every few years). Oh, and plague is endemic in the rodents. But no cataracts in kids. Sorry…
By their reasoning, most of Africa is blind…
(Oh, that place where I grew up? Northern California… Yeah, we have plague, malaria, and more… come on down! Lucky for us, not many cases. Lots of DDT used for a long time at the right times. We used to play in the fog of pesticide behind the “mosquito trucks”… )

D Böehm

“Keep doing that and you’ll go blind”
That’s what our priest warned us about.
•••
Maybe with a big enough grant, a computer model could determine the number of cataracts caused by the 0.7ºC global warming over the past century and a half. The number might be alarming!
More public funding required.

John Blake

Try applying this ridiculous statistical hogwash to short-term stock market transactions, and watch your portfolio evaporate to zero in about five trading-days.

jbutzi

Well, when you put it that way, it seems pretty obvious. Marvelous thinking BTW, but I can’t help wondering how something so obvious gets past so many people to allow this to be anywhere near to being written or published, especially when those people are supposed experts, in positions of authority or at least educated. No wonder I am leery of the pronouncements of ‘experts’ in any field.

And the saddest part – paid for by two grants from the CDC. In other words we are borrowing our grandkids dollars for this tripe.
The 2nd saddest part is the disclaimer “The authors declare they have no actual or potential competing financial interests.” Other than coming up with anything creative to get grants that wave a danger flag of AGW to keep those paychecks coming.

Gunga Din

We didn’t have AC either. Maybe that’s why I need glasses?
I did have tubercular meningitis as an infant. Maybe it was raining that day?
More research funds are needed.

Bill H

Roger Knights says:
December 19, 2012 at 8:49 pm
No wonder the peer reviewers want to be anonymous.
=====================================
Isn’t anonymous a HACKING organization ??

wayne

Hey, PEER REVIEWED science! And you have the audacity to even question it ??? lol
That’s not far from climate “science” is it? They must be speaking of anomaly temperatures. Wait, are these scientists climate trained?

Lark

One sees a lot of this in medical studies. “Power lines cause cancer!”
I note there was also a 76% chance of a “statistically significant” _reduction_ in birth defects in at least one category. Naturally they reported that too, right?

HaroldW
JQ

That means everyone in Queensland is legally blind…..
We want money…

michaeljmcfadden

I think they call it “data dredging.” Funny thing is that just earlier this evening I had almost the SAME argument used against me on some board: a claim that some study showed that childhood secondhand smoke exposure gave them cataracts as adults. Basically, if you perform enough studies looking at enough variables for enough conditions enough times… you can almost always count on finding at least SOMEthing out there to blame your favorite bête noire on.
– MJM

noaaprogrammer

I heard of a study for women who were planning to become pregnant, warning them against taking long, hot, soaking baths, as there was a high correlation between doing that and various birth defects. Now I’m wondering how many urban myths are out there, having been created by defective studies.

AndyG55

Down here, getting “blind” has a different reason, and that reason is far more connected to CO2 than the climate reason !! hic !!!

AndyG55

D Böehm says:
“That’s what our priest warned us about.”
The blind priest ??

AndyG55

Lark says:
One sees a lot of this in medical studies. “Power lines cause cancer!”
Thank goodness they are not very edible or smokable, then !!

TomE

Sadly, a lot more research grants are going to determine “the effects of climate change” rather than to studying if climate change is anything but normal. Cut off the money supply and you cut off the BS.

Streetcred

This is why we Australians enjoy a good party … blinded by the heat !
But, “congenital cataracts”? More likely the ‘researchers’ have contracted some other “congenital” disease and are struggling with the embarrassment of it. Don’t worry boys and girls, there is a cure!

D Böehm

Regarding the ‘power lines cause cancer’ scare, I recall reading a paper many years ago that pointed out that the power lines [in Sweden, IIRC] were along highways. Higher rates of cancer were attributed to the power lines. But later investigation determined that the exhaust emissions from thousands of cars and trucks every day were the cause of the local cancer spike.
Our bodies are as transparent to cell phone frequencies as a pane of glass [otherwise you would have trouble receiving a transmission if your body was between the phone and the tower]. Being transparent to RF frequencies means that RF energy is not felt by our bodies’ cells. The cell phone/cancer scare is as fake as the AGW scare.

AndyG55

“Cut off the money supply and you cut off the BS.”
Hey, not yet….. once I finish my current task, I was sort of hoping to get myself some of those funds.
I could have issues if they ask what my position is on climate, but I’m sure I can manipulate my way around that, been watching and learning from the ‘climate scientists’ 😉
Just have to learn to lie and distort the truth, is all, then I’ll fit right in. 🙂

AndyG55

Streetcred says..
“More likely the ‘researchers have contracted some other “congenital” disease…
once you get rid of all the con……………..

AndyG55

“That means everyone in Queensland is legally blind…..”
and in Darwin.. just “blind”
http://www.dailytelegraph.com.au/archive/national-old/northern-territory-drinks-most-alcohol-in-the-world/story-e6freuzr-1225735132584

Willis Eschenbach

HaroldW says:
December 19, 2012 at 9:23 pm

http://xkcd.com/882/

Dang, and he did it without calculations … nice.
w.

Congenital cataracts? It’s worse than we thought!
Sorry, couldn’t resist.

John F. Hultquist

We assigned meteorologic data based on maternal residence at birth . . .
While, they . . .
summarized universal apparent temperature (UAT; degrees Fahrenheit) across the critical period of embryogenesis . . .
and
particularly for exposures during weeks 4–7
This isn’t likely a big issue but folks do move — so residence at birth and residence at 4-7 weeks may not be the same.
Then there is the “maternal fever” thing. Is it possible that some women had a fever during weeks 4-7? Would they remember months later even if asked? Were they asked?
Again, maybe this isn’t too important but I didn’t see how they controlled for either of these things.

Mindert Eiting

The use of multiple significance tests is a well known problem in statistics. The use of many variables is problematic if they are correlated. Else use the binomial distribution. Its expected value Np is already telling. With p = 0.05 and N = 28 we expect 1.4 false positives among independent trials. I once tried to explain that to a researcher who used 60 variables in 60 significance tests and found 3 ‘significant’ results. I did not succeed and the same BS is still around. Problem seems to be very difficult.
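The binomial point can be made concrete with a short Python sketch. It assumes independent tests, each with a false-positive probability of 0.05; the function names are illustrative only.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'significant' results among n independent
    null tests, each with false-positive probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_false_positives(n: int, p: float) -> float:
    """Mean of the binomial distribution, N * p."""
    return n * p


if __name__ == "__main__":
    for n in (28, 60):
        mean = expected_false_positives(n, 0.05)
        # Chance of three or more hits = 1 - P(0) - P(1) - P(2).
        p_three_or_more = 1 - sum(binomial_pmf(k, n, 0.05) for k in range(3))
        print(f"N={n}: expected {mean:.1f}, P(3 or more) = {p_three_or_more:.2f}")
```

For the 60-test case the expected count is exactly the 3 ‘significant’ results the researcher found.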

John West

Forgive me Willis for not sharing in your amusement. Some people take these people seriously. Even worse, a lot of people take the people seriously that drool over the prospect of using something like this to advance the cause of climate alarmism.

2kevin

Once again poor science gets ‘Eschenbached.’

Joe Prins

I am sorry. I was born at night, when the temp was so much lower. Maybe that is why I am not quite blind, yet? If Anthony has to watch his time of observation, don’t these folks with all sorts of degrees and perhaps even cataracts?

Someone I know very well, who has an important role interpreting published science in a medical area for public policy makers, tells me that 90% of the published papers in his field are worthless because of problems like this. He tells me most of it comes from doctors who think they are scientists.
I googled 2 of the authors and both are epidemiologists. A discipline one would expect to know something about statistics.

Scarface

I always thought people were homeotherm.
And that an unborn baby would be growing in a steady 37C environment.
How could a baby in the womb notice any change in temperature outside?
And how would it affect him?
And going from outside -20C in winter to +20C in your own house would be bad too?

Kev-in-Uk

I won’t bother to read the daft paper then! More seriously, this is exactly the kind of crap science we have come to expect. What bothers me is that this passes muster and is published.
Also, in relation to say, something that is non-linear and semi-chaotic in nature, I dunno, let’s say, something like ‘climate’ – it would be relatively easy to pick a variable and any number of the vast number of climatic ‘events’ and ‘interdependent variables’ and probably make a case for one ‘important’ variable being the sole cause of said events……Oh wait, now I get it, that’s why CO2 is responsible for everything!

Lance Wallace

To correct for multiple comparisons, some people use the Bonferroni correction, which divides your desired significance level (e.g. 0.05) by the number of comparisons. So 0.05/28 = 0.0018, which becomes the new threshold for finding any single comparison significant.
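A minimal sketch of that correction in Python follows. The function names are my own, and the p-values in the example are hypothetical numbers chosen for illustration, not values from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha: float = 0.05):
    """Flag which p-values survive the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    print(f"threshold for 28 tests: {bonferroni_threshold(0.05, 28):.4f}")
    # Hypothetical p-values: one 'hit' at 0.004 among 27 unremarkable results.
    example = [0.004] + [0.5] * 27
    print("any survivors:", any(significant_after_bonferroni(example)))
```

A result at p = 0.004 would pass an uncorrected 0.05 test comfortably but does not clear the corrected threshold.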

Eric H.

First off they used regression analysis which can show a false correlation with one variable and they had several. Second, they cannot prove causation as this was not a controlled study. Maybe I don’t have the background to understand these statistical methods? The Rahmstorf and Foster paper also used a method similar to this one to get their “temps match IPCC” graph and I recently read a study on second hand smoke that was touted in the media as proof of the dangers and it was actually a meta-analysis which really blew my mind. I just finished my first stats course and from just playing around with some regression analysis and sampling I was able to see the inconsistencies with these methods. If you don’t like your results, take another sample, remove outliers and try again, change the scale of your scatter plot. Lots of room to introduce bias…

Mike McMillan

Warmer temperatures will increase the growing range of golden rice, which prevents blindness in children, but not in politicians.

Patrick

When I was growing up I had access to AC, it was called a window in summer and a fire in winter. But this study is just ridiculous. Of course this had no effect on internal body temperature.
Clearly they didn’t bother with Wikipedia; A congenital disorder, or congenital disease, is a condition existing at birth and often before birth, or that develops during the first month of life (neonatal disease), regardless of causation. Of these diseases, those characterized by structural deformities are termed “congenital anomalies” and involve defects in or damage to a developing fetus.
More tripe just in time for Christmas!

Alan the Brit

Great post, Willis. Readers may like to visit Prof John Brignall’s wonderful Numberwatch site, where he has done an excellent demonstration on the statistical analysis carried out today, where large numbers of studies display a Relative Risk ratio of less than 2, when 20 years ago, & before the invoking of the Precautionary Principle, any paper with an RR ratio of less than 3, was cut into small rectangles, bundled together with a small hole pierced in the top left-hand corner, hung on a looped length of string, & placed in the smallest room in the laboratory!!!!! I find it sad that as “ALL CHEMICALS” cause cancer especially by PDREU/UESR standards, scientists pump poor old lab rats full of a substance in volumes beyond a realistic level to supposedly simulate prolonged exposure over time, & the rat develops a tumour, & little mention seems to be made of a likely toxic shock to the system the rat may have suffered from chemical overload in a very short time frame! The press release is usually done before peer review has taken place & the press/media have a field day with yet another “we’re all going to die if we don’t do as we’re told” scare story!

D.M.

I remember getting taught about the Bonferroni correction (and other options) for this multiple testing problem in my first-year undergrad general stats course about 20 years ago. It’s a very well known problem. Surprised (well, not really) that neither the researchers nor any of the reviewers picked it up.
I hate to be picky, but 2 points on statistical terminology. In statistical significance testing, the p-value isn’t the probability it happened by random chance, but the probability of observing a result at least as extreme if the null hypothesis is true. If the null hypothesis is ‘random chance’ then it’s the same thing, but if I’m comparing two groups, 1 null hypothesis might be there’s no difference between them, but an alternative null hypothesis is that there is a difference of a specified amount. For the same data, I’d get a different p-value depending on which null hypothesis I’d identified a priori as the one to test.
Secondly, a p-value is a probability, not an odds. An odds is the ratio of the probability of success to the probability of failure, i.e. p/(1-p). While people often use them interchangeably, in statistics they’re 2 different things. Admittedly there’s not much difference if you’re talking about a probability of 0.05, but if the probability is 0.75, the odds is 3 (i.e. 0.75/0.25).
Here endeth the lesson. 😉
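The probability-versus-odds distinction is easy to check numerically. Here is a small Python sketch; the function names are illustrative and not from any particular library.

```python
def probability_to_odds(p: float) -> float:
    """Odds = p / (1 - p); defined for probabilities in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Inverse conversion: p = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for p in (0.05, 0.75):
        print(f"probability {p:.2f} -> odds {probability_to_odds(p):.4f}")
```

At small probabilities the two numbers are nearly the same, which is why the terms get blurred; at 0.75 they differ by a factor of four.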

Farmer Charlie

‘The p-value is the odds (as a percentage) that your result occurred by random chance. So a p-value less than say 0.05 means that there is less than 5% odds of that occurring by random chance.’
That’s not right, is it? If the p-value were a percentage, it would be ‘5%’, not ‘0.05’. It can be converted to a percentage, but it is expressed as a decimal fraction.

Kasuha

As usual, Mr. Eschenbach got things wrong.
There’s actually no problem with his article, all the math he presents is perfectly correct.
What’s wrong are his assumptions on methods used in the PubMed paper which he apparently didn’t read carefully enough.
First of all, while the paper is inspired by IPCC conclusions, it is not about ambient temperatures at all. It is about heat waves and it even gives detailed description about what is considered a heat wave. And that definition makes pretty good sense both in New York and in Phoenix.
Second, while Mr. Eschenbach is calculating how likely it is that such result may come out of totally random data, he might have also calculated it for the results they actually present. That means, what is the total statistical significance of one result out of 28 having p<0.05 at three separate criteria at once, one result having p<0.05 at two at once, and remaining 26 being not statistically significant in any of the three criteria. Because done correctly, that state coming out of random data is actually pretty unlikely.
That article does not deserve such kind of treatment from Mr. Eschenbach, it is an interesting study. It may not be relevant for global warming for all of the reasons which were discussed on these pages thousands of times already but it still may be relevant to what people may want to care about during pregnancy.

Willis Eschenbach

Kasuha says:
December 20, 2012 at 1:48 am

As usual, Mr. Eschenbach got things wrong.

Gosh, Kasuha, that’s a charming way to enter a discussion. You do realize that it reduces your chances of having people care about your opinion, don’t you?

… First of all, while the paper is inspired by IPCC conclusions, it is not about ambient temperatures at all. It is about heat waves and it even gives detailed description about what is considered a heat wave. And that definition makes pretty good sense both in New York and in Phoenix.

Oh, please. How is a “heat wave” different from “heat”? Are you saying that the heat is a problem only if it doesn’t last long? I don’t understand this objection at all.
For example, a 5° summer heat wave in New York might be 10° colder than an average summer day in Phoenix, and last for a much shorter time … how can a cooler shorter “heat wave” in NY cause congenital cataracts, but not longer lasting, hotter temperatures in Phoenix?

Second, while Mr. Eschenbach is calculating how likely it is that such result may come out of totally random data, he might have also calculated it for the results they actually present. That means, what is the total statistical significance of one result out of 28 having p<0.05 at three separate criteria at once, one result having p<0.05 at two at once, and remaining 26 being not statistically significant in any of the three criteria. Because done correctly, that state coming out of random data is actually pretty unlikely.

Since you haven’t had the courage to present your mathematics, I fear that I can’t respond to that.
The odds that I have given are for one or more occurrences of p equal to 0.05. So it includes the other “significant” result already.
In addition, you have fallen into another trap that I didn’t discuss in the head post. This is that they compared the various congenital problems to several different measures of temperature. The measures were the mean, minimum, and maximum temperatures.
What neither they nor you seem to have thought about, Kasuha, is that we would expect the mean, minimum, and maximum temperature to be highly correlated with each other. If something is well correlated with one of them, it is likely to be correlated with the other two.
As a result, your claim that they are significant “at three criteria at once” is not meaningful. Because the three criteria are well correlated, that is no more significant than temperature being correlated with any one of them.
Finally, you list out the peculiarities of the dataset. You have one condition significantly correlated with three measures, one condition being significantly correlated with two measures, and 26 with no correlation. You say that the chances of “that state coming out of random data is actually pretty unlikely.” And you are right, the chances of any particular specified state are low.
But we’re not interested in just that state. We are interested in all possible states that have one or more results of p = 0.05. There are many, many more of them than just the particular one that occurred in this dataset.
So while you are correct that the odds of that particular state are small, the odds that we’ll have at least one result of p = 0.05 are quite large, as I calculated above.
w.
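The point about correlated temperature measures can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a reanalysis of the study: the "mean", "min" and "max" series are synthetic (the min and max are just the mean plus a little independent noise), the outcome is pure noise unrelated to any of them, and significance is judged with an approximate Fisher z-test. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(xs, ys, z_crit=1.96):
    """Approximate two-sided 5% test of zero correlation using the
    Fisher z-transform (reasonable for n around 100)."""
    z = math.atanh(pearson_r(xs, ys)) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def simulate(n_trials=2000, n_obs=100, noise_sd=0.15, seed=1):
    """Return (share of trials where the mean measure is 'significant',
    share of those where min and max are 'significant' as well)."""
    rng = random.Random(seed)
    mean_hits = all_three = 0
    for _ in range(n_trials):
        t_mean = [rng.gauss(0, 1) for _ in range(n_obs)]
        # Synthetic min/max: highly correlated with the mean by construction.
        t_min = [t + rng.gauss(0, noise_sd) for t in t_mean]
        t_max = [t + rng.gauss(0, noise_sd) for t in t_mean]
        outcome = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise
        if is_significant(t_mean, outcome):
            mean_hits += 1
            if is_significant(t_min, outcome) and is_significant(t_max, outcome):
                all_three += 1
    return mean_hits / n_trials, all_three / max(mean_hits, 1)


if __name__ == "__main__":
    p_mean, p_all_given_mean = simulate()
    print(f"P(mean significant)         = {p_mean:.3f}")
    print(f"P(min and max too | mean)   = {p_all_given_mean:.3f}")
```

In this toy setup a chance hit on one measure usually drags the other two along with it, so agreement across the three measures adds little independent evidence.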

Pierre-Normand

I concur with Kasuha’s comment above. This is a hypothesis-generating study and the authors stress the need to confirm the possible association in further studies. The authors seem to be well aware of the issue raised by Willis Eschenbach since they discussed it themselves: “Last, because we performed multiple tests to examine the relationships between 28 birth defects groups and various heat exposure indicators in this hypothesis-generating study, statistically significant findings may have been attributable to chance. Under the null hypothesis, we would expect 4 of the 84 effect estimates displayed in Table 3 to be statistically significant at the p = 0.05 level. […] However, the associations with congenital cataracts are biologically plausible, particularly given stronger associations during the relevant developmental window of lens development, and associations were consistent across exposure metrics, making chance a less likely explanation for these findings.”

Willis
You are right. There is no Bonferroni correction. The authors seem to think that performing multiple logistic regression will take care of their ‘confounders’, an additional indication the authors think anything that turns up in their univariate analysis must automatically be meaningful.
Cataract incidence is, yet, associated with three different temperature indices in their data. It would be worth applying the correction and then checking if these associations still stand.
Where is the check for association between cataracts and urban heat islands? One may even swallow that heat islands don’t affect gridded anomalies in well-constructed climate datasets, but heat islands definitely affect individual people living in them! Heat islands are definitive confounders, accepting the authors’ own findings.
Did the authors correct for Rubella immunization status? I don’t see it.
The incidence of cataracts may well be affected by confounders the authors list (e.g., alcohol consumption). But importantly, cataracts occur in association with other congenital birth defects, as part of syndromes, a good proportion of which may be undiagnosed and therefore unreported. The authors don’t appear to correct for these.
The authors perform sub-analysis looking for association between congenital defects, but, only among pre-term defects. A reasonable guess is that the authors do this, simply because they are able to. Pre-term deliveries with congenital defects have such data recorded. But cataracts can be congenital and be associated with other defects, and yet not be diagnosed at birth.
The study design is possibly not the best for answering questions of the kind the authors raise. Their controls are people who were exposed to the same temperatures and gave birth to babies with no defects. How are they controls?! You’ve excluded the very effect you are trying to study from the controls. A better design would have been to randomly select individuals who gave birth during non-heat wave periods, irrespective of occurrence of birth defects.

Jay

These people are no more scientists than some people in the banking sector are wealth creators; where wealth = something of value and not just a symbol.

JazzyT

[MOD–Oh, bloody hell. Up late, on a netbook that I’m not used to, when the screen went dim–instead of trying to fix the resulting mess, it’s easiest to just put up this version instead.]
Actually, this is nothing like as bad as it looks at first blush. They got some hits, then looked closer, recognized the potential problem of chance correlation, and said that their result looked interesting but would need confirmation.
In correlation tests of heat against 28 birth defects, they found statistical significance at the 5% level for three: congenital cataracts, renal agenesis/hypoplasia, and also a reduced occurrence of anophthalmia/microphthalmia. I’d expect the latter to be rare, and so have pretty poor sample size. But for cataracts, figure 2 in the paper shows the association as statistically significant during weeks 4, 6, and 7. They state weeks 4-7 to be the time that the developing lens is most susceptible (as shown, e.g., by data from mothers with Rubella during pregnancy**). That’s a bit less random than just getting a hit for statistical significance somewhere within weeks 3-8. Not startling, just noteworthy. They ran correlations against a few different criteria for hot weather, i.e., max, min, and mean temperature, and got hits for cataracts with all of these. Since these ought to be fairly well correlated, it’s hard to tell how much that means.
I’ve heard of researchers taking a group of variables, running 100 cross-correlations, getting five hits at the 5% level, and presenting those, with a straight face, as though that were meaningful. These authors were aware of, and addressed, the problem of getting statistical significance by chance:

Last, because we performed multiple tests to examine the relationships between 28 birth defects groups and various heat exposure indicators in this hypothesis-generating study, statistically significant findings may have been attributable to chance. Under the null hypothesis, we would expect 4 of the 84 effect estimates displayed in Table 3 to be statistically significant at the p = 0.05 level. Thus, significant positive and negative associations with cataracts, renal agenesis, and anophthalmia may have been chance findings. Bonferroni adjustment to the p = 0.05 level of significance (0.05/84 = 0.0006) would yield approximate adjusted CIs for congenital cataracts that include the null value (95% CI: 0.93, 2.44). However, the associations with congenital cataracts are biologically plausible, particularly given stronger associations during the relevant developmental window of lens development, and associations were consistent across exposure metrics, making chance a less likely explanation for these findings.
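The Bonferroni-adjusted interval in that passage can be roughly reproduced from the published numbers alone. The Python sketch below is an approximation under an assumption the paper does not spell out here: that the reported 95% CI is a Wald interval, symmetric on the log-odds scale, so the standard error can be backed out from its width. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def bonferroni_ci(odds_ratio, ci_low, ci_high, n_tests, alpha=0.05):
    """Widen a reported (1 - alpha) CI for an odds ratio to a
    Bonferroni-adjusted one, assuming a Wald interval on the log scale."""
    z_plain = NormalDist().inv_cdf(1 - alpha / 2)
    # Back out the standard error of log(OR) from the reported interval width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_plain)
    # Critical value for the stricter per-test level alpha / n_tests.
    z_adj = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z_adj * se), math.exp(log_or + z_adj * se)


if __name__ == "__main__":
    low, high = bonferroni_ci(1.51, 1.14, 1.99, n_tests=84)
    print(f"adjusted CI: ({low:.2f}, {high:.2f})")
```

Fed the abstract’s figures (aOR = 1.51; 95% CI: 1.14, 1.99) and 84 tests, this lands close to the (0.93, 2.44) interval the authors quote, with a lower bound below 1.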

And, they say explicitly that these preliminary results need confirmation, with their last sentence stating:

However, our findings for congenital cataracts must be confirmed in other study populations.

**From radiation studies with mice and rats, they can actually pick a day to irradiate the pregnant animal, and get different birth defects depending on which organs are most susceptible that day. Couldn’t find a good web ref, but it’s in here:
http://books.google.com/books/about/Primer_of_medical_radiobiology.html?id=XylrAAAAMAAJ

Willis Eschenbach

Oh, yeah. Kasuha, please note that I have treated the three temperature conditions (max, min, and mean) as one, because they are well correlated.
If we include each of them as a separate temperature condition, however, we no longer have 28 possibilities. We have three times that many, 84 possibilities … and with N = 84, we are very, very likely to get false positives.
w.

Oatley

As is typical, Willis provides excellent analysis. Still, I fully expect to read this story in the Sunday papers. Snip, snip…cut and paste. Big headlines.
Newspapers are in a death spiral and desperately need headlines that “break through”. The mindless readers (50% don’t get past the first paragraph) scan and digest. The information agrees with all the other flawed reporting, therefore they are comfortable with the story line, i.e., it is symmetrical with their level one thinking.

Willis Eschenbach

Pierre-Normand says:
December 20, 2012 at 2:46 am

I concur with Kasuha’s comment above. This is a hypothesis-generating study and the authors stress the need to confirm the possible association in further studies. The authors seem to be well aware of the issue raised by Willis Eschenbach since they discussed it themselves: “Last, because we performed multiple tests to examine the relationships between 28 birth defects groups and various heat exposure indicators in this hypothesis-generating study, statistically significant findings may have been attributable to chance. Under the null hypothesis, we would expect 4 of the 84 effect estimates displayed in Table 3 to be statistically significant at the p = 0.05 level. […] However, the associations with congenital cataracts are biologically plausible, particularly given stronger associations during the relevant developmental window of lens development, and associations were consistent across exposure metrics, making chance a less likely explanation for these findings.”

Thanks, Pierre. Yes, indeed they did comment on that … but unfortunately, they didn’t seem to understand what it meant. They still claimed that their results are significant.
But what their own calculation means is that their results are not statistically significant. I don’t care if the outcome is “biologically plausible”. Not significant is not significant, and it should not be written up.
At least on my planet, results that are not statistically significant are not worthy of a paper. These results have been hyped around the planet by the media, which clearly thinks they are significant and is reporting them as settled fact.
Now, why in the world would the media think that the results are significant? Here is the claim from their abstract (my emphasis):

Results: Among 6,422 cases and 59,328 controls that shared at least 1 week of the critical period in summer, a 5-degree increase in mean daily minimum UAT was significantly associated with congenital cataracts (aOR = 1.51; 95% CI: 1.14, 1.99).

Perhaps you could comment on the ethics of writing up a paper to present results that the authors know for a fact are not statistically significant, Pierre, and despite that, claiming in the abstract that the results are significant.
Because in my world, that is agenda-driven deception, and it has no place in science.
All the best,
w.