Guest Post by Willis Eschenbach
There is a new paper in Nature magazine that claims that the tropics are expanding. This would be worrisome because it could push the dry zones further north and south, moving the Saharan aridity into Southern Europe. The paper is called “Recent Northern Hemisphere tropical expansion primarily driven by black carbon and tropospheric ozone”, by Robert Allen et al. (paywalled here, supplementary information here; hereinafter A2012). Their abstract says:
Observational analyses have shown the width of the tropical belt increasing in recent decades as the world has warmed. This expansion is important because it is associated with shifts in large-scale atmospheric circulation and major climate zones. Although recent studies have attributed tropical expansion in the Southern Hemisphere to ozone depletion, the drivers of Northern Hemisphere expansion are not well known and the expansion has not so far been reproduced by climate models. Here we use a climate model with detailed aerosol physics to show that increases in heterogeneous warming agents—including black carbon aerosols and tropospheric ozone—are noticeably better than greenhouse gases at driving expansion, and can account for the observed summertime maximum in tropical expansion.
Setting aside the question of their use of a “climate model with detailed aerosol physics”, they use several metrics to measure the width of the tropics—the location of the jet stream (JET), the mean meridional circulation (MMC), the minimum precipitation (PMIN), the cloud cover minimum (CMIN), and the precipitation-evaporation (P-E) balance. Figure 1 shows their observations and model results for how much the tropics have expanded over the study period, in degrees of latitude.
FIGURE 1. ORIGINAL CAPTION FROM A2012: Figure 2 | Observed and modelled 1979–1999 Northern Hemisphere tropical expansion based on five metrics. a, Annual mean poleward displacement of each metric, as well as the combined ALL metric. … CMIP3 models are grouped into nine that included time-varying black carbon and ozone (red); three that included time-varying ozone only (green); and six that included neither time-varying black carbon nor ozone (blue). Boxes show the mean response within each group (centre line) and its 2σ uncertainty. Observations are in black. In the case of one observational data set, trend uncertainty (whiskers) is estimated as the 95% confidence level according to a standard t-test.
I note in passing that the error bars of the observations are very wide. In fact, they barely establish the change as being different from zero, and in a couple of cases the changes are not statistically significant.
Now, several people have asked me recently how I can analyze a paper so quickly. There are some indications that set off alarms, or that tell me where to look. In this case, the wide error bars set off the alarms. I also didn’t like that instead of giving the claimed expansion per decade, they reported the total expansion over the 28 years of the study … that’s a second red flag, as it visually exaggerates their results. Finally, the following paragraph in A2012 told me where to look:
We quantify tropical width using a variety of metrics [refs 5–11]: (1) the latitude of the tropospheric zonal wind maxima (JET); (2) the latitude where the Mean Meridional Circulation (MMC) at 500 hPa becomes zero on the poleward side of the subtropical maximum; (3) the latitude where precipitation minus evaporation (P-E) becomes zero on the poleward side of the subtropical minimum; (4) the latitude of the subtropical precipitation minimum (PMIN); and (5) the latitude of the subtropical cloud cover minimum over oceans (CMIN). To obtain an overall measure of tropical expansion, we also average the trends of all five metrics into a combined metric called ‘ALL’. Expansion figures quoted in the text will be based on ALL unless otherwise specified.
What told me where to look? Well, the sloppy citation. Note that they have not given a citation for each of the five metrics. Instead, they have put no fewer than seven citations at the head of the list of the five groups of observations and model results. That, to me, is a huge red flag. It means that there is no way to trace the source of each of the five individual observational results in A2012. So I went to look at the citations. They are as follows:
5. Zhou, Y. P., Xu, K.-M., Sud, Y. C. & Betts, A. K. Recent trends of the tropical hydrological cycle inferred from Global Precipitation Climatology Project and International Satellite Cloud Climatology Project data. J. Geophys. Res. 116, D09101 (2011).
6. Bender, F., Ramanathan, V. & Tselioudis, G. Changes in extratropical storm track cloudiness 1983–2008: observational support for a poleward shift. Clim. Dyn. http://dx.doi.org/10.1007/s00382-011-1065-6 (2011).
7. Son, S.-W., Tandon, L. M., Polvani, L. M. & Waugh, D. W. Ozone hole and Southern Hemisphere climate change. Geophys. Res. Lett. 36, L15705 (2009).
8. Polvani, L. M., Waugh, D. W., Correa, G. J. P. & Son, S.-W. Stratospheric ozone depletion: the main driver of twentieth-century atmospheric circulation changes in the Southern Hemisphere. J. Clim. 24, 795–812 (2011).
9. Son, S.-W. et al. Impact of stratospheric ozone on Southern Hemisphere circulation change: a multimodel assessment. J. Geophys. Res. 115, D00M07 (2010).
10. Kang, S. M., Polvani, L. M., Fyfe, J. C. & Sigmond, M. Impact of polar ozone depletion on subtropical precipitation. Science 332, 951–954 (2011).
11. Johanson, C. M. & Fu, Q. Hadley cell widening: model simulations versus observations. J. Clim. 22, 2713–2725 (2009).
For no particular reason other than that it was available and first in the list, I decided to look at the Zhou paper, “Recent trends of the tropical hydrological cycle inferred from Global Precipitation Climatology Project and International Satellite Cloud Climatology Project data”. It is also the citation that covers the minimum precipitation (PMIN) metric for both hemispheres, as used in A2012. Figure 2 shows results from the Zhou paper:
Figure 2. ORIGINAL CAPTION FROM ZHOU: Figure 4. Time-latitude cross sections of zonal mean seasonal precipitation and the corresponding linear trend with latitude. Solid orange lines mark the 2.4 mm d⁻¹ precipitation threshold which is used as the boundaries of subtropical dry band. The boundary at the high and low latitude of the dry band is used as a proxy of the boundary of Hadley cell and ITCZ, respectively. Solid black lines indicate latitude with minimum precipitation. Dashed red lines mark the Hadley cell boundary determined by the 250 W m⁻² threshold using HIRS OLR data.
Now, the black lines in these four frames show the minimum precipitation, so that must be where they got the PMIN data. So I went to look at what the Zhou paper says about the trend in the minimum precipitation, PMIN. That’s shown in their Figure 5:
Figure 3. ORIGINAL CAPTION FROM ZHOU: Figure 5. Linear trends of the latitude of minimum precipitation, ITCZ, and Hadley cell boundaries inferred from GPCP for each season and the year marked on the horizontal axis for (a) the Northern Hemisphere and (b) the Southern Hemisphere. … Leftmost, middle, and rightmost bars in each group are for minimum precipitation, Hadley cell, and ITCZ boundary, respectively. For quantities significant at the 90% level, bars are shaded green, blue, and orange, respectively.
Now, let me stop here and discuss these results. I’m interested in the “Year” category for minimum precipitation (green), since that’s what they used in the A2012 paper. Note first that the 90% significance level is a weak threshold to begin with, and the minimum precipitation results they are using do not even reach it. But it’s worse than that: out of a total of six “YEAR” results, this paper shows one and only one that is significant even at the 90% level.
This brings up a very important and routinely overlooked problem with this kind of analysis. While we know that one of these six “YEAR” results appears to be (weakly) significant at the 90% level, they’ve looked at six different categories to find this one result. What is often ignored is that the real question is not whether that one result is significant at the 90% level. The real question is, what are the odds of finding one 90% significant result purely by chance when you are looking at six different datasets?
The answer is calculated by taking the significance level to the sixth power: the chance that none of six independent results reaches 90% significance purely by chance is 0.9^6 ≈ 0.53, so the chance of finding at least one is 1 - 0.53 ≈ 0.47 … and that means that the odds of finding a single result significant at the 90% level in six datasets are about fifty/fifty.
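For those who want to check the arithmetic, here is a minimal sketch (in Python), assuming the six results are independent:

```python
import random

# Chance that at least one of six independent tests comes up "significant"
# at the 90% level when there is in fact no trend at all.
alpha = 0.10   # per-test false-positive rate at the 90% level
n_tests = 6

p_none = (1 - alpha) ** n_tests      # 0.9^6 ~= 0.531
p_at_least_one = 1 - p_none          # ~= 0.469 ... about fifty/fifty
print(f"analytic: {p_at_least_one:.3f}")

# Monte Carlo check: simulate many families of six null (no-trend) tests.
random.seed(1)
trials = 100_000
hits = sum(
    any(random.random() < alpha for _ in range(n_tests))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.3f}")
```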
And that, in turn, means that their results are as meaningless as flipping a coin to determine whether the tropics are expanding on an annual basis. None of their results are significant.
It also means that the data from the Zhou paper which are being used in the A2012 paper are useless.
Finally, I couldn’t reproduce either the average value or the error bars on that average in the A2012 “ALL” data. Here are the values from my Figure 1 (the A2012 Figure 2), including the combined “ALL” metric:
Item    Value   Error
JET     0.45    1.09
P-E     0.75    0.29
MMC     0.24    0.08
PMIN    0.17    0.51
CMIN    0.33    0.06
ALL     0.33    0.12
When I average the five values, I get 0.39, compared to their 0.33 … and the problem is even greater with the error bars. The error of an average is the square root of the sum of the squares of the errors, divided by the number of data points N. This calculates out to an error of 0.25 … but they get 0.12.
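Here is that check as a short sketch, using the numbers from the table above:

```python
import math

# Values and 2-sigma errors read off A2012's Figure 2 (the table above).
values = [0.45, 0.75, 0.24, 0.17, 0.33]   # JET, P-E, MMC, PMIN, CMIN
errors = [1.09, 0.29, 0.08, 0.51, 0.06]
n = len(values)

# Simple arithmetic mean of the five metrics.
mean = sum(values) / n                            # 0.388, i.e. about 0.39

# Standard error propagation for a mean of independent quantities:
# square root of the sum of squared errors, divided by N.
err = math.sqrt(sum(e * e for e in errors)) / n   # about 0.25

print(f"mean = {mean:.2f} +/- {err:.2f}")   # vs A2012's ALL: 0.33 +/- 0.12
```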
Does this mean that the tropics are not expanding? Well, no. It tells us nothing at all about whether the tropics are expanding. But what it does mean is that their results are not at all solid. They are based at least in part on meaningless data, and they haven’t even done the arithmetic correctly. And for me, that’s enough to discard the paper entirely.
w.
PS: I suppose it is possible that they simply ignored the results from the Zhou paper and used the results from another of their citations for the minimum precipitation PMIN … but that just exemplifies the problems with their sloppy citations. In addition, it brings up the specter of data shopping, where you look at several papers and just use the one that finds significant results. And that in turn brings up the problem I discussed above, where you find one significant result in looking at several datasets.
Willis Eschenbach says: “…Thanks, Rob, but you are so intent on finding some error in what I’ve done that you neglect to read what I’ve quoted. The authors say, as I quoted above (emphasis mine)”.
I agree with you that the authors did not make that part clear enough – meaning whether they are using the simple arithmetic mean or the weighted average – which is unfortunate. I agree on that part. But it took me only a couple of minutes to figure that out – there are only two options for the average in this case, so it is one or the other.
Willis says: “..You may claim all you want that that doesn’t make much sense, or that they did something and perhaps it doesn’t and they did … but it’s what they say that they did…”
When one group has fewer data points and another has more, taking a simple average of the group means will bias the final result towards the set with fewer data points. That is why we use weighted averages: weighting by the number of data points in each group makes the result more representative of the whole experiment. That is what I was saying; a toy example is below.
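For illustration only, here is a toy example (the group sizes and values are entirely made up):

```python
# Toy example with made-up numbers: two groups of measurements.
small_group = [1.0, 1.0]          # 2 points, group mean = 1.0
large_group = [3.0] * 10          # 10 points, group mean = 3.0

# Simple average of the two group means: each group counts equally.
mean_of_means = (sum(small_group) / len(small_group)
                 + sum(large_group) / len(large_group)) / 2   # = 2.0

# Count-weighted (pooled) mean: each data point counts equally.
pooled = small_group + large_group
weighted_mean = sum(pooled) / len(pooled)                     # ~= 2.67

# The mean of means sits much closer to the small group's value:
# the 2-point group gets the same say as the 10-point group.
print(mean_of_means, weighted_mean)
```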
Willis says: “..“Ardent followers”? This is science, please, leave the side commentary aside, it damages your case.”
I do not have any problem with a neutral scientific discussion – but many of the comments I saw above are puzzling, because they support your views without even looking at your calculations and logic. It is a parenthetical comment – nothing major.
Willis said: “..If you think that Zhou’s input is largely irrelevant, then please explain why the authors concluded that they should include it. They must think it is not irrelevant. If you think that the “glaringly large error bars … are also immaterial”, you’ll have to explain that as well…”
Some groups have larger error bars but fewer data points, and other groups have smaller error bars but more data points (that is the reason why the net error bar is smaller); in a weighted average, more weight goes to the groups with more data points and smaller standard errors. That is why I think the above-mentioned groups have only a smaller impact in the final result, as the sketch below illustrates.
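For instance, if the five metrics were combined by an inverse-variance weighted average (a guess on my part; the paper does not say what was done), the Figure 2 numbers would give:

```python
# Inverse-variance weighting of the five metrics from A2012's Figure 2.
# This is only a hypothesis about how ALL might have been formed; the
# paper does not state its averaging method.
values = [0.45, 0.75, 0.24, 0.17, 0.33]   # JET, P-E, MMC, PMIN, CMIN
errors = [1.09, 0.29, 0.08, 0.51, 0.06]   # quoted 2-sigma uncertainties

weights = [1 / e**2 for e in errors]
w_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
w_err = (1 / sum(weights)) ** 0.5

# The small-error metrics (MMC, CMIN) dominate; JET and PMIN barely count.
print(f"weighted mean = {w_mean:.2f} +/- {w_err:.2f}")
```

Run as written, this gives roughly 0.31 +/- 0.05: close to their 0.33 on the mean, though not to their 0.12 error bar, so I agree the exact method is not yet pinned down.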
Rob G.:
I am following the discussion between you and Willis with interest.
You conclude your post to him at May 23, 2012 at 6:35 am saying:
“That is why I think the above-mentioned groups have only a smaller impact in the final result.”
OK. I understand that, but you do not provide any quantification of that “smaller impact”. Hence, I would be grateful if you were to state what significance you are claiming for that “smaller impact”.
I would appreciate your answer to my request because I look forward to reading a response from Willis to your arguments, and a rational debate of your point (that I quote) has to be about that significance.
Richard
Rob G. says:
May 23, 2012 at 6:35 am
Thanks again, Rob. I fear that the Zhou paper doesn’t give the number of data points that they have used to arrive at their error estimate. As a result, I see no way of doing the calculation that you propose. If you have data regarding your claim, please bring it forward.
In addition, as I have pointed out above, the Zhou paper finds one result which is significant at the 90% level out of six datasets. As shown above, this means the finding is effectively significant at only about the 50% level.
Now, you can use that result in further calculations. But if you wish to do so, you need to adjust the results so that they reflect that true, roughly 50% significance level. The only way to do that is to widen the error bars.
The same is true for the results which are not significant at the 90% level, that is, which are not statistically different from zero. Their error bars also need to be increased to account for the fact that they were found only by examining six datasets. One standard way to do the widening is sketched below.
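This is a Šidák-style multiplicity correction (my suggestion; not anything described in A2012 or Zhou), assuming six independent tests and roughly Gaussian errors:

```python
from scipy.stats import norm

# Sidak-style correction: to keep a 90% family-wise confidence level
# across six independent tests, each single test must be held to a
# stricter per-test confidence level.
family_conf = 0.90
n_tests = 6

per_test_conf = family_conf ** (1 / n_tests)          # ~ 0.9826

# Two-sided Gaussian critical values at each confidence level.
z_naive = norm.ppf(1 - (1 - family_conf) / 2)         # ~ 1.64
z_corrected = norm.ppf(1 - (1 - per_test_conf) / 2)   # ~ 2.38

print(f"widen the 90% error bars by a factor of {z_corrected / z_naive:.2f}")
# ~ 1.45: the bars should be nearly half again as wide.
```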
Finally, I am generally suspicious of results which depend on an average of five different results, two of which are statistically not different from zero. How many studies did they have to go through to find those five different results? How many studies which found no change in the tropical area did they examine?
And finally, an unfortunate consequence of our current system of doing science is that negative results are rarely published. If my research shows that, for example, the tropics are indeed expanding, I can likely get that published. But if my equally valid research shows that there is no change in the tropical area, the odds of that getting published are minuscule. Despite the great importance of negative results in science, journals are not interested in publishing such findings.
As a result, what they have done is search through a bunch of papers from which negative results have already been excluded, chosen those few positive results which support their case (including two results which are statistically no different from zero), averaged them, and declared victory.
I find that singularly unconvincing.
I also note in passing that it is common in climate science that when model results disagree with observations, the model results are declared to be superior … except as in this case, when the models do not give the desired result.
My best to you,
w.
Rob G. says:
May 23, 2012 at 6:35 am
Actually, two people upthread (here and here) proposed entirely different explanations and methods for calculating the average … so there are obviously more than “two options for the average”, and there is no reason to assume that it is “one or the other”.
In addition, there is one other option, because failure is always an option: they may simply have made an arithmetical mistake. So your idea that it is “one or the other” doesn’t fit the facts, because we already have five different possibilities for the average.
w.
richardscourtney says: “OK. I understand that, but you do not provide any quantification of that “smaller impact”. Hence, I would be grateful if you were to state what significance you are claiming for that “smaller impact”.”
Richard, I would very much like to do this as well, but as I was telling Willis earlier, time is a big problem for me now – and I was going to depend on Willis to do the quantification part (he is certainly prolific at quantifying such things), but he already said he cannot find the number of data points in Zhou (and I have not read Zhou yet). So I will go through all the papers this weekend, and I will post what I can find on quantification. Please check back after the weekend. I am also very curious what exactly they did here, and your question is highly relevant.
Willis,
Now I am also very curious what they did with the averaging, so I will go through the papers and see whether I can come up with some useful data. I see that you have checked on the number of data points in Zhou, but I will read those references as well.
Willis said: “Now, you can use that result in further calculations. But if you wish to do so, you need to adjust the results so that they reflect the true p-value at the 50% level. The only way to do that is to widen the error bars.” “Finally, I am generally suspicious of results which depend on an average of five different results, two of which are statistically not different from zero. How many studies did they have to go through to find those five different results? ”
Even with wider error bars, that result’s weight will be small if they are doing a weighted average. On the second part, the combined result can still be significant if the three remaining groups have enough data points and a consistent trend (without large error bars), and that seems to be the case here.
Willis said: “And finally, an unfortunate consequence of our current system of doing science is that negative results are rarely published. If my research shows that for example the tropics are indeed expanding, I can likely get that published. But if my equally valid research shows that there is no change in the tropical area, the odds of that getting published are miniscule. Despite the great importance of negative results in science, journals are not interested in publishing such findings.”
I do not know about this; I would expect negative results can be published if you have the quantitative indicators to support them. Are you aware of any such data showing that the tropics are more or less in equilibrium, or of any other negative results? That would be interesting.
Willis says: “I also note in passing that it is common in climate science that when model results disagree with observations, the model results are declared to be superior … except as in this case, when the models do not give the desired result.”
Observations are always more reliable than models; they have to be used as benchmarks to verify models, and a model is useful only if it can capture the current trends and mechanisms and thus can be used for other boundary conditions or to predict future events.
On averages, the geometric average is probably not very useful (that is useful when the ranges are different, I believe); the specific weighted average HaroldW has proposed is certainly an option – but not a likely one.
Over the weekend I will see what data I can find to give you more input, so we will talk again soon.
P.S. Willis, as I mentioned in my posts in the other thread, I have been very busy recently; otherwise I would really like to go through the details, but I will work on it over the weekend (I have about eight manuscripts to review for journals in the next several days, and as you well know, if I do not do a good job, someone like you is going to criticize the reviewer, even though they are not in climate science). So, for example, I was saving time by not writing out the details of your criticisms of the paper in the other thread (http://wattsupwiththat.com/2012/05/03/icy-arctic-variations-in-variability/); I can deal with the logical stuff much faster – but I certainly was not trying to be a vampire (your comment: “Thus far, you have said nothing about my work. You have not raised a single objection to my claims in the head post. You have not criticized my math, my logic, or my data. Instead, you want to talk about consensus, logical fallacies, the theory of science, hypothetical questions, theoretical dilemmas, anything but the actual subject under consideration which you treat like a vampire treats garlic … and frankly, Scarlett, I don’t give a damn.”).
I hope we will have very productive discussions in the future, when I can add something useful here. I also hope some of the commenters here, on both sides, will be friendlier to each other – as most already are. I do not have anything against skeptics, although I get a bit unhappy when scientists are portrayed as dishonest or stupid – most of them are concerned about their reputation and do not belong in that group, although there are exceptions.
All the best.
Rob
Clicked link to paper …
1.) IF:
Competing financial interests
The authors declare no competing financial interests.
2.) THEN:
Tropical expansion primarily driven by black carbon and tropospheric ozone
The authors declare Recent Northern Hemisphere tropical expansion primarily driven by black carbon and tropospheric ozone.
3.) PROFIT
I downloaded the U-Wind (i.e. east-west) data from the 20th Century Reanalysis Project v2. I then interpolated the latitudes which separate the easterly Trade Winds from the Westerlies. I consider this the border of the “meteorological tropics” in the Horse Latitudes. In the Northern Hemisphere, for the years 1979-1999, I found a poleward trend of 0.18 +/- 0.31 degrees per decade. This looks compatible with the MMC figure above, which is also based on circulation.
For the period 1911-2010 the trend is poleward at 0.007 +/- 0.025 degrees per decade.
For the period 1951-2010 the trend is *equatorward* at 0.020 +/- 0.049 degrees per decade.
It all looks insignificant.
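For anyone who wants to reproduce this kind of estimate, here is a minimal sketch of the trend calculation (the synthetic series below just stands in for my interpolated boundary latitudes):

```python
import numpy as np

# Synthetic stand-in: one boundary latitude per year, 1979-1999. In the
# real analysis these come from interpolating where the zonal-mean U-wind
# changes sign (Trades vs Westerlies) in the 20th Century Reanalysis v2.
rng = np.random.default_rng(0)
years = np.arange(1979, 2000)
lat = 30.0 + 0.018 * (years - years[0]) + rng.normal(0.0, 0.5, years.size)

# Ordinary least-squares trend and its standard error.
x = years - years.mean()
slope = np.sum(x * (lat - lat.mean())) / np.sum(x**2)         # deg/yr
resid = lat - (lat.mean() + slope * x)
se = np.sqrt(np.sum(resid**2) / (years.size - 2) / np.sum(x**2))

# Quote as degrees per decade with a ~95% (2-sigma) uncertainty.
print(f"trend = {10 * slope:+.3f} +/- {10 * 2 * se:.3f} deg/decade")
```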
Some plots and source code can be found here:
https://sites.google.com/site/climateadj/tropical-expansion