I’d like to highlight one oddity in the Shakun et al. paper, “Global warming preceded by increasing carbon dioxide concentrations during the last deglaciation” (Shakun2012), which I’ve discussed here and here. They say:
The data were projected onto a 5°x5° grid, linearly interpolated to 100-yr resolution and combined as area-weighted averages.
The oddity I want you to consider is the area-weighting of the temperature data from a mere 80 proxies.
Figure 1. Gridcells of latitude (North/South) and longitude (East/West)
What is area-weighting, and why is it not appropriate for this data?
“Area-weighting” means that you give more weight to some data than others, based on the area of the gridcell where the data was measured. Averaging by gridcell and then area-weighting attempts to solve two problems. The first problem is that we don’t want to overweight an area just because it happens to have lots of observations. If one area has 3 observations and another area of the same size has 30, simply averaging all the data together will overweight the data-rich area.
I don’t like the usual solution, which is to use gridcells as shown in Figure 1 and then take a distance-weighted average from the center of each gridcell. This at least attenuates the overweighting problem, by averaging neighboring proxies together within their gridcell … but like many a solution, it introduces a new problem.
The next step, area-averaging, attempts to solve the new problem introduced by gridcell averaging. The problem is that, as you can see from Figure 1, gridcells come in all different sizes. So if you have a value for each gridcell, you can’t just average the gridcell values together. That would over-weight the polar regions, and under-weight the equator.
So instead, after averaging the data into gridcells, the usual method is to do an “area-weighted average”. Each gridcell is weighted by its area, so a big gridcell gets more weight, and a small gridcell gets less weight. This makes perfect sense, and it works fine, if you have data in all of the gridcells. And therein lies the problem.
For the Shakun 2012 gridcell and area-averaging, they’ve divided the world into 36 gridcells from Pole to Pole and 72 gridcells around the Earth. That’s 36 times 72, or 2,592 gridcells … and there are only 80 proxies. This means that most of the proxies will be the only observation in their particular gridcell. In the event, the 80 proxies occupy 69 gridcells, or about 3% of the total. No fewer than 58 of those gridcells contain only one proxy.
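To make the procedure concrete, here is a minimal Python sketch of gridcell averaging followed by cosine-of-latitude (i.e. area) weighting. The proxy locations and values in the example are made up for illustration, not taken from the Shakun data; the point is simply to show the mechanics, and to count how many gridcells end up occupied.

```python
import numpy as np

def area_weighted_average(lats, lons, values, cell_deg=5.0):
    """Gridcell average followed by an area-weighted (cos latitude) mean.

    lats, lons, values: 1-D sequences of proxy latitudes, longitudes, and
    the quantity being averaged. This is a sketch of the generic method
    described in the text, not a reproduction of Shakun et al.'s code.
    """
    lats, lons, values = map(np.asarray, (lats, lons, values))

    # Assign each proxy to a cell_deg x cell_deg gridcell.
    lat_idx = np.floor((lats + 90.0) / cell_deg).astype(int)
    lon_idx = np.floor((lons + 180.0) / cell_deg).astype(int)
    cells = {}
    for i, j, v in zip(lat_idx, lon_idx, values):
        cells.setdefault((i, j), []).append(v)

    n_cells = int(180 / cell_deg) * int(360 / cell_deg)
    print(f"{len(values)} proxies occupy {len(cells)} of {n_cells} gridcells")

    # Average within each occupied cell, then weight each cell by the
    # cosine of its central latitude (a stand-in for its area).
    num = den = 0.0
    for (i, j), vals in cells.items():
        lat_center = -90.0 + (i + 0.5) * cell_deg
        w = np.cos(np.radians(lat_center))
        num += w * np.mean(vals)
        den += w
    return num / den

# Made-up example: proxies in Greenland, off Japan, and near PNG.
print(area_weighted_average([72.6, 37.0, -5.0], [-38.5, 142.0, 145.0],
                            [30.0, 5.0, 4.0]))
```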
Let me give an example to show why this lack of data is important. To illustrate the issue, suppose for the moment that we had only three proxies, colored red, green, and blue in Figure 2.
Figure 2. Proxies in Greenland, off of Japan, and in the tropical waters near Papua New Guinea (PNG).
Now, suppose we want to average these three proxies. The Greenland proxy (green) is in a tiny gridcell. The PNG proxy (red) is in a very large gridcell. The Japan proxy (blue) has a gridcell size that is somewhere in between.
But should we give the Greenland proxy just a very tiny weight, and weight the PNG proxy heavily, because of the gridcell size? No way. There is no ex ante reason to weight any one of them differently from the others.
Remember that area weighting is supposed to adjust for the area of the planet represented by that gridcell. But as this example shows, that’s meaningless when data is sparse, because each data point represents a huge area of the surface, much larger than a single gridcell. So area averaging is distorting the results, because with sparse data the gridcell size has nothing to do with the area represented by a given proxy.
And as a result, in Figure 2, we have no reason to think that any one of the three should be weighted more heavily than another.
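To put rough numbers on the example, here is a quick back-of-the-envelope calculation of the relative cos(latitude) weights the three proxies would get, using assumed latitudes of about 73°N for the Greenland proxy, 37°N for the Japan proxy, and 5°S for the PNG proxy (the exact positions don’t change the point):

```python
import numpy as np

# Assumed latitudes for the three illustrative proxies in Figure 2.
proxies = {"Greenland (green)": 73.0, "Japan (blue)": 37.0, "PNG (red)": -5.0}

weights = {name: np.cos(np.radians(lat)) for name, lat in proxies.items()}
total = sum(weights.values())
for name, w in weights.items():
    print(f"{name}: relative weight {w / total:.2f}")

# Roughly 0.14, 0.38, and 0.48 respectively, so the PNG proxy gets about
# 3.4 times the weight of the Greenland proxy purely because of where the
# gridcell boundaries happen to fall.
```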
All of that, to me, is just more evidence that gridcells are a goofy way to do spherical averaging.
In Section 5.2 of the Shakun2012 supplementary information, the authors say that areal weighting changes the shape of the claimed warming, but does not strongly affect the timing. However, they do not show the effect of areal weighting on their claim that the warming proceeds from south to north.
My experiments have shown me that a procedure I call “cluster analysis averaging” gives better results than any gridcell-based averaging system I’ve tried. For a sphere, you use the great-circle distance between the various datapoints to define the similarity of any two points. Then you just use simple averaging at each step in the cluster analysis. This avoids both the inside-the-gridcell averaging and the between-gridcell averaging … I suppose I should write that analysis up at some point, but so many projects, so little time …
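Here is a minimal sketch of the general idea in Python. The choice of average linkage and the cutoff at a fixed number of clusters are just one plausible set of choices for illustration, not the exact procedure I use:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def great_circle(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in radians of arc."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat, dlon = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a))

def cluster_average(lats, lons, values, n_clusters=10):
    """Group nearby proxies by great-circle distance, average within each
    cluster, then take a simple mean of the cluster averages."""
    n = len(lats)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = great_circle(lats[i], lons[i], lats[j], lons[j])
    z = linkage(squareform(d), method="average")      # agglomerative clustering
    labels = fcluster(z, t=n_clusters, criterion="maxclust")
    values = np.asarray(values)
    return np.mean([values[labels == k].mean() for k in np.unique(labels)])
```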
One final point about the Shakun analysis. The two Greenland proxies show a warming over the transition of ~ 27°C and 33°C. The other 78 proxies show a median warming of about 4°C, with half of them in the range from 3° to 6° of warming. Figure 3 shows the distribution of the proxy results:
Figure 3. Histogram of the 80 Shakun2012 proxy warming since the most recent ice age. Note the two Greenland ice core temperature proxies on the right.
It is not clear why the warming shown by the Greenland ice core proxies should be so far out of line with the others. It seems doubtful that Greenland would warm by 30°C while most of the world warmed by only about 3°-6°C. If it were my study, I’d likely remove the two Greenland proxies as wild outliers.
Regardless of the reason they are so different from the others, the authors’ areal-weighting scheme means that the Greenland proxies will be only lightly weighted, which removes the problem … but to me that feels like fortuitously offsetting errors, not a real solution.
A good way to conceptualize the issue with gridcells is to imagine that the entire gridding system shown in Figs. 1 & 2 were rotated by 90°, putting the tiny gridcells at the equator. If the area-averaging is appropriate for a given dataset, this should not change the area-averaged result in any significant way.
But in Figure 2, you can see that if the gridcells all came together down by the red dot rather than up by the green dot, we’d get a wildly different answer. If that were the case, we’d weight the PNG proxy (red) very lightly, and the Greenland proxy (green) very heavily. And that would completely change the result.
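The rotation test is easy to carry out in code. Here is a sketch that rotates the proxy coordinates by 90° (so the old North Pole ends up on the new equator) and then feeds them back through the same gridcell/area-weighted average from the earlier sketch:

```python
import numpy as np

def rotate_90_about_y(lats, lons):
    """Rotate points 90 degrees about the y-axis, so the North Pole moves
    onto the equator. Returns new (lat, lon) arrays in degrees."""
    lat, lon = np.radians(lats), np.radians(lons)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    xr, yr, zr = z, y, -x            # 90-degree rotation about the y-axis
    return np.degrees(np.arcsin(zr)), np.degrees(np.arctan2(yr, xr))

# Compare area_weighted_average(lats, lons, values) with
# area_weighted_average(*rotate_90_about_y(lats, lons), values).
# If the two answers differ materially, the area weighting is being driven
# by the grid geometry rather than by the data.
```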
And for the Shakun2012 study, with only 3% of the gridcells containing proxies, this is a huge problem. In their case, I say area-averaging is an improper procedure.
w.
Willis, I am beginning to think that on the one hand you are doing a great job of debunking Shakun’s paper, but on the other hand, the more you write about it, the more Shakun likes it.
By now it is beginning to look (to me) like we have lost sight of the basic question of what came first: was it CO2, or was it global warming?
It looks like the Shakun et al. paper is only telling us what we already knew, i.e. that the Earth did recover from glaciation.
Can we from now on expect to have “skeptics” who believe not only that CO2 is capable of causing warming on a global scale of about 1°C, but also that it was responsible for the “Full Blown Global Warming” (FBGW), feedbacks and all, in at least 80 places on the planet?
Or what am I missing here?
@Steven Mosher says:
April 9, 2012 at 1:33 pm
…whether you use regular gridding, verroni tesselation, EOFs, or kridging the answer comes out the same….
Is that good or bad? 🙂 I think you missed that my comment related to sparse data. If you have some proof that 80 data points correlate well with current global temperature using any gridding scheme, that would be good for Shakun. However, I’m also skeptical that gridding works if you don’t pick the grid boundaries judiciously. But, you appear to believe that “improved” gridding procedures don’t improve anything. True?
There was an IgNobel awarded recently based on the idea that random promotions would improve a company, rather than performance evaluations: (http://www.guardian.co.uk/education/2010/nov/01/random-promotion-research). Do you also feel that gridding using randomly sized cells would be just as good as regular gridding? It would be hilarious, if true.
Oh, and by the way, the CO2 released in melting “Antarctic Sea Ice” this year is only what was trapped there last year and should, as Willis says, make no difference.
On a rotating planet I would have thought this analysis pretty stupid. Why not just do lines of latitude?
A very clear explanation of yet another problem in the way Shakun et al. calculated their global temperatures. I think writing up your idea of “cluster analysis averaging” would be very useful. Great-circle formulae for air navigation seemed very complex to me.
I appreciate why an equal-area map projection would continue to pose problems. But just for interest there is a great Java tool demonstrating different map projections from the Instituto de Matemática e Estatística da Universidade Federal Fluminense in Brazil here:
http://www.uff.br/mapprojections/mp_en.html
From the description of how to use it:
“To rotate the globe, press the left button mouse over its surface, keep the button pressed and, then, drag the mouse. To zoom in or to zoom out the globe, keep the key “s” pressed, click with the left button mouse over the globe and, then, drag the mouse.
“To mark a point on the Earth’s surface, keep the key “i” pressed, press the left button over the globe and, then, drag the mouse. The latitude and longitude of this point will be displayed in the tab “Position” on the right side of the applet. In this same tab, there is a tool that computes the distance between capitals. The corresponding geodesical arc is drawn on the globe’s surface. The applet also draws the loxodrome curve joining the two places. “
Dyspeptic Curmudgeon says:
April 9, 2012 at 3:21 pm
I read Willis’ post and was thinking the same thing regarding stereo nets, before reading through the comments and coming across yours – but anyways, my memory is probably worse than yours, and it’s far too late (12.30 am here) after a long weekend to try and engage brain!
Interesting you mention the old Fortran (77?) and HP’s too – I spent some time programming those blighters … still it was marginally easier than ZX80 or 650 m/code! LOL
vukcevic says:
April 9, 2012 at 12:12 pm
‘Nul points’ au soleil ! [“No points” for the sun!]
http://www.vukcevic.talktalk.net/img3.htm
Ha, Classic! Quick!, someone get the shovel.
I can see a 30°C rise in temperature at the poles as feasible. Why? Because it’s arid!
Why are you all so concentrated on temperature?
It’s not the only thing that’s relevant!
Haven’t any of you heard of humidity & enthalpy?
I give up! You’re fighting on their terms and so will never win!
Leif Svalgaard says:
April 9, 2012 at 3:16 pm
Steve from Rockwood says:
April 9, 2012 at 2:37 pm
Up to 90% of the cells have NO DATA at all. Only a few cells have more than one point. So perhaps Mosher can repeat his averaging discussion with 1 sample in one cell and no data in the surrounding 50 cells.
Here is the distribution of proxies [from Steve]
Sorry: looked at the distribution. Didn’t look TOO bad, all things considered. But the weighting could screw it up, I suppose ….
What conclude, thee?
Sparks says:
April 9, 2012 at 4:46 pm
“‘Nul points’ au soleil !”
Ha, Classic! Quick!, someone get the shovel.
Days with no spots are expected in a low cycle even at solar maximum, e.g. compare cycles 14 and 24:
http://www.leif.org/research/SC14-and-24.png
oh no!
http://arctic-roos.org/observations/satellite-data/sea-ice/observation_images/ssmi1_ice_ext.png
*****
pochas says:
April 9, 2012 at 1:47 pm
Steven Mosher says:
April 9, 2012 at 1:34 pm
“Added CO2 will warm the earth and the ocean will respond by outgassing more CO2.”
Very good, Steven!
*****
Except, regarding the current situation, he only scored 50%…….
Chris Colose says:
April 9, 2012 at 12:35 pm
“Some” indicates more than two. Please provide citations to three previous papers arguing that the CO2 went up before the temperatures, as their paper title claims. Remember, the other papers have to argue that the global warming was preceded by increasing carbon dioxide concentrations during the last deglaciation. I await your citations.
w.
vukcevic says, April 9, 2012 at 12:12 pm
Show me again a few days later than April 23. That is my mother’s birthday, and the sun always sparkles then 🙂
The data (including the data used in the Shakun paper) says that temperatures increased BEFORE CO2 in Antarctica AND in the southern hemisphere.
They are trying to argue that “global temperatures” lagged behind the CO2 numbers (because the “northern hemisphere” temperatures lagged WAY behind the CO2 numbers).
I’m not sure that is really true.
The northern hemisphere is more complex because there was a lot of ice that needed to melt first before temperatures could increase and the northern hemisphere just has more variability than the south.
I see lots of “lagging” of CO2 behind Greenland temperatures, for example, but the Dansgaard-Oeschger events and the Older (14,500 years ago) and Younger Dryas (12,800 years ago) events make it hard to tell.
Shakun 2012 southern hemisphere temperature stack, northern hemisphere and CO2. Southern hemisphere leading CO2 by 1,400 years.
http://img585.imageshack.us/img585/6038/shnhco2shakun2012.png
Extend Greenland and Antarctica out to 30,000 years rather than cutting off the data between 22,000 and 6,500 years ago and a different perspective emerges. Now the northern hemisphere variability is also leading CO2.
http://img580.imageshack.us/img580/1096/transitionliaco2.png
And go back through the whole last ice age. Northern hemisphere variability is pronounced and is not responding to CO2 at all.
http://img638.imageshack.us/img638/995/liaco2.png
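For anyone who wants to put a number on a lead or lag like the 1,400 years mentioned above, here is a crude sketch of a lagged-correlation estimate for two series sampled on a common 100-year step. It is only illustrative: it ignores dating uncertainty, autocorrelation, and edge effects, all of which matter for real proxy data.

```python
import numpy as np

def best_lag(temp, co2, step_yr=100, max_lag_steps=50):
    """Return the lag (in years) that maximizes the correlation between a
    temperature series and a CO2 series on the same time step (equal-length
    arrays assumed). A positive result means temperature leads CO2."""
    temp = (np.asarray(temp) - np.mean(temp)) / np.std(temp)
    co2 = (np.asarray(co2) - np.mean(co2)) / np.std(co2)
    best, best_r = 0, -np.inf
    for lag in range(-max_lag_steps, max_lag_steps + 1):
        if lag >= 0:
            a, b = temp[:len(temp) - lag], co2[lag:]   # temp leads by `lag` steps
        else:
            a, b = temp[-lag:], co2[:lag]              # CO2 leads by `-lag` steps
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best * step_yr, best_r
```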
Nick Stokes says:
April 9, 2012 at 2:50 pm
Thanks, Nick. I’d looked at that before. Actually, section (4) has nothing about Monte Carlo analysis. They discuss that in section 3. In section 4 they use subsampling to determine how well the proxy sites represent the globe … but they subsample gridcells, not individual temperature stations. They also sub-sample them at random … seems like you’d want to pick 80 individual temperature records near the proxy locations, not gridcell averages, to match against the 80 individual proxies.
In addition, as I commented elsewhere, they did not give enough detail in their Monte Carlo section (3) to determine if it is valid. A proper Monte Carlo analysis is quite hard to do; you have to be very careful with your assumptions. If all they did is add Gaussian random noise, or autocorrelated random noise, I wouldn’t expect much difference … the problem is systematic error, not random error.
But then, I hardly expect more from a group that doesn’t even mention autocorrelation, and shows results with 1 sigma errors. Nick, you do realize that their figure 5a showing the changes in trends as you go northwards is 100% statistically insignificant? If you put in the proper 2 sigma error bars, not one of their findings is significant … and that says something very bad about either their knowledge of basic statistics, or their willingness to promote statistically meaningless results.
Not sure which one is worse …
w.
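As a footnote to the point about autocorrelation and error bars above, here is a minimal sketch of the standard first-order (AR(1)) adjustment: the lag-1 autocorrelation shrinks the effective number of independent data points, which in turn widens the uncertainty of a mean (or a trend). This is only the simplest version of the correction, shown here for the mean of a series.

```python
import numpy as np

def effective_n(x):
    """Effective sample size under an AR(1) model: n_eff = n * (1 - r) / (1 + r),
    where r is the lag-1 autocorrelation of the series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = max(np.dot(x[:-1], x[1:]) / np.dot(x, x), 0.0)   # clip negative r to zero
    return len(x) * (1.0 - r) / (1.0 + r)

def two_sigma_of_mean(x):
    """Approximate 2-sigma (~95%) uncertainty of the mean, using n_eff
    instead of n in the standard error."""
    x = np.asarray(x, dtype=float)
    return 2.0 * np.std(x, ddof=1) / np.sqrt(effective_n(x))
```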
vukcevic says: April 9, 2012 at 3:05 pm
“Least flawed way to set the area weighting is triangulation.”
True. I think Willis is right here because I’ve been through this sequence myself. I did a calc of temperature based on 61 stations worldwide. It worked pretty well in terms of reproducing indices calculated with much larger samples. But I used 5×5 cells weighted as Willis described, and ran up against the same difficulty that the weighting really isn’t helping.
So I weighted by triangulation, again for about 60 stations. It made some difference (better), but not a lot.
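For reference, one grid-free way to get area weights for scattered stations is to give each station the area of its Voronoi cell on the sphere. The sketch below uses scipy’s SphericalVoronoi (its calculate_areas method needs scipy 1.5 or later); it illustrates the general approach rather than the exact weighting described in the comment above.

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

def voronoi_weighted_mean(lats, lons, values):
    """Weight each station by the area of its Voronoi cell on the unit
    sphere, a grid-free alternative to fixed lat/lon gridcells."""
    lat, lon = np.radians(lats), np.radians(lons)
    pts = np.column_stack([np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)])
    sv = SphericalVoronoi(pts, radius=1.0)
    areas = sv.calculate_areas()          # one cell area per input station
    return np.average(values, weights=areas)
```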
Doug Proctor says:
Trying not to find fault with Shakun et al for the sport of it, let’s assume that distribution at the poles is not relevant to their paper. After all they compare the Antarctic proxies and Greenland proxies to global (all) proxies. So you should look at mid-latitude distribution.
But I have one problem with the paper I can’t get over. They acknowledge that the SH warmed first, up to 2,000 years before the NH, and that SH temperatures led CO2 increases. If this is the case, why do we need their complex theory that the earth wobbled, warmed the NH, melted the ice sheets, and cut off the circulation of the oceans, leading to SH ocean warming, leading to SH CO2 release, finally leading to SH warming? Why can’t we just have this: the SH warmed, CO2 was then released, and the NH warmed much later, lagging the CO2 release of the SH?
1) One of the great weaknesses of tree rings as thermometers is that they only reflect conditions during the growing season, while proxies like O18 don’t seem to depend on the seasons as much. How many of these proxies are organic in nature and, possibly, impacted by season? Shouldn’t these be less valuable than the ones not so tied to seasonal temperatures?
2) Couldn’t we get a good idea of how globally descriptive these proxies are by comparing current instrumental readings from the same points, manipulating them the same way, and seeing how well they describe our current climate?
Why don’t they do it with hexagons?
Surely the best way to see if a gridded mean actually gives any realistic description of the datasets is to examine the correlation of the equatorial datasets? The proxies from 15°N to 15°S should all give pretty much the same results. Do they?
David A. Evans says:
April 9, 2012 at 4:53 pm
Oh, please. Not only have I heard of enthalpy, I’ve actually done the calculations on how much difference humidity makes. Have you?
The answer I got is, it doesn’t make all that much difference … if you got some other answer, please show us.
Also, the 30°C rise is not at the poles as you fatuously assume. It’s on the Greenland ice cap, which is not arid at all, but is rather moist compared to say Antarctica, which indeed is arid as you mention.
Finally, let me suggest that you cut down on the attitude, my friend. I may be wrong, but I and most of the commenters are not fools. Yes, we’ve heard of enthalpy. Like they say … it’s not the heat that matters, it’s the humility …
w.
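As a footnote on the enthalpy point, here is a sketch of the textbook calculation involved (moist enthalpy per kilogram of air, h ≈ cp·T + Lv·q, with the Tetens formula for saturation vapour pressure). It is only the standard formula, not the particular calculation referred to above.

```python
import numpy as np

def moist_enthalpy(temp_c, rh, pressure_hpa=1013.25):
    """Approximate moist enthalpy of air, h = cp*T + Lv*q, in J/kg."""
    cp = 1005.0        # specific heat of dry air, J/(kg K)
    lv = 2.5e6         # latent heat of vaporization, J/kg
    e_sat = 6.112 * np.exp(17.67 * temp_c / (temp_c + 243.5))   # hPa (Tetens)
    e = rh * e_sat                                              # vapour pressure, hPa
    q = 0.622 * e / (pressure_hpa - 0.378 * e)                  # specific humidity, kg/kg
    return cp * (temp_c + 273.15) + lv * q

# Example: 30 C at 80% relative humidity vs. -20 C at 80% relative humidity.
print(moist_enthalpy(30.0, 0.8))    # latent term is roughly 53 kJ/kg here
print(moist_enthalpy(-20.0, 0.8))   # latent term is only about 1.5 kJ/kg here
```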
I am with Mosher on this one. Pick eighty good trees in the right places and forget about the surface stations. How much money would that save us?
Steven Mosher: when you say veronni do you mean Voronoi? And does that imply each cell is assumed to have a constant temperature in it, i.e. each point on Earth gets the temperature of the closest proxy? Whereas triangulating means computing the Delaunay triangulation and linearly interpolating within each triangle?
In any case, it certainly looks like the original authors addressed this class of objection already. Certainly more proxies would be better, and I’m sure we’ll see followup papers doing exactly that. From that to claiming it was a horrible mistake to publish is quite a stretch.
Leif Svalgaard says:
April 9, 2012 at 5:00 pm
“Days with no spots are expected in a low cycle even at solar maximum, e.g. compare cycle 14 and 24:”
http://www.leif.org/research/SC14-and-24.png
Leif,
Not everyone expected this low solar cycle that we’re witnessing.
We haven’t had such a low cycle, comparable to SC14, in our lifetime; it is a bit exciting, don’t you think?
How many of these low cycles do you think we can expect in the future?