John Graham-Cumming has posted an interesting analysis; he could benefit from some reader input at his blog.
See here and below: http://www.jgc.org/blog/
Adjusting for coverage bias and smoothing the Met Office data
As I’ve worked through “Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850” to reproduce the work done by the Met Office, I’ve come up against something I don’t understand. I’ve written to the Met Office about it, but until I get a reply this blog post is to ask for opinions from any of my dear readers.
In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you’ll see that in the 1800s there weren’t many temperature stations operating and so only a small fraction of the Earth’s surface was being observed. There was a very big jump in the number of stations operating in the 1950s.
That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I’m calling this the coverage bias.
To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to get an estimate of the error for the groups of stations operating in any year. Using that data it’s possible, on a year-by-year basis, to calculate the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
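For anyone who wants to follow along, here is a minimal sketch of how I understand the sub-sampling idea, in Python/NumPy. The array names (reanalysis, station_mask, lats) are placeholders of my own, not anything from the Met Office code, and the cosine-latitude area weighting is my assumption about how the hemispheric mean is formed.

```python
import numpy as np

def coverage_bias(reanalysis, station_mask, lats):
    """Estimate the coverage bias for one year's station network.

    reanalysis   : array (n_years, n_lat, n_lon) of NCEP/NCAR anomalies
    station_mask : boolean array (n_lat, n_lon), True where the network
                   has at least one station reporting in that grid cell
    lats         : array (n_lat,) of grid-cell latitudes in degrees

    Returns the mean and standard deviation, across the reanalysis years,
    of (sub-sampled hemispheric mean minus full-coverage hemispheric mean).
    """
    # Area weights: grid cells shrink towards the poles.
    w = np.cos(np.radians(lats))[:, None] * np.ones(reanalysis.shape[1:])

    errors = []
    for field in reanalysis:                      # one reanalysis year at a time
        full = np.average(field, weights=w)       # "true" hemispheric mean
        sub = np.average(field[station_mask],     # mean seen by the sparse network
                         weights=w[station_mask])
        errors.append(sub - full)

    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)
```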
I’ve now done the same analysis and I have two problems:
1. I get a much wider error range for the 1800s than is seen in the paper.
2. I don’t understand why the mean error isn’t taken into account.
Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to ‘prime’ the filter. I extend the data as described on that page.
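In case it helps anyone checking the smoothing step, here is a sketch of the 21-point filter I am applying. I am assuming binomial weights (a normalised row of Pascal’s triangle) and simple end-value padding here; the padding is only a stand-in for the exact extension rule the Met Office describes on that page.

```python
import math
import numpy as np

def smooth21(x):
    """Apply a 21-point binomial smoother to a 1-D series.

    Sketch only: binomial weights are assumed, and each end is padded
    with 10 copies of the end value as a stand-in for the Met Office's
    extension rule, so the output has the same length as the input.
    """
    x = np.asarray(x, dtype=float)
    weights = np.array([math.comb(20, k) for k in range(21)], dtype=float)
    weights /= weights.sum()
    padded = np.concatenate([np.full(10, x[0]), x, np.full(10, x[-1])])
    return np.convolve(padded, weights, mode="valid")
```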
First here’s the smooth trend line for the northern hemisphere temperature anomaly derived from the Met Office data as I have done in other blog posts and without taking into account the coverage bias.
And here’s the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).
Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you’d expect a large error when trying to extrapolate to the entire northern hemisphere.
This chart shows the number of stations by year as in the previous chart (the green line), together with the mean error caused by the coverage bias (the red line). For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you’ll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves, the error drops.
And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it’s a good sample.
Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.
Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations, since the published literature uses a 95% confidence interval) you get the following picture:
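For anyone reproducing that chart, the adjustment and the error bars amount to something like the sketch below; the inputs are per-year arrays (the smoothed anomaly plus the bias mean and standard deviation from the sub-sampling above), and the names are placeholders of my own.

```python
import numpy as np

def adjust_with_bias(anomaly, bias_mean, bias_std):
    """Subtract the per-year mean coverage-bias error and return the
    adjusted series with a 95% band (plus/minus 1.96 standard
    deviations), assuming the bias error is roughly normal."""
    adjusted = np.asarray(anomaly) - np.asarray(bias_mean)
    half = 1.96 * np.asarray(bias_std)
    return adjusted, adjusted - half, adjusted + half
```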
And now I’m worried because something’s wrong, or at least something’s different.
1. The published paper on HadCRUT3 doesn’t show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.
2. The paper doesn’t talk about adjusting using the mean.
So I think there are two possibilities:
A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.
B. There’s something wrong in my program in calculating the error range from the sub-sampling data.
If I am right and the paper is wrong there’s a scary conclusion… take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it’s hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can’t be sure.
So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.
UPDATE It suddenly occurred to me that the adjustment that they are probably using isn’t the standard deviation but the standard error. I’ll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
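If that is the explanation, the change is just a factor of the square root of the number of reanalysis years used in the sub-sampling. A sketch of the two choices, assuming the per-year sub-sampling errors are held in a simple array:

```python
import numpy as np

def bias_spread(errors, use_standard_error=True):
    """Spread statistic for one year's coverage-bias errors.

    errors holds the (sub-sampled minus full-coverage) hemispheric means
    across the reanalysis years. The standard deviation says how far a
    single sparse estimate can sit from the truth; the standard error
    (sd / sqrt(n)) says how uncertain the *mean* bias is, and is smaller
    by a factor of sqrt(n), which would shrink the bounds considerably.
    """
    errors = np.asarray(errors, dtype=float)
    sd = errors.std(ddof=1)
    return sd / np.sqrt(errors.size) if use_standard_error else sd
```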
WUWT readers please go to http://www.jgc.org/ to discuss and see the latest updates.
Gerlich and Tscheuschner (in their debunking of the CO2 greenhouse effect) argue that “there are no calculations to determine an average surface temperature of a planet” because there are too many localized random temperature variations to know which are and aren’t accounted for.
I note the NASA GIStemp procedure includes “elimination of outliers”. G&T would probably argue there are no outliers that should be eliminated, since they are all part of local variations over the globe.
“”” Jordan (00:49:01) :
George
There were no doubt some misunderstandings and dead ends in the above discussion, and the sine wave might be one of them. But I wouldn’t be too dismissive of Toho’s position.
Toho also makes this very good point (for a non-periodic signal):
” the theorem does not say the inverse, that you can’t have complete information of the signal without all the samples, or with a smaller number of sample points. All your arguments seem to be based on this inverse of the theorem, which isn’t generally true.”
Well Jordan, I’m not dismissive of Toho’s position; maybe I just don’t understand it; so I’m here to learn.
But he asserted that one sample suffices to completely define an unknown sinusoidal signal; I’d like to see how that is done.
The text books point out that the case of exactly two samples per cycle of a single-frequency signal is degenerate, and even though it fully complies with Nyquist, the signal can’t be recovered in that case. It’s not of practical importance, since any lack of phase lock of the sampling allows for slewing through the complete waveform (which by definition is repetitive since it is a sine wave), so it has to be infinite in extent.
Now if Toho wants to add other information to the sample that is a different situation. The min/max strategy for the daily cycle at least establishes much of the information, and it would be complete for the simple case of the sinusoid, and yield the correct average; but that is not the real case.
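To make that concrete, here is a toy check (the skewed cycle is an invented example, not real station data): for a pure sinusoid (min+max)/2 equals the true daily mean, but any asymmetry in the cycle breaks the equality.

```python
import numpy as np

# One simulated day at one-minute resolution (invented numbers).
t = np.linspace(0.0, 1.0, 1441)
pure = 15.0 + 5.0 * np.sin(2 * np.pi * t)            # idealised sinusoidal day
skewed = pure + 1.5 * np.sin(4 * np.pi * t + 0.7)    # hypothetical asymmetric day

for name, temp in (("pure sinusoid", pure), ("skewed cycle", skewed)):
    print(name, temp.mean(), 0.5 * (temp.min() + temp.max()))
# The pure case agrees exactly; the skewed case does not.
```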
My signal processing colleagues who do sampled data processing all day long assure me that, absent additional special information (which is only applicable in special cases), the sampling theorem is both necessary and sufficient. Yes, you can construct special cases that permit undersampled signals to be recovered because of other information available in those cases.
Weather and climate are basically chaotic; there is no way they are likely to conform to any special case that can eschew full Nyquist compliance.
And I reiterate that I am unconcerned about the lack of ability to reconstruct the original continuous signal; but I am concerned when the Nyquist violation is serious enough to corrupt even the average with aliasing noise.
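Here is a toy illustration of what I mean by the average being corrupted (all numbers invented): a slow trend plus a diurnal cycle, sampled densely for the true mean and once a day at a fixed hour for the sparse mean. Sampling at a fixed hour folds the entire diurnal harmonic into the average instead of letting it cancel.

```python
import numpy as np

def temp(hours):
    # Invented signal: slow trend plus a 6-degree diurnal cycle.
    return 10.0 + 0.02 * hours + 6.0 * np.sin(2 * np.pi * hours / 24 - np.pi / 2)

dense = temp(np.arange(0.0, 30 * 24, 0.1))         # every 6 minutes for 30 days
daily_3pm = temp(np.arange(15.0, 30 * 24, 24.0))   # one reading per day at 15:00

print(dense.mean(), daily_3pm.mean())              # the 15:00-only mean runs several degrees warm
```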
And when the time sampling strategy clearly excludes consideration of cloud variations, then nobody is going to convince me that any GCMs which also don’t properly model clouds can be made to track observational data that does likewise.
I’m not suggesting that the network of land based weather stations simply be abandoned, but it needs to be recognized that many of those stations exist for the benefit of pilots, who have a real pressing need for up-to-the-minute data on real runway weather conditions, principally temperature, atmospheric pressure, and humidity, as well as the obvious like wind speed and direction.
You haven’t lived on the edge (as a pilot) if you have never made that mistake, of landing a plane on a short runway in the wrong downwind direction. I did it precisely once and on a quite long runway for the plane I was flying. Believe you me I got religion before the plane rolled to a stop.
But when that network of “weather” stations is conscripted to try and observe the mean surface temperature of the entire earth, where over 70% of the surface has no long term observational stations, I get less than impressed with the methodology.
George
I see more agreement in our various discussions than disagreement. Particularly on the question of whether we have an adequate sample of the climate system to support the single line which is supposed to represent the global trend in temperature.
(I might add that the concept of a global temperature is about as meaningful to me as the average one-breasted, one-testicled person mentioned in a previous comment here).
Toho makes a fair point that other information can reduce the demands we would otherwise have to make on data sampling. The generality of this point should not be understated.
His (her?) first example took the point to an extreme – perhaps unhelpfully. However … if we know the signal is a simple sinusoid, and we also know the amplitude (or phase), it would only take one sample to give us the last unknown to fully define the signal for an indefinite period.
That’s an extreme example of Toho’s (fair) point that there is a “sufficient” versus “necessary” angle to sampling.
In response to further comment, Toho then dealt with a situation where we have less knowledge. If we only know that the signal is a sinusoid, it would only take three samples to fully define the signal. (OK, we also need to know that the samples are all within one cycle – so we would need to have at least some idea of the frequency or phase).
This is a leap that deserves acknowledgement.
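To make the trade-off concrete, here is a toy sketch (the signal and sample times are invented): if the frequency is known a priori, two well-placed samples recover amplitude and phase with a simple linear solve, whereas the fully unknown three-parameter case needs a nonlinear fit plus at least a rough frequency guess, just as the caveat above says.

```python
import numpy as np

A_true, phase_true = 2.0, 0.4          # hypothetical signal to recover
omega = 2 * np.pi * 1.3                # frequency assumed known a priori
t = np.array([0.05, 0.30])             # two sample times within one cycle
y = A_true * np.sin(omega * t + phase_true)

# A*sin(wt + p) = a*sin(wt) + b*cos(wt), with a = A*cos(p) and b = A*sin(p),
# so two samples give a 2x2 linear system for (a, b).
M = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
a, b = np.linalg.solve(M, y)
A_est, phase_est = np.hypot(a, b), np.arctan2(b, a)
print(A_est, phase_est)                # recovers ~2.0 and ~0.4
```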
I mentioned my background in control engineering, where we frequently have the luxury of a framework of “a priori” knowledge. One of my first posts on this thread talks about sampling as a design problem – the issue being to have enough initial knowledge of the system/signal to design the sampling methodology. Well that’s the sort of approach that comes naturally to tackling a control problem.
I think Toho makes much the same point from a different direction – if we know something about what we are trying to sample, we can then make decisions about how to sample it. (Toho – I hope that’s fair to what you were saying.)
At the end of all of that, a question: can I get comfortable that the sparse and erratic sampling of temperatures over the last 150 years gives us the information to support the analysis at the top of this thread? Frankly? No!
Well Jordan, you won’t get any argument from me as to the benefits of a priori knowledge other than the samples. But all of the cases I am familiar with apply only to certain special situations. In the case of a perfectly general signal, it is not clear to me that there are any a priori snippets of information that can substitute for a proper set of samples.
But I am of a like mind, in that I think the whole concept of a “global mean temperature” is quite fallacious, even though one can define such a thing and, in principle, can measure it; and I do mean in principle, since it is quite impractical in practice.
But after you have determined that, you still have exactly no knowledge of the direction of net energy flow into or out of planet earth, which is what will really determine the long term outcome; and the lack of any differential information (you only have the average) means you can’t even discuss the weather which depends on temperature differences (at the same time).
Temperature alone, without knowledge of the nature of the terrain, tells you nothing about energy flux, since the processes happening over the oceans are quite different from those occurring over tropical deserts, or arboreal forests, or snow covered landscapes, and are quite differently related to the local temperature.
I have no problem with GISStemp as a historical record of GISStemp, although it has many problems; but extending that to global significance doesn’t cut it with me.
George
The point about a priori information appeals to me because it has parallels in the procedure of “identification” in control system design. In reasonably well defined situations, we can understand the linkages between different parts of the controlled process to determine where measurements are required. We can often use knowledge of dynamic parameters to determine sample rate for discrete controllers, and therefore to design an observable and controllable solution. Where things are not so well defined in advance, we may need to set up some form of test to determine the required parameters empirically. I know that these are luxuries which are not generally available elsewhere – including the climate.
A situation where we have absolutely no a priori snippets of information would lead me to question how we could even start to work out the how’s and where’s of sampling. Not least, what information would we use to choose a sample frequency/distribution?
This unhappy situation does not appear to be too far from what I can see in the assessment of global average temperature. I do not think the underlying system is well enough understood to come to decisions about how best to sample it. Perhaps the greater spatial coverage of the satellite systems will help to resolve that in time.
But, IMO, the historic “instrumental” temperature record is quite another thing. Analysis of trend lines and error regions takes a remarkable degree of faith in underlying assumptions about: (i) the behaviour of the spatial field in different time scales; and (ii) the signal we are getting from an inherited and changing measurement system originally set up for all sorts of other purposes.
I could take issue with those who suggest there are bigger fish to fry than concerns about sampling, spatial and perhaps even temporal aliasing. What do we have to give us the comfort that this data is more than just a pile of badly sampled and misleading junk? The trends may have about as much meaning as tracing the path of a drunk man staggering around in the dark.
If talking about priorities, is there anything more important than such a fundamental question about the quality of the data?
There is no question that the historic data has some value. It is better that we have it than not. But are we allowing ourselves to be impressed by the sheer mass of data? Are we trying to convince ourselves that the more of the data we use, the more meaning we can extract?
Dan makes a perfectly good suggestion in this thread: “don’t try to make it a global or hemispherical average, just track trends at those admittedly small sample of sites.”
But that kind of suggestion will mobilise an army of opinion, arguing that point measurements are not representative of the full spatial field. Arguments which are basically alluding to aliasing – an admission that the field does not behave in a way which can rely on sparse sampling.
So where does this get us? JGC acknowledges a problem in what he calls “coverage bias”. Would it be better for JGC to put the analysis of historical data onto the back burner until we have some pretty convincing analysis and methods which will allow us to extract meaningful information from this historical data?
Right now (I say it again) there is no convincing reason to pay any attention to those trend lines. I think Dan’s suggestion is just as convincing.
Oh dear … spoke too soon. Latest from the Met:
http://www.metoffice.gov.uk/corporate/pressoffice/2009/pr20091218b.html
“New analysis released today has shown the global temperature rise calculated by the Met Office’s HadCRUT record is at the lower end of likely warming. … This independent analysis … uses all available surface temperature measurements, together with data from sources such as satellites, radiosondes, ships and buoys. …. The new analysis estimates the warming to be higher than that shown from HadCRUT’s more limited direct observations. ”
If anybody wonders how this could be, here’s the MET’s latest excuse:
“This is because HadCRUT is sampling regions that have exhibited less change, on average, than the entire globe over this particular period.”
That’s right, you chooses your data and you gets your answer. And whaddayaknow – the latest analysis shows we woz right all along:
Further:
“This provides strong evidence that recent temperature change is at least as large as estimated by HadCRUT.”
NO IT DOESN’T. All it shows is that the data is not robust. Look at the data differently and you get a different result. (That’s almost a way to explain what we mean by statistical insignificance.)
Even the MET acknowledges this (although they probably don’t realise it). Look at the legend under the graphical presentation which refers to the sparseness of the sampling:
“The ECMWF analysis shows that in data-sparse regions such as Russia, Africa and Canada, warming over land is more extreme than in regions sampled by HadCRUT …. We therefore infer with high confidence that the HadCRUT record is at the lower end of likely warming.”
As there appears to be no formal study of the characteristics of the sampling problem (reported in the literature), there is no “identification” of the problem which would support a decision on sampling methodology. Without that, this kind of analysis cannot rise above junk status.
I think Gerlich and Tscheuschner talked about random variations in temperature in local regions across the globe due to cloud effects, making it rather difficult to figure a global mean from a limited geographic sample. It would be interesting to compare data from home weather stations with the Hansen smoothing across geographic areas.