Analysis of Met Office data back to the mid-1800s

John Graham-Cumming has posted an interesting analysis; he could benefit from some reader input at his blog.

See here and below: http://www.jgc.org/blog/

Adjusting for coverage bias and smoothing the Met Office data

As I’ve worked through “Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850” to reproduce the work done by the Met Office, I’ve come up against something I don’t understand. I’ve written to the Met Office about it, but until I get a reply this blog post is to ask for opinions from any of my dear readers.

In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you’ll see that in the 1800s there weren’t many temperature stations operating and so only a small fraction of the Earth’s surface was being observed. There was a very big jump in the number of stations operating in the 1950s.

That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I’m calling this the coverage bias.

To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to get an estimate of the error for the group of stations operating in any given year. Using that data it’s possible, on a year-by-year basis, to calculate the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
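As an illustration of that sub-sampling idea, here is a minimal Python sketch (my own illustration, not the Met Office's code). It assumes reanalysis anomalies on a latitude/longitude grid and a boolean mask marking the grid cells covered by stations in a given year; all names are mine:

```python
import numpy as np

def coverage_bias(reanalysis, observed_mask, lat):
    """Estimate the coverage-bias error for one year.

    reanalysis    : array (n_fields, n_lat, n_lon) of reanalysis anomalies
    observed_mask : boolean array (n_lat, n_lon), True where stations report
    lat           : array (n_lat,) of grid-cell latitudes in degrees

    Returns the mean and standard deviation of
    (sub-sampled hemispheric mean - true hemispheric mean).
    """
    weights = np.cos(np.radians(lat))[:, None]               # area weight by latitude
    w_full = np.broadcast_to(weights, reanalysis.shape[1:])  # (n_lat, n_lon)

    errors = []
    for field in reanalysis:
        true_mean = np.average(field, weights=w_full)
        sub_mean = np.average(field[observed_mask], weights=w_full[observed_mask])
        errors.append(sub_mean - true_mean)

    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)
```

Run once per year with that year's station mask, this gives the kind of per-year mean bias and spread plotted below.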

I’ve now done the same analysis and I have two problems:

1. I get a much wider error range for the 1800s than is seen in the paper.

2. I don’t understand why the mean error isn’t taken into account.

Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to ‘prime’ the filter. I extend the data as described on that page.
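For anyone reproducing the smoothing, here is a sketch of a 21-point filter in Python. It assumes binomial weights and simple end-value extension; the exact weights and extension rule the Met Office uses are described on their page:

```python
import numpy as np
from scipy.special import comb

def smooth_21pt(annual, n=21):
    """Smooth an annual series with an n-point binomial filter.

    The ends are padded by repeating the first/last values so the filter
    can be 'primed'; swap in whatever extension rule you prefer.
    """
    annual = np.asarray(annual, dtype=float)
    weights = np.array([comb(n - 1, k) for k in range(n)], dtype=float)
    weights /= weights.sum()
    half = n // 2
    padded = np.concatenate([np.full(half, annual[0]),
                             annual,
                             np.full(half, annual[-1])])
    return np.convolve(padded, weights, mode="valid")
```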

First, here’s the smoothed trend line for the northern hemisphere temperature anomaly derived from the Met Office data, as in my other blog posts, without taking the coverage bias into account.

And here’s the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).

Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you’d expect a large error when trying to extrapolate to the entire northern hemisphere.

This chart shows the number of stations by year (as in the previous chart) as the green line, and the mean error due to the coverage bias as the red line. For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you’ll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves, the error drops.

And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it’s a good sample.

Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.

Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations, since the published literature uses a 95% confidence interval) you get the following picture:
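In code, the adjustment and the 95% band amount to the following (the numbers here are placeholders purely for illustration; in practice they come from the smoothing and sub-sampling steps above):

```python
import numpy as np

# Placeholder values, for illustration only.
anomaly   = np.array([-0.30, -0.28, -0.25])   # smoothed hemispheric anomaly per year
bias_mean = np.array([ 0.38,  0.35,  0.30])   # mean coverage-bias error per year
bias_sd   = np.array([ 0.40,  0.38,  0.35])   # its standard deviation per year

adjusted = anomaly - bias_mean           # the green (adjusted) line
upper = adjusted + 1.96 * bias_sd        # 95% band, assuming normally distributed errors
lower = adjusted - 1.96 * bias_sd
```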

And now I’m worried because something’s wrong, or at least something’s different.

1. The published paper on HadCRUT3 doesn’t show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.

2. The paper doesn’t talk about adjusting using the mean.

So I think there are two possibilities:

A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.

B. There’s something wrong in my program’s calculation of the error range from the sub-sampled data.

If I am right and the paper is wrong there’s a scary conclusion… take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it’s hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can’t be sure.

So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.

UPDATE It suddenly occurred to me that the adjustment that they are probably using isn’t the standard deviation but the standard error. I’ll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
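The difference matters a lot. A toy comparison in Python (purely synthetic numbers, assuming n independent sub-sampling errors in a year):

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 0.4, size=12)     # e.g. 12 monthly sub-sampling errors (made up)

sd = errors.std(ddof=1)                    # spread of the individual errors
se = sd / np.sqrt(len(errors))             # uncertainty of their *mean*

print(f"standard deviation: {sd:.3f}")
print(f"standard error:     {se:.3f}")     # smaller by a factor of sqrt(12), about 3.5
```

Whether the standard error is the right quantity depends on the errors really being independent, which is exactly the correlation question raised further down.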

WUWT readers please go to http://www.jgc.org/ to discuss and see the latest updates.

132 Comments
December 18, 2009 12:25 pm

Marc:
Sea Temperature Records Obtained From Bucket Collected Water Samples
Having made sea temperature records from the contents of a bucket that had been thrown over the side of a ship, I would support Marc’s opinion that such records are of dubious accuracy.
There are many things one wants to do in a force 9 storm at sea; making an accurate sea temperature measurement isn’t one of them.

yonason
December 18, 2009 12:36 pm

Theodore de Macedo Soares (11:34:30) :
“A sampling error due to bias such as correlated samples or non-sampling measurement type errors is another story and if not taken into account, as you suggest, can be much more problematic than chance type errors.”
It seems that when you compare their very small computed error with the wide natural variation that exists, a good deal of the problem comes from the 2nd type. And, from what we’ve learned from climategate, we can add deliberate bias for good measure.

December 18, 2009 12:38 pm

Anthony, if I were you (and to stop the author posting any more petulant comments) I’d just remove the entire thread and any links that go with it. Sometimes (I know from experience myself) you just can’t help some people – and they can’t see it! Amazing.

Dr A Burns
December 18, 2009 12:44 pm

The analysis claims an error of +/- 0.8 degrees, compared to the IPCC’s +/- 0.1 degree, back in 1850.
Here’s a description of what actual temperature measurement was like back in 1850, long before the days of arguing the effects of dirt and degradation on a Stevenson screen with automatic recording:
http://climate.umn.edu/doc/twin_cities/Ft%20snelling/1850sum.htm
Even modern recording accuracy is generally +/-0.5 degrees.
http://www.srh.noaa.gov/ohx/dad/coop/EQUIPMENT.pdf
Might I suggest that both these estimates are extremely optimistic?

jknapp
December 18, 2009 12:52 pm

Looking at the Met’s graph it seems that increasing the number of stations from around 50 (in 1850) to over 1400 (in 1960) has negligible effect on their error bars.
That seems ridiculous at first glance. But if it’s true, let’s cut the number of stations back to 50 and make sure that they are really well sited.

jlc
December 18, 2009 12:55 pm

That JGC seems to be a surly and ungrateful SOB. We should allow him to luxuriate in his well-deserved obscurity.

Bill Parsons
December 18, 2009 1:21 pm

Interesting e-mail thread between Jones and Wigley on infilling (backfilling?) data from the models.
http://www.eastangliaemails.com/emails.php?eid=279&filename=1035838207.txt

George E. Smith
December 18, 2009 1:31 pm

“”” jknapp (12:52:15) :
Looking at the mets graph it seems that increasing the number of stations from around 50 (in 1850) to over 1400 (in 1960) has negligible effect on their error bars. “””
Well, your observation is correct; and it wouldn’t matter if there were 14,000 stations, it’s still not nearly enough.
From Leif’s posting, I get the impression that in fact it is not the normal practice to associate each of these reporting stations with a certain land area around them that is presumed to have the same temperature as the thermometer (at all times).
If that is the case, then there is no way to obtain a global average that has any meaning.
I would put Tgl = S[T.A]/S[A], where my S is sigma, A is the area associated with each sensor, and S[A] is the total surface area of the earth.
If they are NOT doing that, then it fits in the GIGO folder.
And as Anthony’s station study has discovered, for the US stations at least, a very large fraction are on airport runways. Well of course they are there because the airport could give a rip about the climate; they want to know the weather, and specifically the REAL TEMPERATURE on the runway, as that is of interest to a pilot trying to land or take off on that runway. They want to know the temperature at the time they want to land, not what it might have been 24 hours ago.
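A short sketch of the area-weighted average George E. Smith describes (illustrative only; in practice the area each station represents would come from some gridding scheme):

```python
import numpy as np

def area_weighted_mean(temps, areas):
    """Tgl = S[T.A] / S[A]: each reading weighted by the area it represents."""
    temps = np.asarray(temps, dtype=float)
    areas = np.asarray(areas, dtype=float)
    return np.sum(temps * areas) / np.sum(areas)

# Made-up example: a cool station representing a large area dominates the average.
print(area_weighted_mean([25.0, 10.0], [1.0e6, 9.0e6]))   # 11.5, not the naive 17.5
```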

Jordan
December 18, 2009 2:00 pm

Good points from George E. Smith.
Discrete measurement systems need to take into account signal bandwidth. Get it wrong, and results might not just be slightly wrong; they can be downright misleading.
Imagine the silhouette of a mountainous landscape. An aeroplane passes overhead with a radar which measures altitude at discrete distances. Imagine the radar takes three samples as it crosses the mountains, but (unluckily) each sample just happens to fall in a valley. Join the points together, and we have an image of a flat landscape. Totally wrong.
The answer to “aliasing” is to reduce the distance between samples, by 5 or 10 times in the above example, if that is what it takes to have absolute confidence in our ability to spot the mountains and create a reasonable reconstruction of the landscape.
But how do we know this at the outset? We need to look at the properties of the signal and make it part of the measuring system design. This is an issue of careful design.
It is worth noting that the theoretical minimum sampling rate is 2 times the signal bandwidth. But in practical systems, it really needs to be 5 to 10 times.
What about the spacing for sampling the putative global average temperature?
We have seen how Darwin may sit more than a thousand km away from neighbouring measuring sites. What about the spatial distribution of measurements near the poles? Or the Pacific Ocean?
I have no idea what is the spatial bandwidth of the climate system, and what distribution is necessary to avoid aliasing. I wonder what there is in the literature to give us comfort that this is not an issue with the temperature data. I haven’t seen anything.
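Jordan's radar-over-the-mountains example can be reproduced numerically: sample a "landscape" below its spatial Nyquist rate and the reconstruction comes out flat. A toy sketch with made-up numbers:

```python
import numpy as np

# A 'landscape' of three mountains (peaks at x = 0.5, 1.5, 2.5; valleys at 0, 1, 2, 3).
x = np.linspace(0.0, 3.0, 1000)
height = 1000.0 * np.abs(np.sin(np.pi * x))

# Three radar samples, spacing 1.0, which is below the Nyquist requirement
# (the terrain repeats every 1.0, so samples would need to be < 0.5 apart).
sample_x = np.array([0.0, 1.0, 2.0])
sample_h = 1000.0 * np.abs(np.sin(np.pi * sample_x))

print(np.round(sample_h, 6))   # [0. 0. 0.] -> join the dots and the landscape looks flat
print(round(height.max(), 1))  # 1000.0     -> the mountains the samples completely missed
```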

Jordan
December 18, 2009 2:18 pm

Excuse the typos in my last post. But I’d like to add another comment in support of some of the earlier posts on this thread.
When I first looked at the above plots and saw the shaded areas reducing to near-zero, I just thought to myself “no way”.
There is a huge difference between having 1000 measurements, and 1000 properly sampled data points with statistically independent noise terms.
If you have the latter, you have the basic inputs to statistical analysis.
If you have the former, you might not have as much real information as you think. And if you are wrong, statistical analysis will give you a misleading measurement of properties such as variance and standard error. True variance will be greater.
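A toy illustration of that point (purely synthetic numbers): give 1000 "measurements" a large shared component and the naive standard error of their mean badly understates the real uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

naive_se, means = [], []
for _ in range(2000):
    shared = rng.normal(0.0, 1.0)              # one common 'signal' hitting every station
    noise = rng.normal(0.0, 0.3, size=1000)    # a small independent part per station
    sample = shared + noise                    # 1000 highly correlated 'measurements'
    means.append(sample.mean())
    naive_se.append(sample.std(ddof=1) / np.sqrt(sample.size))

print(f"naive standard error of the mean: {np.mean(naive_se):.3f}")   # ~0.01
print(f"actual spread of the sample mean: {np.std(means):.3f}")       # ~1.0
```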

steven mosher
December 18, 2009 2:22 pm

Jordan.
Start with this paper.
“EOF Based Linear Estimation Problems” by Kwang Yul Kim, Climate System Research Program, Texas A&M University.
http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=0D24751906E7754397CA0BD09F5053B7?doi=10.1.1.53.4547
It addresses the problems that others have raised about sampling a climate field.

Greg
December 18, 2009 2:27 pm

John Graham-Cumming…
Also consider that having your post on a heavy duty site like WUWT is likely to result in a very nice traffic boost to your site and better discussion.

December 18, 2009 2:38 pm

Reading through all the comments here and on my blog, I still have two nagging concerns:
1. It isn’t intuitive that with so few stations in the 1800s the error in Brohan et al. appears to stay the same across the years. There is another diagram (Figure 12, top) which shows just the land anomaly trend, and that does have wider errors early on, although they do not appear to be driven by the number of stations. But even that’s a bit hard to tell because I can’t see clearly how the limited coverage error is combined with the other errors. It would be good to be able to see the underlying data, but http://hadobs.org/ doesn’t appear to have the limited coverage error data.
2. I’m not sure how accurate the sub-sampling can be given that the samples are correlated (spatially).
Here’s hoping my email to the Met Office gets answered. It’s frustrating not to fully understand this paper.

bill
December 18, 2009 3:15 pm

George E. Smith (10:20:39) :
Here’s an interesting experiment for some of you computer nerds to try when you have an evening free;
have your computer (you write the code) go through the data, and pull out every 200th [20th] digital sample of that piece of music. So maybe the disc is recorded at an 88 kHz sampling rate or something like that, so you are going to end up with about a 4.4 kHz rate of selected samples.
Now have your computer play back those samples at the correct rate, about 4.4 kHz, and run it into your hi-fi system, and see how you like the result.
Ever heard of MP3 format Mr. Smith?
or
PASC (Precision Adaptive Sub-band Coding)
or
ATRAC etc.
Compression for MP3 is 34 MB down to 3.4 MB, i.e. 10:1; even at this compression the quality is difficult to tell apart from the original.
CD (44.1 kHz, 16 bits) – to – MP3 (44.1 kHz) – to – 44.1 kHz, 16 bits is not lossless, but it’s damn good quality.

George E. Smith
December 18, 2009 3:17 pm

“”” Jordan (14:00:30) :
Good points from George E. Smith.
Discrete measurement systems need to take into account signal bandwidth. Get it wrong, and results might not just be slightly wrong, they can be downright misleading. “””
In this case, Jordan, we have a two-variable system: space and time. We can be sure that the time variable contains a very solid 24-hour cyclic signal. Now a min/max thermometer is going to give you two samples a day, and that only satisfies Nyquist if the daily cyclic signal is purely sinusoidal; but it isn’t: daily temperature graphs tend to show a fairly rapid warm-up in the morning and a slower cool-down in the evening, so there is at least a second harmonic (12-hour) signal component. So a min/max thermometer already violates Nyquist by a factor of 2, which is all you need to fold the spectrum all the way back to zero frequency; so the correct daily average temperature is not recoverable from a min/max thermometer. And that doesn’t allow for varying cloud cover, which will introduce higher signal frequencies beyond even a 12-hour sample time.
So the climate data records are already aliased before you even consider the spatial sampling. Here in the Bay Area, we get temperature cycles over distances of 10 km or less; yet climatologists believe they can use a temperature reading to represent places 1200 km away.
The problem is that most of them seem to be statisticians, and not signal processing experts, or even physicists.
Well, no central limit theorem is going to buy you a reprieve from a Nyquist criterion violation, and no amount of linear or non-linear regression analysis is ever going to recover the true signal, which has been permanently and irretrievably corrupted by in-band aliasing noise.
Other than that slight inconvenience, there isn’t any good data, even improperly sampled data, before about 1980, when the Argo buoys and the polar orbit satellites were first deployed.
So HadCRUT and GISStemp belong on the ash heap of history; they are not even worth fixing.
UHIs cause no problems unless the sampling regimen is improper.
So to me it hardly matters if the PlayStation video games are any good; the data that goes into them is true garbage anyway. You would think that somebody like Gavin Schmidt would have heard of sampled data system theory and the Nyquist sampling theorem; but he sure doesn’t act as if he has.
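The min/max point is easy to check numerically: add a 12-hour component that makes the diurnal cycle asymmetric and (Tmin + Tmax)/2 no longer recovers the true daily mean. A toy sketch with made-up amplitudes:

```python
import numpy as np

t = np.linspace(0.0, 24.0, 24 * 60, endpoint=False)     # one day, minute resolution

# A diurnal cycle with a 12-hour component that makes it asymmetric
# (fast warm-up, slow cool-down). Amplitudes are made up.
temp = 15.0 + 5.0 * np.sin(2 * np.pi * t / 24.0) + 2.0 * np.cos(4 * np.pi * t / 24.0)

true_mean = temp.mean()                        # 15.0: both harmonics average to zero
minmax_mean = (temp.min() + temp.max()) / 2.0  # what a min/max thermometer reports

print(f"true daily mean: {true_mean:.2f}")     # 15.00
print(f"(Tmin+Tmax)/2  : {minmax_mean:.2f}")   # about 13.28, biased low
```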

December 18, 2009 3:28 pm

The Met Office has replied (see http://www.jgc.org/blog/2009/12/well-i-was-right-about-one-thing.html for part of their response).
I am still digesting what they had to say about the blog post that’s being discussed here and will post it once I am sure I understand what they are saying.

December 18, 2009 3:35 pm

Slightly off topic but John G-C’s blog on the “hide the decline code” is the most thoughtful and credible analysis I have seen. You can read it on his blog archive for November. Thanks John 🙂

DaveE
December 18, 2009 3:43 pm

John Graham-Cumming (14:38:26) :
To be honest, it is not only unintuitive for the error to remain the same, it is downright incorrect.
Consider where the stations were & how it has changed over time. E.M. Smith has done a lot of work on this.
To be honest, I don’t even see satellites being a big improvement as I can’t see how they can take more than two (2) measurements at any given location per day.
DaveE.

Jordan
December 18, 2009 3:45 pm

Thanks for the reference steven mosher.
I have had a quick look over the paper, but I’m not sure we’re on the same page.
The paper talks about estimation error in two distinct directions: (1) find the best gauge locations, or (2) work with the fixed gauge locations we have. The paper examines optimal filtering from the starting point of (2).
The question (from George and myself) has more to do with (1): if we were starting with a clean sheet, what design criteria would we use to determine the spatial distribution of measuring stations for the putative global T? The sampling theorem would have a central role to play in determining the “gap” we could suffer between stations.
So the paper looks like an interesting analysis of optimal filtering, assuming we can live with T(r,t). But T(r,t) could lack meaning due to spatial aliasing.
Looking at it another way – those who have questioned the MWP by talking about a “local phenomenon” are basically asserting that spatial aliasing distorts our view of the past. It’s a fair point, but works both ways.

peter_dtm
December 18, 2009 3:46 pm

NickB. (08:43:15) :
Medic1532 (08:26:21) :
Naval data – logged every watch change – i.e. every 4 hours. Proper obs (weather observations) done by a percentage of the British merchant fleet every 6 hours – on the synoptic hour.
When I’ve asked the question before I have been pretty much brushed off with ‘it’s amateur observers and therefore rubbish’, which in itself is rubbish – yes, some obs were flogged – but in my personal experience at least 90% were good to excellent. (Basically as the Radio Officer I refused to send rubbish, which occasionally caused some interesting relationships, and there were other ‘sparks’ around with the same attitude. There were of course also those who just made it all up – perhaps they got jobs in East Anglia?)
So I still want to know WHY THE HELL HAS NO ONE ABSTRACTED THE DATA? Perhaps they are too scared of finding some inconvenient truths.

December 18, 2009 3:48 pm

Thanks, Colin. For those interested just in my climate change posts you can visit: http://jgc.org/blog/labels/climate%20change.html

DaveE
December 18, 2009 3:52 pm

bill (15:15:22) :
Ever heard of variable bit-rate, bill? That’s where the sample rate is changed, in line with the Nyquist theorem, according to the amount of relevant data in the samples.
DaveE.

December 18, 2009 3:55 pm

Fred Lightfoot (09:47:37) :
Having some free time I invested in 100 thermometers, just wanted to see for myself. I placed all in my fenced property, plus or minus 1 hectare, I placed 3 on 30 wooden poles 2 meters high, … garden in sunlight shows plus or minus 1.03C average difference, in shade 1.8C difference from east to west; north to south shows 0.97C and 1.67C respectively, and my science is settled!

Assuming you calibrated the thermometers (melting distilled water, boiling distilled water corrected for baro pressure, water triple point cell, etc.), to what do you attribute the variation?
As these measurements were all taken after the industrial revolution, we might suspect increasing carbon dioxide, but siting problems should be taken into account. In the garden, are there snow plants or firethorns? Sunflowers?
UHI is unlikely, discounted even on larger scales by such authorities as Hadley CRU and GISS. Another consideration is instrument precision. Fahrenheit thermometers are more precise than Celsius, and for spatial distribution of your sites, acres are more precise than hectares.
You do make a good point about changes in “global” temperatures. Given the range of your data set (Lightfoot 09), it’s hard to get concerned about 0.6 °C over a century.

Charlie
December 18, 2009 3:58 pm

AlanG (11:09:06) : “…Take it to the limit where there are 2 temperature sensors. One is rural where rural is 98% of the land area. The other is urban where urban is the remaining 2% of the land area.”
The problem is more basic. Take 2 sensors near the equator. Monitor temperature trends for a couple of decades. Then add a sensor at the north pole. The average temperature drops dramatically when the 3rd sensor is added to the average.
While not so dramatic as the above example, the real history of thermometer measurements is that the Arctic and Antarctic regions were undersampled until relatively recently.
That’s why anomalies are used. While not perfect, using changes from baselines does reduce the effect of changes in locations of thermometers.
The same problem happens when there are missing readings. Just doing simple averages of all available temperature measurements can lead to confusing, erroneous results.
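A few lines of Python make Charlie's point concrete (made-up numbers: two warm stations, one cold station added later, all warming at the same rate):

```python
import numpy as np

# Made-up absolute temperatures (deg C). Every station warms at 0.1 C per year;
# the polar station only starts reporting in year 2.
baseline = {"EquatorA": 26.0, "EquatorB": 27.0, "Pole": -20.0}
reporting = [["EquatorA", "EquatorB"],                      # year 0
             ["EquatorA", "EquatorB"],                      # year 1
             ["EquatorA", "EquatorB", "Pole"],              # year 2
             ["EquatorA", "EquatorB", "Pole"]]              # year 3

for year, stations in enumerate(reporting):
    absolute = [baseline[s] + 0.1 * year for s in stations]
    anomalies = [baseline[s] + 0.1 * year - baseline[s] for s in stations]
    print(f"year {year}: raw average {np.mean(absolute):6.2f}, "
          f"anomaly average {np.mean(anomalies):4.2f}")
```

The raw average collapses when the cold station appears even though nothing cooled; the anomaly average keeps tracking the built-in 0.1 C per year trend.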