John Graham-Cumming has posted an interesting analysis; he could benefit from some reader input at his blog.
See here and below: http://www.jgc.org/blog/
Adjusting for coverage bias and smoothing the Met Office data
As I’ve worked through Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850 to reproduce the work done by the Met Office, I’ve come up against something I don’t understand. I’ve written to the Met Office about it, but until I get a reply this blog post is to ask for opinions from any of my dear readers.
In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you’ll see that in the 1800s there weren’t many temperature stations operating and so only a small fraction of the Earth’s surface was being observed. There was a very big jump in the number of stations operating in the 1950s.
That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I’m calling this the coverage bias.
To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to get an estimate of the error for the group of stations operating in any given year. Using that data it’s possible, on a year-by-year basis, to calculate the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
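To make the sub-sampling concrete, here is a minimal sketch of the idea, assuming the reanalysis anomalies sit on a regular latitude/longitude grid; the names `reanalysis_anomaly` and `station_mask` are illustrative, and the actual Brohan et al. procedure may differ in detail:

```python
import numpy as np

def coverage_bias(reanalysis_anomaly, station_mask, lat):
    """Coverage bias for one reanalysis year and one station network.

    reanalysis_anomaly : 2-D array (lat x lon) of temperature anomalies
                         from the NCEP/NCAR reanalysis (illustrative name).
    station_mask       : boolean 2-D array, True where at least one station
                         reported in the observation year being tested.
    lat                : 1-D array of grid latitudes in degrees, used for
                         cos(latitude) area weighting.
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(reanalysis_anomaly)

    # "True" hemispheric mean from the complete reanalysis field.
    full_mean = np.average(reanalysis_anomaly, weights=weights)

    # Mean over only the grid cells that had reporting stations.
    sampled_mean = np.average(reanalysis_anomaly[station_mask],
                              weights=weights[station_mask])

    # Positive value = the limited network reads warmer than the truth.
    return sampled_mean - full_mean
```

Repeating this over every reanalysis year for a fixed station network gives a distribution of errors whose mean and standard deviation are the per-year coverage-bias statistics described above.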
I’ve now done the same analysis and I have two problems:
1. I get a much wider error range for the 1800s than is seen in the paper.
2. I don’t understand why the mean error isn’t taken into account.
Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to ‘prime’ the filter. I extend the data as described on that page.
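For concreteness, here is a minimal sketch of the smoothing step, assuming a 21-point binomial filter and a simple pad-with-endpoints rule; the Met Office page describes its own extension method, which the post follows and which may differ from this padding:

```python
import numpy as np
from scipy.special import comb

def smooth_21pt(series):
    """Apply a 21-point binomial smoother to an annual series (a sketch).

    The ends are padded here by repeating the first and last values; the
    Met Office describes its own rule for extending the series.
    """
    n = 21
    weights = np.array([comb(n - 1, k) for k in range(n)], dtype=float)
    weights /= weights.sum()          # normalise the binomial coefficients

    half = n // 2                     # 10 years at each end
    padded = np.concatenate([np.full(half, series[0]),
                             np.asarray(series, dtype=float),
                             np.full(half, series[-1])])
    return np.convolve(padded, weights, mode='valid')
```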
First, here’s the smoothed trend line for the northern hemisphere temperature anomaly, derived from the Met Office data as I have done in other blog posts, without taking into account the coverage bias.
And here’s the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).
Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you’d expect a large error when trying to extrapolate to the entire northern hemisphere.
This chart shows the number of stations by year as in the previous chart (green line), together with the mean error due to the coverage bias (red line). For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you’ll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves, the error drops.
And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it’s a good sample.
Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.
Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations, since the published literature uses 95% confidence intervals) you get the following picture:
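Putting those two steps into code, here is a minimal sketch (all names are illustrative) of the adjustment by the mean coverage bias and of the 95% bounds being plotted:

```python
import numpy as np

def adjust_with_bounds(anomaly, mean_bias, bias_sd, z=1.96):
    """Per-year adjusted anomaly plus 95% bounds (a sketch).

    anomaly   : smoothed Met Office hemispheric anomalies, one per year.
    mean_bias : per-year mean coverage-bias error from the sub-sampling.
    bias_sd   : per-year spread of that error (see the update below on
                whether this should be a standard deviation or a
                standard error).
    """
    adjusted = np.asarray(anomaly) - np.asarray(mean_bias)
    lower = adjusted - z * np.asarray(bias_sd)
    upper = adjusted + z * np.asarray(bias_sd)
    return adjusted, lower, upper
```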
And now I’m worried because something’s wrong, or at least something’s different.
1. The published paper on HadCRUT3 doesn’t show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.
2. The paper doesn’t talk about adjusting using the mean.
So I think there are two possibilities:
A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.
B. There’s something wrong in my program in calculating the error range from the sub-sampling data.
If I am right and the paper is wrong there’s a scary conclusion… take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it’s hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can’t be sure.
So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.
UPDATE: It suddenly occurred to me that the adjustment they are probably using isn’t the standard deviation but the standard error. I’ll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
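The difference matters a lot. A quick sketch, assuming the per-year errors come from the roughly 40 reanalysis years:

```python
import numpy as np

def spread_options(errors):
    """Two candidate spreads for the coverage-bias term (a sketch).

    errors : the sub-sampling errors for one station network, one value
             per reanalysis year.
    """
    sd = np.std(errors, ddof=1)        # standard deviation of the errors
    se = sd / np.sqrt(len(errors))     # standard error of their mean
    return sd, se

# With ~40 reanalysis years, se is roughly sd / 6.3, so switching from the
# standard deviation to the standard error shrinks the 1.96-sigma bounds by
# a factor of about six.
```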
WUWT readers please go to http://www.jgc.org/ to discuss and see the latest updates.
Cherry picking error reports?
This is a form of experimental bias.
Having used Fortran and done research myself, I’d say the sampling methods they use lack randomness and do not fairly represent the temperatures of the planet.
A simple example: how did they ever gather data from equally spaced weather stations in vast areas without roads?
You’re only plotting one component of the historical error that should be represented in the HadCRUT3 plot.
The number of stations is not the correct parameter to use. Imagine that a small country [e.g. Denmark] suddenly gets the idea that more stations are good, so installs a million sensors throughout the smallish land area. Clearly that will do nothing for the error in the global average.
The proper parameter must include the distribution of those stations. Something like the mean area around a station where there are no other stations. This problem has been studied at length in the literature [I can’t remember a good reference, off hand, but they exist].
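One hypothetical way to capture that, assuming station coordinates in degrees, is a mean nearest-neighbour distance; packing a million extra sensors into Denmark barely moves this statistic, unlike a raw station count:

```python
import numpy as np

def mean_nearest_station_km(lats, lons):
    """Mean great-circle distance from each station to its nearest neighbour.

    lats, lons : 1-D arrays of station coordinates in degrees (illustrative
                 names); needs at least two stations.
    """
    R = 6371.0  # mean Earth radius in km
    lat = np.deg2rad(lats)[:, None]
    lon = np.deg2rad(lons)[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    # Haversine formula, evaluated for every pair of stations at once.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    d = 2 * R * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    np.fill_diagonal(d, np.inf)  # ignore each station's distance to itself
    return d.min(axis=1).mean()
```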
“So I think there are two possibilities:
A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.
B. There’s something wrong in my program in calculating the error range from the sub-sampling data.”
Don’t give up on A. Confirmation bias is a powerful thing. If it turns out to be B that’s OK. But if it is A…… then Wow. It is worth checking.
Even with reduced error bounds, wouldn’t correcting for UHI effect still place today’s temp readings within the past error boundary?
Great work here, sir … way outside my experience vis-à-vis statistics … my background is in software design and development with a diploma in Ocean Engineering, so I’m not completely out of my comfort zone.
Your work appears to be inside the programming, i.e. data in -> scary cloud of data massaging -> Hockey Stick …
My concern would be that the data in, which up until now everyone assumed to be clean, is anything but clean, accurate and consistent …
so while you may be able to show that the methods used inside the cloud are invalid, it’s becoming all too likely that we are also experiencing a garbage-in situation …
garbage in = garbage out assumes the code in between is exactly right, which you appear to have found out may not be the case …
garbage in -> garbage code -> garbage (squared) out …
G2O the new greenhouse gas …
Well I’ve said it time and again, we simply cannot trust surface data – it’s far too woolly. All we can say (with any degree of certainty) is that the satellite data is about as good as we’ll get. That makes it very difficult to know if we’re warming or not, and whether we could have anything to do with it anyway. But there it is. Recorded data from ‘way back when’ are just too unreliable. As John says, the error bars mean it could be as warm now as it was 150 years ago – who knows? Let’s stick to satellite data: http://discover.itsc.uah.edu/amsutemps/execute.csh?amsutemps+002
I haven’t read the report, but I remember a History Channel program about the Royal Navy logging temperature and barometric readings with every navigational sighting (at least twice a day). So here is my question: does the data include those readings, and if not, why not? Considering the number of ships at sea at any given moment there should be thousands of readings per day, all available in the Royal Navy archives. Second thought: if they are used, you should be able to extend the series further back in time with near-global coverage, since the sun never set on the British Empire. Just my 2 cents.
Medic 1532
Medic1532 (08:26:21) :
That thought has been brought up before. It is my understanding that the data exists but has not been studied or used anywhere. It would be a HUGE effort, but with computers it is feasible.
Why is there a coverage bias instead of just a plain old coverage error (which like many types of random measurement errors would be as likely to be positive as negative, giving a mean close to zero)? In other words, do we know that when there is only a small number of regional temperature measurements and they are put into an algorithm to get a regional average temperature, that the calculated average will always tend to be off in one direction — that is, it will be biased off the true value instead of just equally likely to be in error in either a plus or minus way from the true value? I’m sorry if I missed your explanation of this point in the original blog, but was not satisfied that I understood this aspect of what you did after reading it twice.
Dear Anthony-
I am sure you will crack this statistical nut. When you do, I think you’ll find that the UK and Northwest Europe have had material temperature increases during the last 150 years. The UK was industrialized and heavily populated in 1860, so the UHI effect is limited. The UK also had localized human impact in 1860, at the height of its industrialization, with massive coal burning causing soot- and sulphur-dioxide-laden low-hanging smog clouds. Those are cooling agents, and when coupled with the possibility of natural Atlantic Ocean currents bringing warmer Gulf Stream water farther north, the result is a material change to UK temps. Climate change. Is that AGW/CO2 caused? NO. But it is local climate change.
URL says it all
http://www.bbc.co.uk/blogs/thereporters/richardblack/2009/12/cop15_saving_the_planet_or_saving_face.html
I’m thankful for the link to my blog, but was it necessary to rip off the entire content of my blog post? In doing so you’ve missed out on the latest updates I’ve made.
REPLY: Actually John, I’m trying to elevate your work, as I think it is relevant. I wrestled with a partial post but I just didn’t think it would do it for you.
If you wish, I’ll be happy to remove it. Please advise.
In the meantime, readers please go to http://www.jgc.org/ to discuss and see the latest updates. – Anthony
On first look, the trend looks quite a bit like those produced with “value added” data, which is not surprising.
What I noticed is that using the standard deviation for the error bands is incorrect. I’m thinking you should be calculating the standard error of the mean using the population model (n) as opposed to a sample model (n-1), because you are calculating for the entire available data set. Any sites that would be excluded (for whatever reason) don’t meet the inclusion criteria and therefore would not contribute to n (they do not represent available samples).
Also, I don’t think you should use a fixed standard deviation of 1.96 as the numerator when calculating for standard error across the entire timeline (it is not clear if that is what you are doing) as each point on the timeline will have a different variance. You should re-calculate StdDev at each point and obtain your standard error from the re-calculated StdDev.
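A minimal sketch of the distinction being suggested here, with `population=True` giving the divide-by-n version:

```python
import numpy as np

def standard_error_of_mean(values, population=True):
    """Standard error of the mean for a set of values (a sketch).

    population=True divides by n (the whole available data set is treated
    as the population); population=False divides by n-1 (sample estimate).
    """
    values = np.asarray(values, dtype=float)
    ddof = 0 if population else 1
    return np.std(values, ddof=ddof) / np.sqrt(len(values))
```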
Quite a bit OT, but I’ve read John Graham-Cumming’s book “The Geek Atlas: 128 Places Where Science and Technology Come Alive” and I quite enjoyed it. 🙂
You might want to run this past William Briggs. Briggs just completed a series of postings that I suspect may have bearing on what you are trying to figure out. His blog is http://wmbriggs.com/blog/
NK (08:52:59)
You bring up some good points on early industrial UHI. But while I agree that sulfur dioxide is a cooling agent, the latest we’ve been hearing on the scientific front is that soot (black carbon) is a net warming agent and has been underestimated in its forcing capabilities. And with the Industrial Revolution relying heavily on coal- and some petroleum-based fuels, this could have been a significant contributor to global temperatures. Whether the soot balanced out the SO2 or not, I have no idea. But it does complicate the implication of early measurements.
John, having your post copied here on WUWT is like having a free one-page advertisement in the New York Times….
If you have a set of long-term stations that have not been biased by urbanization or other changes, look at how well those stations represent the modern record when analyzed separately. That should give you some idea of the range of error you will see when looking at the historical record when those are the only stations. Not precisely what you are looking for, but something to find out how big the ball park is.
The distribution of those stations is critical, as 10 stations outside London don’t tell you nearly as much as one station in each of 10 countries. They might as well be one station.
Leif Svalgaard (08:18:19) :
“The number of stations is not the correct parameter to use. […] The proper parameter must include the distribution of those stations. Something like the mean area around a station where there are no other stations.”
Of course, the number of stations puts a limit on the mean area around each station or any other similar parameter. You should be able to estimate a lower bound on the error based only on the number of stations (i.e. the error assuming they were optimally placed).
Oops, I forgot to include wood burning as a possibly significant soot source during the Industrial Revolution.
The net result of the soot/SO2 forcings is that readings of temps from 1860-1920 or so may have been artificially high due to pollution. If so, the immediate post-LIA temperatures would have been lower than the data suggests, steepening the slope of the 150-year rise. Since CO2 didn’t budge much from 1850-1900 or so, this would indicate that CO2 isn’t the primary forcing agent.
Leif Svalgaard (08:18:19) :
“The number of stations is not the correct parameter to use. Imagine that a small country [e.g. Denmark] suddenly gets the idea that more stations are good, so installs a million sensors throughout the smallish land area. Clearly that will do nothing for the error in the global average.The proper parameter must include the distribution of those stations…”
The use of confidence intervals in regard to population proportions assumes that the sample is representative of the entire population. John Graham-Cumming’s use of sample size to arrive at a margin of error is correct. What you and others speak of refers to sampling error (i.e. that the sampling is not representative or is biased in other ways). Sampling error would only add additional uncertainty to the estimate derived by Graham-Cumming and in no way invalidates his base margin-of-error estimate.
As I understand it, the Navy records are of seminal interest because they are basically travelling weather stations covering the whole globe. And it is likely they may contain data that is inconvenient to the Warmists. I also gather that the data is now being ‘looked after’ by the Tyndall Centre.
Hmmmmm. Now where is that institution based? Stand by for them to be gently ‘rubbished’ as inaccurate via PR releases. And I guess you’ll have to wait a looong time before any proper studies are done on them. [/cynicism]
And OT but I reckon that surface-station weather records can only really be used in one specific way, as they are records of air temperatures measured at one geographic spot on the earth, at one time.
So for each site, pick a decade where things have not changed (i.e. nothing moved, no instruments changed, no building work built up nearby, no vegetation growth, etc. … you get the picture) and then differentiate the daily readings to get rid of standing offsets/biases. You should then have a measure of whether or not that physical spot on the earth got warmer or cooler over that decade.
Then do it for as many stations as you have got, and you should be able to do a world map showing decadal warming/cooling. And that is about as good a picture of past climate-change as you will ever get from these records.
Oh, and while you are at it, do the same with the rainfall, windspeed/direction, and barometric pressure. Now that WOULD be interesting.
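A sketch of one way to realize the per-site suggestion above, using a least-squares slope over the stable decade rather than literal differencing; a constant site offset drops out of the slope just as it does when differencing (names are illustrative):

```python
import numpy as np

def station_decadal_trend(annual_means):
    """Warming/cooling rate at one site over a stable decade (a sketch).

    annual_means : ~10 annual mean temperatures for years in which nothing
                   at the site changed. Any fixed offset or bias shifts the
                   intercept but not the slope, so the slope is an
                   offset-free measure of local warming or cooling.
    """
    years = np.arange(len(annual_means))
    slope, _intercept = np.polyfit(years, np.asarray(annual_means, float), 1)
    return slope  # degrees per year at this site
```

Mapping these per-site slopes would give the decadal warming/cooling map the comment describes.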
The Voluntary Observing Ships (VOS) Scheme has been running since 1853, whereby participating vessels submitted weather observations four times daily at 0000, 0600, 1200 and 1800 GMT.
http://vos.noaa.gov/vos_scheme.shtml
@Hank
If you read my blog posting (not this copy of it) you’ll see that I’ve updated it, and I’m pretty sure that it’s the standard error they are using in the paper, not the standard deviation. They never mention the standard error, but it makes more sense.
As for the 1.96, that’s just the multiplier on whatever sigma is to get the 95% confidence interval.