John Graham-Cumming has posted an interesting analysis; he could benefit from some reader input at his blog.
See here and below: http://www.jgc.org/blog/
Adjusting for coverage bias and smoothing the Met Office data
As I’ve worked through “Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850” to reproduce the work done by the Met Office, I’ve come up against something I don’t understand. I’ve written to the Met Office about it, but until I get a reply this blog post is to ask for opinions from any of my dear readers.
In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you’ll see that in the 1800s there weren’t many temperature stations operating and so only a small fraction of the Earth’s surface was being observed. There was a very big jump in the number of stations operating in the 1950s.
That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I’m calling this the coverage bias.
To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to get an estimate of the error for the group of stations operating in any given year. Using that data it’s possible, on a year-by-year basis, to calculate the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
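In code, that sub-sampling step amounts to something like the following. This is only a minimal Python sketch of the idea, not the actual program used; the array layout and the cosine-latitude area weighting are assumptions.

```python
import numpy as np

def coverage_bias_stats(field, lats, observed_mask):
    """Estimate the coverage-bias error for one year's station coverage.

    field         -- reanalysis anomalies, shape (n_years, n_lat, n_lon)
    lats          -- latitude (degrees) of each grid row, shape (n_lat,)
    observed_mask -- boolean, True where that year's stations give
                     coverage, shape (n_lat, n_lon)

    Returns the mean and standard deviation, over all reanalysis years,
    of (sub-sampled hemispheric mean - full hemispheric mean).
    """
    # Weight each grid cell by the cosine of its latitude (cell area)
    weights = np.cos(np.radians(lats))[:, None] * np.ones(field.shape[1:])

    errors = []
    for yr in range(field.shape[0]):
        full = np.average(field[yr], weights=weights)
        sub = np.average(field[yr][observed_mask],
                         weights=weights[observed_mask])
        errors.append(sub - full)

    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)
```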
I’ve now done the same analysis and I have two problems:
1. I get a much wider error range for the 1800s than is seen in the paper.
2. I don’t understand why the mean error isn’t taken into account.
Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to ‘prime’ the filter. I extend the data as described on that page.
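For concreteness, a 21-point binomial filter of the kind described there looks roughly like this. The end-extension rule below is a simplification; the Met Office page describes its own rule.

```python
import numpy as np
from scipy.special import comb

def smooth_21pt(series):
    """21-point binomial filter (one plausible reading of the Met Office
    smoothing description; the padding here is a simplification)."""
    weights = comb(20, np.arange(21), exact=False)   # C(20, 0) .. C(20, 20)
    weights /= weights.sum()
    # Pad each end with 10 copies of the end value so the output has the
    # same length as the input
    padded = np.concatenate([np.full(10, series[0]),
                             series,
                             np.full(10, series[-1])])
    return np.convolve(padded, weights, mode="valid")
```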
First, here’s the smoothed trend line for the northern hemisphere temperature anomaly derived from the Met Office data, as in my other blog posts and without taking the coverage bias into account.
And here’s the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).
Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you’d expect a large error when trying to extrapolate to the entire northern hemisphere.
This chart shows the number of stations by year (as in the previous chart) as the green line, and the mean error due to the coverage bias as the red line. For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you’ll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves the error drops.
And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it’s a good sample.
Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.
Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations, since the published literature uses a 95% confidence interval) you get the following picture:
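In other words, the adjusted curve and its band come from something like this (a small sketch; the input names are carried over from the sub-sampling sketch above and are not from any real program):

```python
def adjust_with_bounds(anomaly, bias_mean, bias_std, z=1.96):
    """Subtract the mean coverage-bias error for each year and attach a
    95% band (z = 1.96 standard deviations, assuming normality)."""
    adjusted = anomaly - bias_mean
    return adjusted, adjusted - z * bias_std, adjusted + z * bias_std
```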
And now I’m worried because something’s wrong, or at least something’s different.
1. The published paper on HadCRUT3 doesn’t show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.
2. The paper doesn’t talk about adjusting using the mean.
So I think there are two possibilities:
A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.
B. There’s something wrong in my program in calculating the error range from the sub-sampling data.
If I am right and the paper is wrong there’s a scary conclusion… take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it’s hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can’t be sure.
So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.
UPDATE It suddenly occurred to me that the adjustment that they are probably using isn’t the standard deviation but the standard error. I’ll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
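If that reading is right, the band shrinks by roughly the square root of the number of reanalysis years used in the sub-sampling. Something like the following, which is only a guess at the interpretation, reusing names from the sketches above:

```python
import numpy as np

n_reanalysis_years = 40                       # roughly, for the 40-year reanalysis
bias_se = bias_std / np.sqrt(n_reanalysis_years)   # standard error of the mean error
upper = adjusted + 1.96 * bias_se                  # a much narrower 95% band
lower = adjusted - 1.96 * bias_se
```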
WUWT readers please go to http://www.jgc.org/ to discuss and see the latest updates.
The word “Denier” is thrown around a lot as an ad hominem attack, calling AGW skeptics “Deniers” so that you think of yourselves as “Holocaust Deniers”.
The word “Holocaust” comes from an ancient Greek word that when translated into English is “Burnt Offering”. The Jewish people don’t have an exclusive claim to “Holocausts”. The Chinese Burnt Offering Holocaust killed 100 million people. The Russian Burnt Offering Holocaust killed 50 million people. The Jewish Burnt Offering Holocaust killed 5 million people. I do not “Deny” that any of these “Holocausts” happened. Whenever I am accused of being an “AGW Denier” I respond by saying, I am not a Jewish Burnt Offering Holocaust Denier, and whenever I mention the word Holocaust, I always pre-qualify the word Holocaust using its English translation with the words “Burnt Offering”.
To Jerrym (09:21:59) :
Agreed, the effects of UK industrial-age deforestation, low clouds, SO2, soot etc. (although soot is primarily linked to arctic climate) are probably too complex to ever untangle. BUT, clouds and ocean currents drive planetary climate and local climates. As a maritime climate, the UK is particularly affected by the North Atlantic ocean currents. My main point is, if a statistically valid review of Met temp. records shows a material increase, it is a function of currents first, local cloud formation patterns (possible human impact) second, and AGW/CO2 not so much.
Cheers
Well I tried to leave a post over there at John’s site. It wouldn’t take the long one and it wouldn’t take the short one; I only have so much patience.
The problem with the data set is non-recoverable, so why bother? It is nothing more than a violation of the Nyquist sampling theorem. You can’t use statistics to recover information from corrupt data. The data is likely a reasonably good record of that set of thermometers. In no way does, or can, that set of thermometers represent the temperature of the Earth, or even the northern hemisphere.
But don’t worry, neither does GISStemp or HadCRUT
Having some free time, I invested in 100 thermometers, just to see for myself. I placed them all on my fenced property (about 1 hectare): 3 each on 30 wooden poles 2 meters high, evenly spaced at 50 cm, 1.00 m and 1.50 m, with 15 poles in the shade and 15 in direct light (10 thermometers kept as spares). After reading them twice in daylight and twice at night (same times) for 8 weeks, the 3 temperatures on a pole always differed, the highest always cooler. The temperature in my garden in sunlight shows about a 1.03 C average difference, in shade 1.8 C; the difference from east to west and north to south is 0.97 C and 1.67 C respectively. And my science is settled!
@John Gentzel
I’ve sailed across the Atlantic twice on the beautiful clipper Stad Amsterdam, and I’ve taken quite a few water- and air temperature and other measurements and uploaded these via satellite to the Dutch KNMI (Royal Dutch Meteorological Institute), but I must confess, from a scientific point of view, these were possibly quite inaccurate and subjective.
To take the water temperature you dunk an insulated bucket overboard, haul it up, and stick a thermometer in it. Leave it for a few minutes (how long? well, that depends on how long it takes to smoke your fag). Next you read the temp, and note it. This wasn’t a digital one, but a mercury thermometer, so you could be off by half a degree depending on how good your eyesight is.
Air temp was taken by swinging a wet bulb thermometer around for, well, how long you felt like, and reading it again depends on your eyesight.
Cloud cover (percentage, type of cloud, height of cloud, etc.) was again very subjective (stand on deck and look around), the same with ocean swell (period, length of wave, wave height, etc.).
So, if these types of measurements are done like this on the oceans worldwide, there is no telling how they were done, by whom, etc., so I personally would most certainly not consider these measurements to be anywhere near accurate.
Whether this analysis is correct or not, it provides this EE with a better understanding of what the temperature plots (or their creators) are trying to say.
Also, while conceding Leif’s distribution argument, it is not at all obvious to me that quantity is irrelevant. As an example, it would seem that the temperature of the Antarctic would be better represented with 3 (reasonably distributed) measurements than with 2 or 1.
What curve do you get if you just use the stations that have been reporting for the full time period? If such stations exist, and if they don’t suffer from urban heat island effects, it would be an interesting check. In other words, don’t try to make it a global or hemispherical average, just track trends at that admittedly small sample of sites.
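For anyone who wants to try that check, a rough sketch follows. The file name and column layout here are hypothetical, not the actual Met Office format.

```python
import pandas as pd

# Hypothetical layout: one row per station per year, with columns
# station_id, year, anomaly
df = pd.read_csv("station_anomalies.csv")

n_years = df["year"].nunique()
years_per_station = df.groupby("station_id")["year"].nunique()
full_record_ids = years_per_station[years_per_station == n_years].index

# Simple unweighted mean over the long-record stations only
trend = (df[df["station_id"].isin(full_record_ids)]
         .groupby("year")["anomaly"]
         .mean())
```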
“”” Leif Svalgaard (08:18:19) :
The number of stations is not the correct parameter to use. Imagine that a small country [e.g. Denmark] suddenly gets the idea that more stations are good, so installs a million sensors throughout the smallish land area. Clearly that will do nothing for the error in the global average.
The proper parameter must include the distribution of those stations. Something like the mean area around a station where there are no other stations. This problem has been studied at length in the literature [I can’t remember a good reference, off hand, but they exist]. “””
Well one good place to start is “Digital and Sampled-Data Control Systems” by Julius Tou (Purdue University).
Or you can just google “nyquist sampling theorem”
The problem is NOT a problem of statistics, and it cannot be solved by statistical methods; it is a problem of sampling theory; and there is no known solution (post facto).
GISStemp and HadCRUT are records of their own respective sets of thermometers, which are a small number of instruments. They do not represent the Earth’s surface or lower-atmosphere temperature.
Isn’t any analysis done using the “value added” data (riiiiight!) suspect? What we really need are the raw data and even then I doubt the coverage and accuracy are good enough that one could extract a meaningful “climate temperature signal” from it.
Theodore de Macedo Soares (09:29:02) :
John Graham-Cumming’s use of sample size to arrive at a margin of error is correct.
I don’t think so, as values inside the sample are correlated and adding more stations does not decrease the error.
Here’s an interesting experiment for some of you computer nerds to try when you have an evening free; maybe I’ll get my music-savvy son to do this.
Take your favorite cd recording of Beethoven’s Fifth Symphony; or even that shrieker Celine Dion, or Madonna.
Play the thing through and store it on your hard drive. These days terabyte drives cost next to nothing.
Then have your computer (you write the code) go through the data and pull out every 20th digital sample of that piece of music. So maybe the disc is recorded at an 88 kHz sampling rate or something like that, so you are going to end up with about a 4.4 kHz rate of selected samples.
Now have your computer play back those samples at the correct rate, about 4.4 kHz, run it into your hi-fi system, and see how you like the result.
Now if you are also a climatologist or statistician; get to work with your statistical maths and try to fix that sub-sampled music piece.
Good luck !
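A rough sketch of that decimation step, for anyone who wants to try it. The file names are just examples, and no anti-alias filter is applied before dropping samples, which is the whole point of the exercise.

```python
from scipy.io import wavfile

# Read a WAV copy of the track, keep every 20th sample, and write it back
# out at the correspondingly lower rate. Everything above the new Nyquist
# frequency folds back into the audible band as noise.
rate, data = wavfile.read("symphony.wav")       # e.g. a 44100 Hz CD rip
step = 20
wavfile.write("symphony_decimated.wav", rate // step, data[::step])
```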
Anthony: Your attribution is in good faith (not “ripped off”), but, as others have noted here before, it is sometimes difficult to tell where guest posts begin and end. It would be nice to see Mr. Graham-Cumming’s work clearly set off from the introduction. Blockquotes are a familiar convention in paper text, but with column constraints on the internet, maybe that uses up too much vertical space. And nobody (either you or readers) would want to deal with quotes within quotes. Italics raise other issues. Perhaps you’ve thought about (and discarded?) the idea of different text colors for guest posts.
“Let’s see, if we raise the trunk a bit, and shrink the elephant.”
http://lifesizestatue.com/fiberglass/images/elephant_up_big2303.jpg
It looks a lot like CRU data. (Or, maybe an anteater?)
And, speaking of their data, and in particular their analysis of it…
http://hotair.com/archives/2009/12/16/video-east-anglia-crus-below-standard-computer-modeling/
“oops!”
Where do I find the source of those annual means graphs that are found at surfacestation.org?
Dr. Leif Svalgaard is right here. Take it to the limit where there are 2 temperature sensors. One is rural where rural is 98% of the land area. The other is urban where urban is the remaining 2% of the land area. The rural sensor shows no rise in day or night (max and min) temperature. The urban sensor shows no rise in the daytime but a 2 degC rise in the nighttime temperatures.
If you average min and max you get 0 deg change for rural and 1 deg for urban. The average of both is then 0.5 deg, which reflects the urban rise but is not true for 98% of the (rural) land area. The temperatures need to be weighted by the area they cover.
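Spelled out with the numbers from that example:

```python
# Area-weighted average versus naive average for the two-sensor example
rural_change, urban_change = 0.0, 1.0     # deg C change in (max + min) / 2
rural_area, urban_area = 0.98, 0.02       # fraction of the land area

naive = (rural_change + urban_change) / 2                          # 0.5 deg C
weighted = rural_change * rural_area + urban_change * urban_area   # 0.02 deg C
```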
no good deed goes unpunished
Leif Svalgaard (10:10:20) :
I don’t think so, as values inside the sample are correlated and adding more stations does not decrease the error.
Yes. I agree that if the samples are correlated, adding more such samples would not decrease the error.
Sampling error can occur by chance or by bias in the collection of the samples. A confidence interval (CI) for a population proportion accommodates the first type of sampling error but not the second. My point was that there is nothing wrong with deriving a CI solely on the basis of chance sampling error, where a small number of samples increases the possibility of error due to chance. A resulting wide CI can, by itself, undermine possible conclusions.
Sampling error due to bias, such as correlated samples, or non-sampling measurement-type errors, is another story and, if not taken into account, can be much more problematic than chance-type errors, as you suggest.
I wouldn’t have minded (so much) you taking the post if you’d simply asked first.
Quote: “Having a free time I invested in 100 thermometers, just wanted to see for myself, I placed all in my fenced property, plus minus 1 hectare…”
Fred Lightfoot, am just curious, were you able to purchase mercury thermometers where you live? I became frustrated several years ago in my engineering job at the university, when I found that I could not replace broken mercury thermometers with an identical model, as the mercury filled models had been banned by the EH&S Dept at the university. I was required to purchase the red dye alcohol filled types – I found they were not worth a damn for accuracy, and suffered separation problems. (For separated fluid, T meniscus > T actual. There’s a warming source, i.e. corrupted data.)
I’ve never performed outdoor measurements to the extent that you did, but can tell you that a laboratory oven set to +37 deg C having an internal height of about 60cm will have a temperature gradient of about 8 deg C over the 60cm height, in the absence of forced convection. I was rather surprised the first time I discovered that, and ended up retrofitting all of our natural convection lab ovens to forced convection. Even with forced convection, the gradient was 2 to 3 deg C under the same conditions.
Marc:
“So, if these types of measurements are done like this on the oceans worldwide, there is no telling how they were done, by whom, etc., so I personally would most certainly not consider these measurements to be anywhere near accurate.”
Maybe the errors would cancel out and some “signal” could be extracted statistically? Maybe statistical corrections could be made to compensate for the variable locations. It would be better than nothing, if it at least indicated long-term trends.
I think JGC needs to take a few deep breaths before hitting the return key. Anthony has been very generous in the way he has shared this blog with other people who are trying to get the message out.
Resolving problems with the temperature datasets is not going to happen overnight, and we really need to stick together on this. I wouldn’t have known about JGC’s work if it weren’t for this blog.
That said, I liked what he did with putting the grids on Google Earth using KML files, and it got me wondering whether it would be worthwhile doing something like Leif is suggesting, a sort of polycell distribution rather than a gridcell distribution.
Temp stations aren’t evenly distributed, but if each was allocated a polygonal area based on how close its neighbouring stations are, i.e. more stations equals smaller polygons, would that lead to a more accurate picture of temps?
It would get over the land-sea problem with gridcells, and it would also mean that temps could be calculated by country; then just divide each country’s area by the total land area to get its contribution to the overall temperature record. Then do the same with sea temps. It could then be compared to satellite records, and if it matched them closely, then it could be a valid way of analysing the temp dataset.
I’ve glanced at the maths that would be involved in doing something like that, and it’s not pretty, but it would only have to be done once initially for all existing stations; then, as stations are added or removed, it would only have to be redone for the neighbouring polygons.
Would this be a valid way of providing an alternative dataset, or would it be too computationally heavy for what it added?
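Those polygons are essentially Voronoi cells on the sphere. A rough sketch of how the station weights could be computed is below; it assumes SciPy’s SphericalVoronoi (the calculate_areas method appeared around SciPy 1.5) and is only an illustration of the idea, not a worked-out dataset.

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

def station_weights(lat_deg, lon_deg):
    """Weight for each station = area of the patch of the unit sphere
    that is closer to it than to any other station."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    xyz = np.column_stack([np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)])
    sv = SphericalVoronoi(xyz, radius=1.0)
    areas = sv.calculate_areas()
    return areas / areas.sum()

# weights = station_weights(station_lats, station_lons)
# anomaly = np.sum(weights * station_anomalies)   # area-weighted mean
```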
So the Royal Navy logged temperatures twice a DAY.
WHEN? Take a look at ANY 24 hour temperature data on Weather Underground.
Unless temperature is taken continuously, the error bounds on diary and ship-log temperatures HAVE TO INCLUDE THE ERROR BOUND of the timing of the taking of the temperature.
This could be HUGE. (I.e., plus or minus say 5 to 10 C depending on the location and the time of the year.)
Yet another “error” to put into the mix.
This historical stuff is so much STUFF AND NONSENSE.
Hugoson
Ok, back to obscurity, John.