John Graham-Cumming has posted an interesting analysis; he could benefit from some reader input at his blog.
See here and below: http://www.jgc.org/blog/
Adjusting for coverage bias and smoothing the Met Office data
As I’ve worked through “Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850” to reproduce the work done by the Met Office, I’ve come up against something I don’t understand. I’ve written to the Met Office about it, but until I get a reply this blog post is to ask for opinions from any of my dear readers.
In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you’ll see that in the 1800s there weren’t many temperature stations operating and so only a small fraction of the Earth’s surface was being observed. There was a very big jump in the number of stations operating in the 1950s.
That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I’m calling this the coverage bias.
To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to estimate the error for the group of stations operating in any given year. Using that data it’s possible, on a year-by-year basis, to calculate the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
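Roughly, that sub-sampling step amounts to something like the following sketch. It is not the Met Office’s program: the reanalysis fields, the station mask and the area weights are placeholder inputs you would need to build yourself.

```python
import numpy as np

def coverage_bias(reanalysis_fields, station_mask, weights):
    """Mean and standard deviation of the error made by averaging only the
    grid cells with stations instead of the whole hemisphere."""
    errors = []
    for field in reanalysis_fields:                       # one complete gridded field per year
        true_mean = np.average(field, weights=weights)
        sampled_mean = np.average(field[station_mask],
                                  weights=weights[station_mask])
        errors.append(sampled_mean - true_mean)
    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)

# tiny synthetic usage: a 36x72 grid, 40 "years" of random fields, sparse coverage
rng = np.random.default_rng(0)
lats = np.linspace(-87.5, 87.5, 36)
weights = np.repeat(np.cos(np.radians(lats))[:, None], 72, axis=1)
fields = rng.normal(size=(40, 36, 72))
mask = np.zeros((36, 72), dtype=bool)
mask[20:30, 10:20] = True                                 # pretend only these cells have stations
print(coverage_bias(fields, mask, weights))
```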
I’ve now done the same analysis and I have two problems:
1. I get a much wider error range for the 1800s than is seen in the paper.
2. I don’t understand why the mean error isn’t taken into account.
Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to ‘prime’ the filter. I extend the data as described on that page.
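The Met Office page linked above describes a 21-point binomial filter; the sketch below captures the idea, although the exact end-extension rule used here (repeating the first and last values for ten points each) is an assumption rather than a quote of their method.

```python
from math import comb
import numpy as np

def smooth21(series):
    """Apply a 21-point binomial smoother, padding each end by repetition."""
    w = np.array([comb(20, k) for k in range(21)], dtype=float)
    w /= w.sum()                                   # binomial weights summing to 1
    s = np.asarray(series, dtype=float)
    padded = np.concatenate([np.full(10, s[0]), s, np.full(10, s[-1])])
    return np.convolve(padded, w, mode="valid")    # output has the same length as the input

print(smooth21(np.arange(1860, 1880, dtype=float))[:5])   # quick smoke test
```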
First, here’s the smoothed trend line for the northern hemisphere temperature anomaly derived from the Met Office data, as in my other blog posts, and without taking the coverage bias into account.
And here’s the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).
Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you’d expect a large error when trying to extrapolate to the entire northern hemisphere.
This chart shows the number of stations by year (as in the previous chart; the green line) together with the mean error caused by the coverage bias (the red line). For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you’ll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves, the error drops.
And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it’s a good sample.
Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.
Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations since the published literature uses a 95% confidence) you get the following picture:
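In code, the adjustment and the 95% band described in the last two paragraphs come down to very little; the three per-year arrays below are placeholders standing in for the smoothed anomaly and the sub-sampling results.

```python
import numpy as np

years = np.arange(1860, 2010)
anomaly = np.zeros(years.size)        # smoothed NH anomaly (placeholder values)
bias_mean = np.zeros(years.size)      # mean coverage-bias error per year
bias_sd = np.zeros(years.size)        # its standard deviation per year

adjusted = anomaly - bias_mean        # remove the mean coverage bias
lower = adjusted - 1.96 * bias_sd     # 95% confidence band
upper = adjusted + 1.96 * bias_sd
```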
And now I’m worried because something’s wrong, or at least something’s different.
1. The published paper on HadCRUT3 doesn’t show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.
2. The paper doesn’t talk about adjusting using the mean.
So I think there are two possibilities:
A. There’s an error in the paper and I’ve managed to find it. I consider this a remote possibility and I’d be astonished if I’m actually right and the peer reviewed paper is wrong.
B. There’s something wrong in my program in calculating the error range from the sub-sampling data.
If I am right and the paper is wrong there’s a scary conclusion… take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it’s hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can’t be sure.
So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.
UPDATE: It suddenly occurred to me that the adjustment they are probably using isn’t the standard deviation but the standard error. I’ll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
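If that is what the paper does, the bands shrink by roughly the square root of the number of sub-sampled realisations. A quick sanity check (the 40 realisations and the 0.4 spread are made-up numbers, purely to show the size of the effect):

```python
import numpy as np

errors = np.random.default_rng(0).normal(0.0, 0.4, size=40)  # fake coverage errors
sd = errors.std(ddof=1)              # spread of the individual errors
se = sd / np.sqrt(errors.size)       # uncertainty of their mean: ~sqrt(40) times smaller
print(sd, se)
```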
WUWT readers please go to http://www.jgc.org/ to discuss and see the latest updates.
George E. Smith
We agree on so much.
There are huge issues in the time dimension. I still take the view that where arbitrary “homogenisation” turns a negative trend into a positive, we have nothing more than an elaborate way of saying the historic data is junk.
I agree with your point that the spatial dimension may not have had adequate attention from a sampling perspective. Who has done the analysis to satisfy us that there is no distortion of the signal due to spatial aliasing?
I can appreciate many of the more detailed points you make. But let’s leave that as a challenge to the literature – show us the papers which have addressed and answered these issues. If there are none, it has to be a matter for further research. But it would leave little reason to pay attention to aggregation of the surface network.
“Well, no central limit theorem is going to buy you a reprieve from a Nyquist criterion violation, and no amount of linear or non-linear regression analysis is ever going to recover the true signal, which has been permanently and irretrievably corrupted by in-band aliasing noise.”
Yep. Worth repeating.
There is a tendency in many of these comments to conflate the requirements for obtaining an accurate mean global temperature with those for obtaining a mean temperature *trend*. The latter does not depend on the former, and it is the latter which is of interest for AGW theory. Even a fairly small number of globally well-distributed stations (in terms of lat/long and altitude) which have long-term records could shed useful light on the trend question. Or as supercritical suggested, stations with shorter-term records could reveal decadal trends, which could be combined to give a long-term trend.
Obtaining an “accurate” global mean is probably insuperably difficult. Obtaining a plausible global trend is probably doable.
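A toy illustration of that distinction (made-up numbers throughout): give a handful of stations wildly different baselines plus noise, work in anomalies, and the shared trend still falls out even though the absolute mean is meaningless.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1900, 2000)
true_trend = 0.007                        # degC per year, invented for the example

# four stations with very different baseline offsets and their own noise
stations = [true_trend * (years - years[0]) + rng.normal(offset, 0.3, years.size)
            for offset in (-2.0, 0.5, 3.1, -0.7)]
anomalies = [s - s[:30].mean() for s in stations]   # each vs its own early baseline
slope = np.polyfit(years, np.mean(anomalies, axis=0), 1)[0]
print(f"recovered trend: {slope:.4f} degC/yr (true value {true_trend})")
```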
George E. Smith (10:20:39) :
Better still..
Try to reconstruct Mozart’s 19th piano concerto by taking 1 out of every 40 notes in succession from the manuscript…
George E. Smith (10:20:39) :
. . . have your computer (you write the code) go through the data, and pull out every 200th [20th] digital sample of that piece of music. So maybe the disc is recorded at 88 kHz sampling rate or something like that, so you are going to end up with about 4.4 kHz rate of selected samples.
Given my tinnitus from thousands of hours piloting noisy airplanes, anything above 4.4 kHz is wasted on me. I think ordinary telephone (copper wire) bandwidth is around 3 to 4 kHz. All the data in conversations is carried well below 1 kHz, so does your point have to do with sample size or station distribution?
The stations chosen should be those whose microenvironments did not change significantly during the observation period (decadal or century) of course.
John Graham-Cumming (15:28:32), congrats on the backpat from the Met Office.
There are many ways of computing statistical uncertainties.
1. The most common way assumes that the noise (or error, etc.) follows a normal (Gaussian-like) law. The more data you get, the more the quantity (overall noise / number of data points) averages toward 0. Of course this is a pure bet, since in many cases the noise does not follow a normal law… but classical statisticians often simply forget this caveat! For example, unknown phenomena can affect all sensors in the same way at related times: thermometer boxes are repainted in the same way, box shapes change and react differently depending on winds, urban heat islands grow more or less simultaneously, etc.
2. Another way to address uncertainties related to data inhibition is to guess (extrapolate, model, etc.) the data of one sensor (e.g. a thermometer) from the data of the other sensors. This way you can compute an uncertainty related to the inhibition of one sensor, then a second one, etc. This more empirical way is much more costly and should be preferred only when you have the time and money to design the cross-models and data from many sensors.
3. An intermediate way consists of computing the series of deltas in the (geographically averaged) mean temperatures when you inhibit the data of one or another sensor. You can do it with (3a) simple means or, preferably, (3b) geographic-surface-weighted means. Either way you get a fast evaluation of robustness against sensor inhibition.
Method 3 is much simpler, easier, faster, more neutral and more approximate than method 2. So it is sub-optimal from a mathematical point of view, but it may prove more reliable from a management point of view in the face of human biases (guess what I mean, following CRU, Climategate, etc.).
In cases 2 and 3b, strong robustness is plausible if the remaining sensors are sufficient for guessing/extrapolating (case 2) or locally averaging/interpolating (case 3b) the data of each removed sensor.
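A minimal sketch of what “method 3” (drop one station at a time and watch the mean move) could look like; the array of station series is a placeholder, not anything from CRU or the Met Office.

```python
import numpy as np

def leave_one_out_deltas(station_series):
    """station_series: array of shape (stations, years) of anomalies.
    Returns, for each inhibited station, the change in the simple mean series."""
    full = station_series.mean(axis=0)
    return np.array([np.delete(station_series, i, axis=0).mean(axis=0) - full
                     for i in range(station_series.shape[0])])

# synthetic usage: 5 stations, 20 years of random anomalies
deltas = leave_one_out_deltas(np.random.default_rng(2).normal(size=(5, 20)))
print(np.abs(deltas).max(axis=1))   # worst-case shift from dropping each station
```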
Although I have not read the article you mention, I guess (?) from some Climategate reading that the way CRU works relies on geographic-surface-weighted means (case 3b), using Voronoi/Delaunay diagrams to compute the relevant surfaces. Such diagrams associate a given location with the 3 sensors drawing the smallest triangle around it. Then some weighted average is computed (possibly with 1/distance to each of these 3 sensors).
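This is not CRU’s code, but the scheme described above (interpolate a location from the three stations forming the enclosing triangle) looks roughly like this with scipy, which uses linear barycentric weights inside each Delaunay triangle rather than the 1/distance weighting mentioned; the coordinates and values are made up.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

stations = np.array([[10.0, 40.0], [12.0, 45.0], [15.0, 41.0], [8.0, 47.0]])  # lon, lat
anomalies = np.array([0.3, 0.1, 0.4, -0.2])

interp = LinearNDInterpolator(stations, anomalies)  # builds a Delaunay triangulation
print(interp(11.0, 43.0))                           # value at a point inside one triangle
```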
cf. CBS article:
http://www.cbsnews.com/blogs/2009/11/24/taking_liberties/entry5761180.shtml
They do not seem to be happy with their Delaunay diagram(s) because it has turned out to be too loose, at least in some regions of the world.
Regards,
Xavier DRIANCOURT
PhD machine learning, etc.
On a more general basis, there is a global statistical theory on robustness evaluation and its use for tuning statistical systems. See the US work of Vladimir Vapnik, Léon Bottou, et al. on this. They follow earlier USSR work on regularisation (i.e. stabilisation) of unstable systems; see Tikhonov et al.
“”” P Wilson (16:15:14) :
George E. Smith (10:20:39) :
Better still..
Try to reconstruct Mozart’s 19th piano concerto by taking 1 out of every 40 notes in succession from the manuscript… “”
Hey that works for me PW; as I recall at least from “Amadeus”, Mozart reportedly said his works had just the right number of notes.
I can assert that it is OK to play the Symphony No. 41 (in C Major, the “Jupiter”) and leave out the second clarinet part, and nobody will notice; well, they won’t even notice if you leave out the first clarinet part.
But your example may be even better than mine. If anybody can hum the clarinet part in the Jupiter symphony, give me a buzz.
And for Mike McMillan: my experiment was to point out the folly of sampling at too low a rate. The example everybody is familiar with is the movie or TV horse opera with the damsel in distress in a runaway chuck wagon, with the wheels wildly turning backwards. At 24 samples per second (for film, or 60 (50 in Europe) for standard TV), the moving wheel spokes represent a time-varying signal whose frequency is higher than half the sample rate, bearing in mind that the replacement of one spoke by its neighbor constitutes one cycle. If you sample at exactly the spoke frequency, which is half of the Nyquist rate, the spokes appear stationary, and the stationary spokes (the average condition) could appear in any phase; so the error in spoke position could be anywhere in the amplitude of the out-of-band signal, in this case the spokes moving too fast for the frame rate.
The sampling theorem does not require equal spacing of samples, so random sampling is OK, so long as the maximum sample spacing is no larger than a half cycle of the highest signal frequency. So random sampling is less efficient (you need more sample points), but it has some advantages, and can eliminate the degenerate case of sampling at exactly twice the signal frequency.
For example, if you have a pure sinusoidal signal at a frequency f, and you sample at exactly 2f, that satisfies the Nyquist criterion, but it could happen that all the samples happened at the zero crossing points, in which case the reconstruction would be zero signal. But if you happened to sample at the positive and negative peaks, you would recover the correct signal amplitude (but you wouldn’t know it was correct). Random sampling would slew through the whole cycle and eventually reproduce at least a repetitive signal, so random sampling has been used to advantage in sampling oscilloscopes for many years.
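That zero-crossing versus peak-sampling point is easy to see numerically; a few lines of Python (a toy example, with a 1 Hz sine sampled at exactly 2 Hz):

```python
import numpy as np

f = 1.0                                  # signal frequency, Hz
n = np.arange(20)                        # 20 samples taken at 2 Hz
t_zeros = n / (2 * f)                    # samples land on the zero crossings
t_peaks = n / (2 * f) + 1 / (4 * f)      # same rate, shifted onto the peaks

print(np.round(np.sin(2 * np.pi * f * t_zeros), 6))  # all zeros: the signal "vanishes"
print(np.round(np.sin(2 * np.pi * f * t_peaks), 6))  # alternating +1/-1: full amplitude
```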
With climate data gathering from ground stations you automatically end up with random spatial sampling, but unfortunately you don’t have enough samples by orders of magnitude to correctly recover the complete continuous global temperature map or even its average, over time and space.
“”” bill (15:15:22) :
George E. Smith (10:20:39) :
Here’s an interesting experiment for some of you computer nerds to try when you have an evening free;
Ever heard of MP3 format Mr. Smith?
or
PASC (Precision Adaptive Sub-band Coding)
or
ATRAC etc.
Yes I have; does the algorithm I suggested perform Precision Adaptive Sub-band Coding?
The human ear is well known for being able to find intelligence in the most garbled sounds. Early encoding using things like audio spectrum inversions and such were found to still leave voice messages intelligible to a trained ear.
What we are interested in with climate data, is in being able to recover the correct signal; not one that is similar in some respects.
Can MP3 encoding be done live in real time, or does it require foreknowledge of what information is coming next? How would we disperse global temperature sampling stations spatially so as to be able to MP3-encode their output to reduce the amount of data?
If we can do that it would be a good idea.
Sony were criticized for deliberately falsifying error correction bits as a form of copy protection on their audio CDs.
If an audio CD player saw the error it would just repeat the previous sample or maybe interpolate but a computer needs to have the exact number so would make a few attempts at reading and then crash out.
Very few people, if any, noticed any degradation in the music which is just as well because audio CDs would exhibit this sort of behaviour quite frequently due to misreads even with the correct error correction.
What data can you trust anymore?
Take any real (unadjusted) data from the time thermometers were accurate for a baseline, and work forward to the present time.
We don’t need 0.1 accuracy to spot trends. Trends don’t mean anything anyway. We’re just chasing our tail, aren’t we?
If we’re trying to predict the future we’re fooling ourselves; if we’re trying to understand the climate, let’s collect data.
About the Met: surprisingly, the errors (errare humanum est) only ever go the “good” way, cooler for the Medieval Optimum and hotter in this case.
The Met, which we all know is in business with the barbecue industry, should remember:
Perseverare diabolicum
The update posted by John Graham-Cumming rings distant bells from my question to the BoM a couple of weeks ago:
“While analysing monthly temperatures in Western Australia for the past 12 months based upon data on the BoM website, I noticed that according to my records the August 2009 data for all observation sites in WA has been adjusted at some time since that month (November 17 I believe). The adjustment has resulted in the mean min and mean max increasing by an average .5 degrees C at all sites for August 2009. The adjustment at almost all sites was a uniform increase for both min and max… i.e. if the min went up by .4, so too did the max. If the min went up by .5, so too did the max. Could you please let me know what caused the adjustment?”
Reply:
“Thanks for pointing this problem out to us. Yes, there was a bug in the Daily Weather Observations (DWO) on the web, when the updated version replaced the old one around mid November. The program rounded temperatures to the nearest degree, resulting in mean maximum/minimum temperature being higher. The bug has been fixed since and the means for August 2009 on the web are corrected.”
There seem to be so many bugs in Australian data that it needs insect spray.
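The BoM reply does not say exactly how the rounding was applied, and round-to-nearest on its own would not push the means up by half a degree, so purely as a guess at the mechanism: rounding each daily value upward before averaging produces a shift of about the reported size (the readings below are invented).

```python
import numpy as np

daily_max = np.array([21.3, 22.7, 19.4, 20.1, 23.8, 18.6])  # made-up daily maxima
print(daily_max.mean())             # correct monthly mean
print(np.ceil(daily_max).mean())    # mean computed from values rounded up first: ~0.5 higher
```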
I’ve just uploaded linear graphs and source data for 24 Western Australia surface stations mostly dating to pre-1900 showing the official trendlines according to the historic BoM data, the High Quality data homogenised by the BoM, the GISS records and the HadCRUT3 data, where available:
Albany
Bridgetown
Broome
Busselton
Cape Leeuwin
Cape Naturaliste
Carnarvon
Derby
Donnybrook
Esperance
Eucla
Eyre
Geraldton
Halls Creek
Kalgoorlie
Katanning
Kellerberrin
Marble Bar
Merredin
Perth
Rottnest Island
Southern Cross
Wandering
York
It would appear that different people recording temperature readings on different thermometers in different kinds of places in different seasonal conditions would provide a varied report on local temperature conditions (by as much as 8 degrees F).
Now how in the world can anyone tease 0.6 degrees F of long-term climate change out of this record? And temperature is only one leg of the total energy in the environment.
I believe it has already been demonstrated that when you pad the start of a to-be-averaged series with one data set and add another set at the end to complete the averaging, you get an uptick (hockey stick) at the end.
I think Roy Spencer and George E. Smith may have it right: temperature is a local result of all the energy conditions in the system.
Jordan,
Sorry, thought you would read the references.
http://www3.interscience.wiley.com/journal/113468782/abstract?CRETRY=1&SRETRY=0
leonardo.met.tamu.edu/people/faculty/north/pdf/64.pdf
steven mosher
Thanks again for the further reference. I did notice the cross-reference to North, but focused my attention on the paper you suggested. I’ll have a look at the North paper today.
George E Smith
“For example, if you have a pure sinusoidal signal at a frequency f, and you sample at exactly 2f, that satisfies the Nyquist criterion, but it could happen that all the samples happened at the zero crossing points, in which case the reconstruction would be zero signal. But if you happened to sample at the positive and negative peaks, you would recover the correct signal amplitude (but you wouldn’t know it was correct.”
Again, fair points. However we should be much more comfortable with the mechanical min/max thermometers as they have an underlying continuous measurement. No?
Anyone seen this?
From the NZ Climate Coalition web site:
http://www.investigatemagazine.com/australia/latestissue.pdf
The data for Broome station shows no trend at all. This must be excluded from the official global dataset.
stephen
I have had a look at two papers by North. I get the gist of his analysis, where it is coming from and where it ends. But it is still in the realm of measuring statistical aggregates, not the issue of how to reconstruct a signal from sampled data, which is what the posts above are about.
I’m sure we can all agree that knowledge of the statistical properties of a signal does not allow us to reconstruct the signal.
Thinking about it, failure to observe the Nyquist sampling criterion (in time or in space) can have an impact on how we measure statistical properties. An injudicious sampling interval could mean we have to sample for longer, since more samples would be needed to counter the effects of aliasing. But other than that, I would be confident that the statistical properties should emerge from the sample data, eventually.
That is not the case when we seek to reconstruct a signal from sampled data. Randomly scattering a minimum number of point measuring stations around the globe is not a sufficient condition to enable us to reconstruct a global signal.
It is essential to assess and then comply with a minimum distance between measuring points in order to avoid spatial aliasing. Failure could result in a seriously distorted impression of a trend in the global signal.
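A back-of-the-envelope way to see what that minimum distance implies for station counts (the wavelengths below are purely illustrative, and this ignores clustering, oceans and altitude):

```python
import math

earth_area = 4 * math.pi * 6371.0**2                 # km^2
for wavelength_km in (20000, 10000, 5000, 2000):     # smallest spatial feature to resolve
    spacing = wavelength_km / 2                      # spatial Nyquist: spacing under half a wavelength
    n_stations = earth_area / spacing**2
    print(f"~{wavelength_km} km features need roughly {n_stations:.0f} evenly spaced stations")
```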
To repeat, those who claim the MWP was a “local effect” are making exactly the same point.
And, as somebody mentioned above, siting measuring stations at airports and other built-up areas could be another example of distortions being introduced by spatial aliasing. A way to address this kind of issue would be to design into the measuring system a decent number of point samples between the airports and cities. With that, we would have a chance of being able to reconstruct a picture of the true temperature field.
And that’s before we even think about polar ice caps and the Pacific Ocean.
In the absence of the matter being addressed in the literature, this looks like a serious gap in how we have approached the reconstruction of the putative global temperature trend.
George makes a good point: can we be satisfied that the global temperature trends have not been reduced to junk by spatial aliasing?
I don’t see how the sampling theorem is relevant. We are not interested in exactly recreating daily temperature variations, but estimation of variations over decadal time frames (besides, if you have a priori information that there is a 24-hour signal with harmonics, the sampling theorem doesn’t really apply, does it). That said, I certainly agree that there are large errors due in part to poor sampling methods, and the HadCRUT3 error estimates seem to be off by an order of magnitude or so.
For Peter dtm
When I was Apprentice, 3rd. Mate and 2nd. Mate, I served on several Weather Reporting Ships and agree with your post. All the Mates tried very hard to get it right and Sparks always tried to get the message away timeously.
Although it might have been hard to take readings in a force 9 or more, those days were few and I would say that 98% of our reports were as good as the instruments would let them be.
The raw data must be somewhere; Portishead might know where their log records are now.
Jordan, I think the sampling of an audio signal is very misleading.
The spatial field is highly correlated. It’s boring music.
Toho (08:26:18) :
“I don’t see how the sampling theorem is relevant. We are not interested in exactly recreating daily temperature variations, but estimation of variations over decadal time frames”
It is important to get the temporal sampling regime right, although we also need to consider spatial sampling and risk of aliasing.
Take an event like ENSO, a phenomenon which evolves over several months, shifting thermal energy over wide regions. It is not felt equally over all parts of the globe.
Spatial distribution of the measurement network is surely a crucial factor in our attempts to create an accurate picture. Get it wrong and the picture could be completely wrecked by spatial aliasing. (Something I often wonder, when looking at 1998.)
The measurement network seems to change almost continuously (above plot of number of stations). If there was an identical “1998” at a different time, to greater or lesser extent, the temperature reconstruction would produce a different impression – solely due to changes in the network.
It is neither standard deviation nor standard error, it is standard fraud. Eliminate the fraud and the data will make more sense.
Jordan:
I agree with most of what you write. But my point is that it doesn’t follow from the sampling theorem. The sampling theorem is about recreating the signal exactly. Sure, you are going to lose information when sampling, and that will cause errors in the temperature estimates. I certainly agree with that. But it has nothing to do with the sampling theorem. I don’t think aliasing is a big deal, by the way, because localized atmospheric energy will not stay localized for long.