Guest Essay by Kip Hansen
Those following the various versions of the “2014 was the warmest year on record” story may have missed what I consider to be the most important point.
The UK’s Met Office (officially the Meteorological Office until 2000) is the national weather service for the United Kingdom. Its Hadley Centre, in conjunction with the Climatic Research Unit (University of East Anglia), created and maintains one of the world’s major climatic databases, currently known as HADCRUT4, which is described by the Met Office as “Combined land [CRUTEM4] and marine [sea surface] temperature anomalies on a 5° by 5° grid-box basis”.
The first image here is their current graphic representing the HADCRUT4 with hemispheric and global values.
The Met Office, in their announcement of the new 2014 results, made this [rather remarkable] statement:
“The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average.”
The asterisk (*) beside (+/-0.1°C) is explained at the bottom of the page as:
“*0.1° C is the 95% uncertainty range.”
So, taking just the 1996 -> 2014 portion of the HADCRUT4 anomalies, adding in the Uncertainty Range as “error bars”, we get:
The journal Nature has a policy that any graphic with “error bars” – with quotes because these types of bars can be many different things – must include an explanation as to exactly what those bars represent. Good idea!
Here is what the Met Office means when it says Uncertainty Range in regard to HADCRUT4, from their FAQ:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe. However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius. The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer. However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.” (emphasis mine)
This is a marvelously frank and straightforward statement. Let’s parse it a bit:
• “It is not possible to calculate the global average temperature anomaly with perfect accuracy …. “
Announcements of temperature anomalies given as very precise numbers must be viewed in light of this general statement.
• “…. because the underlying data contain measurement errors and because the measurements do not cover the whole globe.”
The first point arises because the original data themselves, right down to the daily and hourly temperatures recorded in enormous data sets, contain actual measurement errors (issues such as instrument accuracy and the precision of the recorded units). The second point arises from the errors introduced by the methods used to compensate for the fact that “measurements do not cover the whole globe”, namely the various schemes for in-filling.
• “However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”
Note well that the Met Office is not talking here of statistical confidence intervals but of “the accuracy with which we can measure” – measurement accuracy and its counterpart, measurement error. What is that measurement accuracy? “…around one tenth of a degree Celsius” or, in common notation, +/- 0.1°C. Note also that this is the Uncertainty Range given for the HADCRUT4 anomalies around 2010; it does not apply, for instance, to anomalies in the 1890s or the 1960s.
• “The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer.”
We can’t know (for certain or otherwise) which was warmer, and the same is true for any of the other 21st-century data points that are reported as being within hundredths of a degree of one another. The values can only be calculated to an accuracy of +/- 0.1˚C.
And finally,
• “However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.”
It is nice to see them say “we can say with a good deal of confidence” instead of a categorical “without a doubt”. If two data points differ by four tenths of a degree, they are confident both that there is a difference and of its sign, + or -.
Importantly, the Met Office states clearly that the Uncertainty Range derives from the accuracy of measurement and thus represents the Original Measurement Error (OME). Their Uncertainty Range is not a statistical 95% Confidence Interval. While they may have had to rely on statistics to help calculate it, it is not itself a statistical animal. It is really and simply the Original Measurement Error (OME): the combined measurement errors and inaccuracies of all the parts and pieces, rounded off to a simple +/- 0.1˚C, which they judge to be 95% reliable – that is, it has a one-in-twenty chance of being larger or smaller. (I give links for the two supporting papers for HADCRUT4 uncertainty at the end of the essay.****)
The UK Met Office is my “Hero of the Day” for announcing their result with its OME attached – 0.56C (±0.1˚C) – and for publicly explaining what it means and where it came from.
[ PLEASE – I know that many, maybe even almost everyone reading here, think that the Met Office’s OME is too narrow. But the Met Office gets credit from me for the above – especially given that the effect is to validate The Pause publicly and scientifically. They give their two papers**** supporting their OME number, which readers should read out of collegial courtesy before weighing in with lots of objections to the number itself. ]
Notice carefully that the Met Office calculates the OME for the metric and then assigns that whole OME to the final Global Average. They do not divide the error range by the number of data points, they do not reduce it, they do not minimize it, they do not pretend that averaging eliminates it because it is “random”, and they do not simply ignore it as if it were not there at all. They just tack it on to the final mean value – Global_Mean( +/- 0.1°C ).
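In code form, the convention described above amounts to something like this minimal sketch. The anomaly values are invented purely so that they land on the quoted 0.56 figure; this is not the Met Office’s actual procedure, which involves area-weighting and vastly more data.

```python
# A schematic sketch of the reporting convention described above, not the
# Met Office's actual procedure; the anomaly values are invented for illustration.
annual_anomalies = [0.48, 0.61, 0.55, 0.60]   # hypothetical anomaly contributions, deg C
ome = 0.1                                      # the quoted +/- 0.1 deg C measurement uncertainty

global_mean = sum(annual_anomalies) / len(annual_anomalies)

# The whole OME is attached to the final mean; it is not divided by the number of values.
print(f"{global_mean:.2f} C (+/- {ome} C)")    # prints "0.56 C (+/- 0.1 C)"
```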
In my previous essay on Uncertainty Ranges… there was quite a bit of discussion of this very interesting, and apparently controversial, point:
Does deriving a mean* of a data set reduce the measurement error?
Short Answer: No, it does not.
I am sure some of you will not agree with this.
So, let’s start with a couple of kindergarten examples:
Example 1:
Here’s our data set: 1.7(+/-0.1)
Pretty small data set, but let’s work with it.
Here are the possible values: 1.8, 1.7, 1.6 (and all values in between)
We state the mean = 1.7. Obviously, with one datum, it is itself the mean.
What are the other values, the whole range represented by 1.7(+/-0.1)?:
1.8 and every other value to and including 1.6
What is the uncertainty range?: + or – 0.1, or 0.2 in total
How do we write this?: 1.7(+/-0.1)
Example 2:
Here is our new data set: 1.7(+/-0.1) and 1.8(+/-0.1)
Here are the possible values:
1.7 (and its +/-s) 1.8, 1.6
1.8 (and its +/-s) 1.9, 1.7
What’s the mean of the data points? 1.75
What are the other possible values for the mean?
If both data are raised to their highest value +0.1:
1.7 + 0.1 = 1.8
1.8 + 0.1 = 1.9
If both are lowered to their lowest -0.1:
1.7 – 0.1 = 1.6
1.8 – 0.1 = 1.7
What is the mean of the widest spread?
(1.9 + 1.6) / 2 = 1.75
What is the mean of the lowest two data?
(1.6 + 1.7) / 2 = 1.65
What is the mean of the highest two data?
(1.8 + 1.9) / 2 = 1.85
The above give us the range of possible means: 1.65 to 1.85
0.1 above the mean and 0.1 below the mean, a range of 0.2
Of which the mean of the range is: 1.75
Thus, the mean is accurately expressed as 1.75(+/-0.1)
Notice: The Uncertainty Range, +/-0.1, remains after the mean has been determined. It has not been reduced at all, despite doubling the “n” (number of data). This is not a statistical trick, it is elementary arithmetic.
We could do this same example for data sets of three data, then four, then five, then five hundred, and the result would be the same. I have actually done this for up to five data, using a matrix of data, all the pluses and minuses, all the means of the different combinations – and I assure you, it always comes out the same. The uncertainty range, the original measurement accuracy or error, does not reduce or disappear when finding the mean of a set of data.
I invite you to do this experiment yourself. Try the simpler 3-data example using data like 1.6, 1.7 and 1.8, all +/- 0.1. Make a matrix of the nine +/- values: 1.6, 1.6 + 0.1, 1.6 – 0.1, etc. Figure all the means. You will find a range of means with the highest possible mean 1.8, the lowest possible mean 1.6, and a midpoint of 1.7 – or, in other notation, 1.7(+/-0.1).
Really, do it yourself.
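For those who would rather let a few lines of code do the bookkeeping, here is a minimal sketch of that experiment in Python, using the illustrative values 1.6, 1.7 and 1.8 from the paragraph above:

```python
# Enumerate every combination of "each datum at its low, recorded, or high value"
# and look at the range of the resulting means.
from itertools import product

data = [1.6, 1.7, 1.8]   # recorded values
d = 0.1                  # the common +/- uncertainty range

means = [
    sum(combo) / len(combo)
    for combo in product(*[(x - d, x, x + d) for x in data])
]

point_mean = sum(data) / len(data)
print(f"mean of recorded values : {point_mean:.2f}")   # 1.70
print(f"lowest possible mean    : {min(means):.2f}")   # 1.60
print(f"highest possible mean   : {max(means):.2f}")   # 1.80
# The possible means still span 0.1 either side of 1.70: the mean is 1.70(+/-0.1).
```

Adding more data to the list (and therefore more combinations) changes the number of possible means, but not the width of their spread.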
This has nothing to do with the precision of the mean. You can figure a mean to whatever precision you like from as many data points as you like. If your data share a common uncertainty range (original measurement error, a calculated ensemble uncertainty range such as found in HADCRUT4, or determined by whatever method) it will appear in your results exactly the same as the original – in this case, exactly +/- 0.1.
The reason for this is clearly demonstrated in our kindergarten examples of 1-, 2- and 3-datum data sets: it is a result of the arithmetical process one must use to find the mean of data, each of which represents a range of values with a common range width*****. No amount of statistical theory thrown at this will change it; it is not a statistical idea, but an application of common grade-school arithmetic. The result is a range of possible means, whose midpoint we use as “the mean” – that midpoint is the same as the mean of the data points computed without regard to the fact that they are ranges. This range of means is commonly represented with the notation:
Mean_of_the_Data_Points(+/- one half of the range)
– in one of our examples, the mean found by averaging the data points is 1.75, the mean of the range of possible means is 1.75, the range is 0.2, one-half of which is 0.1 — thus our mean is represented 1.75(+/-0.1).
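For completeness, a short general derivation (my notation, not the author’s) shows why the examples carry over to any number of data, under the essay’s reading of each datum as a range: if each of the n data is x_i ± d, with a common half-width d, then

```latex
\bar{x}_{\max} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(x_i + d\right) \;=\; \bar{x} + d,
\qquad
\bar{x}_{\min} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(x_i - d\right) \;=\; \bar{x} - d,
```

so the range of possible means runs from x̄ − d to x̄ + d: the common half-width d passes through the averaging unchanged, whatever n is.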
If this notation X(+/-y) represents a value with its original measurement error (OME), maximum accuracy of measurement, or any of the other ways of saying that the (+/-y) bit results from the measurement of the metric, then X(+/-y) is a range of values and must be treated as such.
Original Measurement Error of the data points in a data set, by whatever name**, is not reduced or diminished by finding the mean of the set – it must be attached to the resulting mean***.
# # # # #
* – To prevent quibbling, I use this definition of “Mean”: Mean (or arithmetic mean) is a type of average. It is computed by adding the values and dividing by the number of values. Average is a synonym for arithmetic mean – which is the value obtained by dividing the sum of a set of quantities by the number of quantities in the set. An example is (3 + 4 + 5) ÷ 3 = 4. The average or mean is 4. http://dictionary.reference.com/help/faq/language/d72.html
** – For example, HADCRUT4 uses the language “the accuracy with which we can measure” the data points.
*** – Also note that any use of the mean in further calculations must acknowledge and account for – both logically and mathematically – that the mean written as “1.7(+/-0.1)” is in reality a range and not a single data point.
**** – The two supporting papers for the Met Office measurement error calculation are:
Colin P. Morice, John J. Kennedy, Nick A. Rayner, and Phil D. Jones
and
J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
***** – There are more complicated methods for calculating the mean and the range when the ranges of the data (OME ranges) differ from datum to datum. This essay does not cover that case. Note that the HADCRUT4 papers do discuss this somewhat, as the OMEs for land and sea temperatures are themselves different.
# # # # #
Author’s Comment Policies: I already know that “everybody” thinks the UK Met Office’s OME is [pick one or more]: way too small, ridiculous, delusional, an intentional fraud, just made up, or the result of too many 1960s libations. Repeating that opinion (with endless reasons why) or any of its many incarnations will not further enlighten me or the other readers here. I have clearly stated that it is the fact that they give it at all, and admit to its consequences, that I applaud. Also, this is not the place to continue your One-Man War for Truth in Climate Science (no matter which ‘side’ you are on) – please take that elsewhere.
Please try to keep comments to the main points of this essay –
Met Office’s remarkable admission of “accuracy with which we can measure the global average temperature” and that statement’s implications.
and/or
“Finding the Mean does not Reduce Original Measurement Error”.
I expect a lot of disagreement – this simple fact runs against the tide of “Everybody-Knows Folk Science” and I expect that, if admitted to be true, it would “invalidate my PhD”, “deny all of science”, or represent some other existential threat to some of our readers.
Basic truths are important – they keep us sane.
I warn commenters against the most common errors: substituting definitions from specialized fields (like “statistics”) for the simple arithmetical concepts used in the essay and/or quoting The Learned as if their words were proofs. I will not respond to comments that appear to be intentionally misunderstanding the essay.
# # # # #

I’ve been reading this recent book rubbishing climate change, Climate Change: The Facts, ed. Alan Moran – there’s a dense chapter on forecasting which sets out the rules and studies the inaccuracies of the past. Does the Met Office obey any of that? Can we ask them? These five-year jobboes they do? One should watch the trend in the language as the truth emerges – could be good for a laugh if I’m spared.
Reply to Lion Heart ==> In regard to forecasting principles, see Forecasting Principles.
The BEST data set includes uncertainty values (margin of error) for each month. The uncertainty in the early years is huge. So large, in fact, that it is easy to construct a set of values, well within the uncertainty bars, which shows that the earth has been cooling since 1752.
Reply to Keith A. Nonemaker ==> Yes — that is an important point. The BEST project data set does have acknowledged “error bars”, but these are expressly defined as “(0.05 C with 95% confidence)” for the year 2014 — and their latest missive defines this term as “the margin of uncertainty (0.05 C)”. So I expect that this is a calculated statistical Confidence Interval.
The “margin of uncertainty” for the 1890s looks to be, from their anomaly chart, a mere 0.35, and by the early 1900s they seem to have scrubbed it out nearly altogether.
Hi Kip, you should really read up on the theory behind averaging multiple inaccurate measurements and how that IMPROVES accuracy. Regardless of what you think, it is a very common practice which WORKS in the real physical world. As an example you could read the application note AN118 from Silicon Laboratories (oversampling = multiple measurements). Quote: “Oversampling and averaging can be used to increase measurement resolution, eliminating the need to resort to expensive, off-chip ADC’s”. Or, mapped to the discussion at hand: multiple inaccurate temperature readings can be averaged to a result which has greater accuracy than any individual reading. AN118 even contains this exact example!
There are probably many reasons (such as various forms of bias and adjustments) why we should regard the temperature record with skepticism; the process of getting a more reliable answer by averaging many readings is, however, not one of them.
Sorry, but the main conclusion of your post is just plain wrong.
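For context, here is a minimal sketch of the oversampling-and-averaging effect the AN118 note describes, with values invented purely for illustration: many integer-rounded readings of one fixed quantity, each carrying independent noise, average out to a finer estimate than any single reading gives.

```python
# A minimal sketch of the oversampling/averaging effect AN118 describes
# (values invented for illustration; not the commenter's or Silicon Labs' code).
import random

random.seed(1)
true_value = 20.37    # one fixed quantity, measured over and over
readings = [
    round(true_value + random.gauss(0.0, 0.5))   # noisy reading, recorded to the nearest whole unit
    for _ in range(10_000)
]

average = sum(readings) / len(readings)
print("any single reading resolves to +/- 0.5")
print(f"average of {len(readings)} readings: {average:.2f}")   # lands close to 20.37
# The key assumptions: the SAME quantity is read repeatedly and the noise is
# independent and spans at least the rounding step. Whether that carries over to
# single readings of many different quantities is the point argued in the replies below.
```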
Pax: the very well known and widely used application note AN118 uses ONE A2D and increases the effective number of A2D bits very well.
This thread has been about averaging readings from thousands of sensors (A2Ds if you like) and being unable to eradicate the original uncertainty.
Not quite comparing like with like I think.
See my response at February 10, 2015 at 12:31 pm: an uncertainty is for life, not just for averaging.
Would you suggest that if you made thousands of products A and B, then, due to the massive number, when averaged, they would now fit into a slot 27.8mm wide?
I suspect a wise person would agree that if you do not know what the size/value of something is precisely, AND you need to use that something later, then you have to retain that lack of precision, otherwise only bad fortune will follow. In my example above, worst case, using your averaging methods, sometimes the products A and B will not fit into the slot. With my method, they always will fit.
In the temperature case, you cannot know which of your thermometers are reading high or low at any one time (hence this discussion), so the only legitimate action is to keep the original ‘wide’ error bars (or uncertainty).
Surely if you employ enough Climate Scientists you should be able to eliminate the error in what they’re telling you 😉
Reply to pax ==> I’ll leave the technical point to steverichards1984, who speaks specifically about AN118.
You might consider why, with so many math and science people commenting here, they don’t all weigh in in defense of your opinion. I think it is that they see the difference between the application of my simple grade-school math to the question of finding the mean of a series of different measurements of different things – measurements which must themselves be considered ranges – and the question of “a thousand thermometers in my backyard, read all at once” (or a thousand thermometers in my never-changing backyard, read over and over again). I do appreciate your devil’s-advocate challenges.
Nonetheless, it would be informative for you to do the basic three-data data set experiment suggested in my essay; if you find a different answer, or can state with simple logic why the results (which will be as I state) are somehow “wrong”, I would be happy to read it.
As Randy Newman says in “It’s a Jungle Out There”: I could be wrong now, but I don’t think so
Hi Steve, I do not see how using one measuring device making many readings compared to using many measuring devices each making one reading changes the central principle in any fundamental way.
“Would you suggest that if you made 1000’s of products A and B, then due to the massive number, when averaged, that they would now fit into a slot 27.8mm wide?” No, I would not suggest that. Further I do not understand your analogy since the individual product samples are not an approximation (inexact measurement) of a real physical property – the products are the physical property themselves.
Regarding the rest of your comment: You do not need to know which of your thermometers are reading high or low at any one time so long as you are only measuring differences between means (temperature anomaly) – this is another reason why your product A/B analogy fails.
Of course, the thermometers of 20 years ago are not the same as today’s, so this could indeed make this whole exercise moot due to unknown bias. But Kip is making a much stronger statement here: he claims that the mathematics of deriving a mean does not give a more accurate value. Well, it verifiably does.
Hi Kip, I understand your simple example and I have explained why I think it does not apply – namely that you operate with a 100% uncertainty interval. I agree that the full (100%) uncertainty range remains after the mean has been derived.
Reply to pax ==> Now we see that we have been talking past one another.
I have been making the simple example that applies exactly to the case of our Co-Op Weather Station volunteer who reads the glass thermometer and, regardless of the actual reading, must write down a value to the nearest whole degree — historically, the entire temperature record was made up of such readings until very recently. In the more recent past, a sensor does exactly the same thing, rounding to some decimal point — and then some software may do it again to some other, less precise decimal point, so the example still applies. This produces a value that is, as you say, 100% uncertain across its entire range. With temperature records, this is the actuality — it is not a matter of probabilities or possibilities. ALL the temperature recordings are of this type — something “rounded” to some arbitrary accuracy.
We see that all surface temperature records are in actuality ranges, expressed in this manner: 71(+/-0.5). Although they look like statistical uncertainty ranges, they are, as you describe, 100% uncertainty ranges/intervals by their very nature, and must be treated as such. Thus “the full (100%) uncertainty range remains after the mean has been derived.”
I am glad that we have worked through the disagreement and have arrived at a common understanding — even if you do not think applies. Thanks for sticking it out with me.
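A minimal sketch of the interval reading described in this exchange, with illustrative whole-degree values assumed: it is the essay’s three-datum exercise again, but with the +/- 0.5 rounding range of a written-down whole-degree reading.

```python
# Each whole-degree reading stands for a range of +/- 0.5; treated as ranges,
# the mean keeps that same +/- 0.5 (illustrative values only).
recorded = [71, 68, 73, 70]   # Co-Op style whole-degree readings
d = 0.5                       # a recorded 71 means anything from 70.5 to 71.5

point_mean   = sum(recorded) / len(recorded)
lowest_mean  = sum(r - d for r in recorded) / len(recorded)
highest_mean = sum(r + d for r in recorded) / len(recorded)

print(f"mean of recorded values : {point_mean:.1f}")                         # 70.5
print(f"range of possible means : {lowest_mean:.1f} to {highest_mean:.1f}")  # 70.0 to 71.0
# Under the 100% interval reading, the mean is 70.5(+/-0.5).
```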
The temperature graph that starts the essay above is expressed in “anomaly” units of degrees C.
The text quotes the Met Office as “The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average.”
That is, the average value of the climate between 2 nominated points in time, such as 1961 to 1990, is subtracted from the values displayed. Therefore, the errors calculated from the data in 1961-90 propagate into the year 2010 anomaly data.
The errors from 30 years of old data are highly probably greater than those of one year of more modern data such as the quoted year of 2010 – “The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”
This is the first and the most simple way to question the 0.1 deg claim.
Further, in parts of the world like Australia, between about 1990 and 2010, most weather stations changed from liquid-in-glass thermometers in Stevenson screens to MMTS thermistor-type detectors in various housings. An error in this changeover would propagate into the 2010 and 2014 calculations. There seems to be little data comparing the two instrument types. Does anyone know where to look for detailed, controlled overlap data?
The formal approach to the propagation of error is given usefully at
http://www.bipm.org/metrology/thermometry/
A better link would be: http://www.bipm.org/en/publications/guides/
Reply to Geoff Sherrington and steverichards1984 ==> Thanks for the links…great stuff.
“One consequence of working only with temperature change is that our analysis does not produce estimates of absolute temperature. For the sake of users who require an absolute global mean temperature, we have estimated the 1951–1980 global mean surface air temperature as 14°C with uncertainty several tenths of a degree Celsius. That value was obtained by using a global climate model [Hansen et al., 2007] to fill in temperatures at grid points without observations, but it is consistent with results of Jones et al. [1999] based on observational data. The review paper of Jones et al. [1999] includes maps of absolute temperature as well as extensive background information on studies of both absolute temperature and surface temperature change.”
The error for global data between 1951 and 1980 was several tenths of a degree Celsius. The measurements are now no more accurate, and the distribution of stations used has been reduced since then. This supports the view that the recent error is nowhere near the 0.1 C now claimed, and is, if anything, even a little worse than in the stated period. Areas of less than 50 square miles can vary by several degrees Celsius. Changing some of the locations and reducing the coverage only increases the errors seen in local regions. These errors are much larger than the ridiculous claim of 0.1 C.
Reply to Matt G ==> You quote, I assume, GLOBAL SURFACE TEMPERATURE CHANGE — Hansen et al. (2010) doi: 10.1029/2010RG000345.
Surface temperatures were still recorded from glass thermometers throughout the 1951-1980 period (almost exclusively). Thermometer temps must be recorded as ranges of +/- 0.5 degrees (F or C). Since the original measurement error cannot be excluded from the resulting mean, their uncertainty is at minimum +/- 0.5 degrees — and that is before all the uncertainty of in-filling etc. is added onto that uncertainty range.
Thanks for highlighting this Hansen quote.
Kip Hansen February 12, 2015 at 5:49 am
That’s correct, you’re welcome.