Guest Essay by Kip Hansen
Those following the various versions of the “2014 was the warmest year on record” story may have missed what I consider to be the most important point.
The UK’s Met Office (officially the Meteorological Office until 2000) is the national weather service for the United Kingdom. Its Hadley Centre, in conjunction with the Climatic Research Unit (University of East Anglia), created and maintains one of the world’s major climatic databases, currently known as HADCRUT4, which is described by the Met Office as “Combined land [CRUTEM4] and marine [sea surface] temperature anomalies on a 5° by 5° grid-box basis”.
The first image here is their current graphic representing the HADCRUT4 with hemispheric and global values.
The Met Office, in their announcement of the new 2014 results, made this [rather remarkable] statement:
“The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average.”
The asterisk (*) beside (+/-0.1°C) is explained at the bottom of the page as:
“*0.1° C is the 95% uncertainty range.”
So, taking just the 1996 -> 2014 portion of the HADCRUT4 anomalies, adding in the Uncertainty Range as “error bars”, we get:
The journal Nature has a policy that any graphic with “error bars” – with quotes because these types of bars can be many different things – must include an explanation as to exactly what those bars represent. Good idea!
Here is what the Met Office means when it says Uncertainty Range in regard to HADCRUT4, from their FAQ:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe. However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius. The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer. However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.” (emphasis mine)
This is a marvelously frank and straightforward statement. Let’s parse it a bit:
• “It is not possible to calculate the global average temperature anomaly with perfect accuracy …. “
Announcements of temperature anomalies given as very precise numbers must be viewed in light of this general statement.
• “…. because the underlying data contain measurement errors and because the measurements do not cover the whole globe.”
The reason for the first point is that the original data themselves – right down to the daily and hourly temperatures recorded in humongous data sets – contain actual measurement errors, which include such issues as the accuracy of equipment and the units of measurement, plus errors introduced by the methods used to account for the fact that “measurements do not cover the whole globe” – the various methods of in-filling.
• “However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”
Note well that the Met Office is not talking here of statistical confidence intervals but “the accuracy with which we can measure” – measurement accuracy and its obverse, measurement error. What is that measurement accuracy? “…around one tenth of a degree Celsius” or, in common notation +/- 0.1 °C. Note also that this is the Uncertainty Range given for the HADCRUT4 anomalies around 2010 – this uncertainty range does not apply, for instance, to anomalies in the 1890s or the 1960s.
• “The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer.”
We can’t know (for certain or otherwise) which was warmer for any of the other 21st-century data points that are reported as within hundredths of a degree of one another. The values can only be calculated to an accuracy of +/-0.1˚C.
And finally,
• “However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.”
It is nice to see them say “we can say with a good deal of confidence” instead of using a categorical “without a doubt”. If two data points differ by four tenths of a degree, they are confident of a difference and of its sign, + or -.
Importantly, the Met Office states clearly that the Uncertainty Range derives from the accuracy of measurement and thus represents the Original Measurement Error (OME). Their Uncertainty Range is not a statistical 95% Confidence Interval. While they may have had to rely on statistics to help calculate it, it is not itself a statistical animal. It is really and simply the Original Measurement Error (OME) — the combined measurement errors and inaccuracies of all the parts and pieces, rounded off to a simple +/-0.1˚C, which they feel is 95% reliable – but has a one in twenty chance of being larger or smaller. (I give links for the two supporting papers for HADCRUT4 uncertainty at the end of the essay.****)
The UK Met Office is my “Hero of the Day” for announcing their result with its OME attached – 0.56C (±0.1˚C) – and publicly explaining what it means and where it came from.
[ PLEASE – I know that many, maybe even almost everyone reading here, think that the Met Office’s OME is too narrow. But the Met Office gets credit from me for the above – especially given that the effect is to validate The Pause publicly and scientifically. They give their two papers**** supporting their OME number, which readers should read out of collegial courtesy before weighing in with lots of objections to the number itself. ]
Notice carefully that the Met Office calculates the OME for the metric and then assigns that whole OME to the final Global Average. They do not divide the error range by the number of data points, they do not reduce it, they do not minimize it, they do not pretend that averaging eliminates it because it is “random”, and they do not simply ignore it as if it was not there at all. They just tack it on to the final mean value – Global_Mean( +/- 0.1°C ).
In my previous essay on Uncertainty Ranges… there was quite a bit of discussion of this very interesting, and apparently controversial, point:
Does deriving a mean* of a data set reduce the measurement error?
Short Answer: No, it does not.
I am sure some of you will not agree with this.
So, let’s start with a couple of kindergarten examples:
Example 1:
Here’s our data set: 1.7(+/-0.1)
Pretty small data set, but let’s work with it.
Here are the possible values: 1.8, 1.7, 1.6 (and all values in between)
We state the mean = 1.7 Obviously, with one datum, it itself is the mean.
What are the other values, the whole range represented by 1.7(+/-0.1)?:
1.8 and every other value to and including 1.6
What is the uncertainty range?: + or – 0.1 or in total, 0.2
How do we write this?: 1.7(+/-0.1)
Example 2:
Here is our new data set: 1.7(+/-0.1) and 1.8(+/-0.1)
Here are the possible values:
1.7 (and its +/-s) 1.8, 1.6
1.8 (and its +/-s) 1.9, 1.7
What’s the mean of the data points? 1.75
What are the other possible values for the mean?
If both data are raised to their highest value +0.1:
1.7 + 0.1 = 1.8
1.8 + 0.1 = 1.9
If both are lowered to their lowest -0.1:
1.7 – 0.1 = 1.6
1.8 – 0.1 = 1.7
What is the mean of the widest spread?
(1.9 + 1.6) / 2 = 1.75
What is the mean of the lowest two data?
(1.6 + 1.7) / 2 = 1.65
What is the mean of the highest two data:
(1.8 + 1.9) / 2 = 1.85
The above give us the range of possible means: 1.65 to 1.85
0.1 above the mean and 0.1 below the mean, a range of 0.2
Of which the mean of the range is: 1.75
Thus, the mean is accurately expressed as 1.75(+/-0.1)
Notice: The Uncertainty Range, +/-0.1, remains after the mean has been determined. It has not been reduced at all, despite doubling the “n” (number of data). This is not a statistical trick, it is elementary arithmetic.
We could do this same example for data sets of three data, then four data, then five data, then five hundred data, and the result would be the same. I have actually done this for up to five data, using a matrix of data, all the pluses and minuses, all the means of the different combinations – and I assure you, it always comes out the same. The uncertainty range, the original measurement accuracy or error, does not reduce or disappear when finding the mean of a set of data.
I invite you to do this experiment yourself. Try the simpler 3-data example using data like 1.6, 1.7 and 1.8 – all +/-0.1. Make a matrix of the nine +/- values: 1.6, 1.6 + 0.1, 1.6 – 0.1, etc. Figure all the means. You will find a range of means with the highest possible mean 1.8, the lowest possible mean 1.6 and a median of 1.7 – or, in other notation, 1.7(+/-0.1).
Really, do it yourself.
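For anyone who would rather let a computer grind through that matrix, here is a minimal sketch of the 3-data experiment — Python is simply my choice of tool, not anything prescribed by the essay, and the values are the illustrative 1.6, 1.7 and 1.8 above:

```python
# A sketch of the 3-datum experiment: data 1.6, 1.7, 1.8, each +/-0.1.
from itertools import product

data = [1.6, 1.7, 1.8]
u = 0.1  # the common uncertainty range, +/-

# Every combination of "shifted down", "as read", "shifted up" per datum,
# and the mean of each combination.
means = [
    sum(d + o for d, o in zip(data, offsets)) / len(data)
    for offsets in product((-u, 0.0, +u), repeat=len(data))
]

print(f"mean of the data as read : {sum(data) / len(data):.2f}")          # 1.70
print(f"lowest possible mean     : {min(means):.2f}")                     # 1.60
print(f"highest possible mean    : {max(means):.2f}")                     # 1.80
print(f"half of the total spread : {(max(means) - min(means)) / 2:.2f}")  # 0.10
```

The half-spread comes out at exactly the original +/-0.1, and the same follows for any number of data, since the extreme possible means always occur when every datum sits at the same end of its range.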
This has nothing to do with the precision of the mean. You can figure a mean to whatever precision you like from as many data points as you like. If your data share a common uncertainty range (original measurement error, a calculated ensemble uncertainty range such as found in HADCRUT4, or determined by whatever method) it will appear in your results exactly the same as the original – in this case, exactly +/- 0.1.
The reason for this is clearly demonstrated in our kindergarten examples of 1-, 2- and 3-datum data sets – it is a result of the actual arithmetical process one must use in finding the mean of data, each of which represents a range of values with a common range width*****. No amount of throwing statistical theory at this will change it – it is not a statistical idea, but rather an application of common grade-school arithmetic. The results are a range of possible means, the mean of which we use as “the mean” – it will be the same as the mean of the data points found without taking into account the fact that they are ranges. This range of means is commonly represented with the notation:
Mean_of_the_Data Points(+/- one half of the range)
– in one of our examples, the mean found by averaging the data points is 1.75, the mean of the range of possible means is 1.75, the range is 0.2, one-half of which is 0.1 — thus our mean is represented 1.75(+/-0.1).
If this notation X(+/-y) represents a value with its original measurement error (OME), maximum accuracy of measurement, or any of the other ways of saying that the (+/-y) bit results from the measurement of the metric then X(+/-y) is a range of values and must be treated as such.
Original Measurement Error of the data points in a data set, by whatever name**, is not reduced or diminished by finding the mean of the set – it must be attached to the resulting mean***.
# # # # #
* – To prevent quibbling, I use this definition of “Mean”: Mean (or arithmetic mean) is a type of average. It is computed by adding the values and dividing by the number of values. Average is a synonym for arithmetic mean – which is the value obtained by dividing the sum of a set of quantities by the number of quantities in the set. An example is (3 + 4 + 5) ÷ 3 = 4. The average or mean is 4. http://dictionary.reference.com/help/faq/language/d72.html
** – For example, HADCRUT4 uses the language “the accuracy with which we can measure” the data points.
*** – Also note that any use of the mean in further calculations must acknowledge and account for – both logically and mathematically – that the mean written as “1.7(+/-0.1)” is in reality a range and not a single data point.
**** – The two supporting papers for the Met Office measurement error calculation are:
Colin P. Morice, John J. Kennedy, Nick A. Rayner, and Phil D. Jones, “Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set”
and
J. J. Kennedy , N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
***** – There are more complicated methods for calculating the mean and the range when the ranges of the data (OME ranges) are different from datum to datum. This essay does not cover that case. Note that the HADCRUT4 papers do discuss this somewhat as the OMEs for Land and Sea temps are themselves different.
# # # # #
Author’s Comment Policies: I already know that “everybody” thinks the UK Met Office’s OME is [pick one or more]: way too small, ridiculous, delusional, an intentional fraud, just made up, or the result of too many 1960s libations. Repeating that opinion (with endless reasons why) or any of its many incarnations will not further enlighten me or the other readers here. I have clearly stated that it is the fact that they give it at all and admit to its consequences that I applaud. Also, this is not the place to continue your One Man War for Truth in Climate Science (no matter which ‘side’ you are on) – please take that elsewhere.
Please try to keep comments to the main points of this essay –
Met Office’s remarkable admission of “accuracy with which we can measure the global average temperature” and that statement’s implications.
and/or
“Finding the Mean does not Reduce Original Measurement Error”.
I expect a lot of disagreement – this simple fact runs against the tide of “Everybody-Knows Folk Science” and I expect that if admitted to be true it would “invalidate my PhD”, “deny all of science”, or represent some other existential threat to some of our readers.
Basic truths are important – they keep us sane.
I warn commenters against the most common errors: substituting definitions from specialized fields (like “statistics”) for the simple arithmetical concepts used in the essay and/or quoting The Learned as if their words were proofs. I will not respond to comments that appear to be intentionally misunderstanding the essay.
# # # # #
I spent 17 years as an MN Deck/Navigating Officer, doing weather observations every 6 hours as one of the many VOs for the Met Office. We provided most of the data used by the UEA/CRU for their HADCRUT world graphs shown at the top of this essay. When I read the notes on the derivation of the graphs and the use of SSTs, I realised they were a waste of space, complete rubbish.
The statement:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe.”
is, to say the least, laughable.
Recently, I attempted a survey of fellow VOs on an MN web site. I received 42 replies, all confirming that the assumptions for “corrections” to SSTs were wrong…..
Reply to The Ol’ Seadog ==> I have spent 1/3 of my adult life at sea. Yes, SSTs before the satellite era are at very best random guesses.
This is the error in measurements of temperature at a single point? Then taken as representing the temperature over how much of the planet’s surface, before being aggregated with how many approximations from other points, to produce a figure for the whole surface?
How likely is that to be within even 1 degree of the actual average of global temperatures?
Perhaps it is better for comparison only, year to year, but only if the selection of actual measurement points doesn’t keep changing. Doesn’t error due to selection & re-selection of measurement points have rather more potential than it is given credit for?
And, only if the conditions in the area surrounding the measurement station don’t keep changing.
Reply to eddiesharpe ==> In the great big real world? Of course the calculated GAST is dubious at best.
This essay is about a brave move by Met Office UK and the arithmetical realities of finding the mean of a data set made up of numbers that are really ranges, such as 71(+/-0.5).
Why is it that the 1961 – 1990 “global average” (I know, meaningless) is used in comparisons? If we have “global measurements” (LOL) from 1880 to today, why not average those?
Perhaps it has been determined that this time period reflects the perfect temperature that must be maintained to sustain life on Earth.
Or maybe it simply gives them the graphs they desire.
Reply to Patrick ==> I refer to this as the “My Golden Childhood bias” — it’s that wonderful time when things were perfect and the sun always shone, the birds always sang, people were kind and could be trusted, etc.
I must say that I have a simple question here to address the ‘lack of total global coverage of data’:
‘Can you say with accuracy whether the temperature of the USA (excluding Hawaii and Alaska) was the hottest ever in 2014? Ditto Europe?’
By highlighting what the regional temperatures are, you can present the results as: ‘definitive in the following regions, less definitive in the following regions and lacking in sufficient data to make judgements in the following regions’.
Not only can that be done supremely easily, it removes all this nonsense and pointless argument due to ‘lack of data’. It removes the point of ‘in filling’ and data manipulation and focusses the global climate community on the need to establish and maintain weather measurements in the regions where insufficient data currently exists.
Every farmer needs accurate trends for his/her local region. What’s going on in Australia really isn’t important for growing corn in Iowa, unless there are some pretty clear correlations between the two data sets (that sort of research is also interesting if it generates leading indicators of value).
Why not just say what we DO know, rather than waste time, money and scientific credibility trying to put sticking plaster on to achieve something which really isn’t that important in practical terms??
This is why the farmers get their information here, and in degree F.
http://planthardiness.ars.usda.gov/PHZMWeb/InteractiveMap.aspx
And, there is an app for it!
Sorry, there is also one for our British friends, and the info is in degree F and degree C!
http://www.trebrown.com/hrdzone.html
Just make sure you read the 2010-2013 update. The update is important if you are a farmer.
Must be an app for this one also.
Too bad a satellite record didn’t start in 1679, 1779 or 1879 instead of 1979. Or even 1929. It’s likely that 1934 was warmer than 2014, and Arctic sea ice extent less in 1944 than in 2014.
The former is probable and the latter is near certain, based on Larsen’s Northwest Passage transit that year. Essay Northwest Passage.
Surely this is an error?
Thus, the mean is accurately expressed as 1.75(+/-0.1)
if the 2 sets are for the same data point then the new “Range” becomes 1.6 to 1.9, therefore that should now be expressed as
1.75(+/-0.15)
Reply to Osborn ==> The range is for the mean, not the data. You are right for the range of data but the range of the mean is as given.
O/T but Booker has gone to town with the scandal of Arctic temperature adjustments today.
https://notalotofpeopleknowthat.wordpress.com/2015/02/07/booker-the-fiddling-with-temperature-records-in-the-arctic/
Good for Booker. And thank you, Paul for being the inspiration for his writing. But I see that his piece has garnered nearly 1000 comments at time of writing – many from the usual suspects who, no matter what, abuse the man and his writings when many of them (the trolls) are not fit to sharpen his pen nibs.
Some scandal. The Arctic temperature has increased after 1987.
One big part of that scandal is of course that stations have been added after 1987. For 1930-40 the number of stations has gone from about 500 to more than 2000.
Scandal, scandal. They increase the number of stations! They reduce the infilling!
And the number of land-based stations has been reduced from what to what? Hint: A factor of ten?
Harry Passfield says:
“And the number of land-based stations has been reduced from what to what? Hint: A factor of ten?”
Harry Passfield managed to get a fourfold increase of stations for 1930-1940, in 1987 vs 2015, to be a reduction. By a factor of ten.
The axiom “Averaging never improves accuracy but may increase precision” can be found in any statistics textbook, and is nicely illustrated on Wikipedia:
http://en.wikipedia.org/wiki/Accuracy_and_precision#mediaviewer/File:Accuracy_and_precision.svg
The neatest explanation I have heard is:
The chance of rolling a six with a fair die is 1/6. Rolling the die a thousand times might give an average of 3.5, but this doesn’t alter the chance of rolling a six to 1/3.5. It is still 1/6.
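That illustration is easy to check with a quick simulation — a rough sketch only, with the number of rolls chosen arbitrarily:

```python
# Rolling a fair die many times: the average settles near 3.5, but the
# chance that any single roll is a six stays at 1/6.
import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]

print(f"average roll      : {sum(rolls) / len(rolls):.3f}")      # ~3.5
print(f"fraction of sixes : {rolls.count(6) / len(rolls):.3f}")  # ~0.167, i.e. about 1/6
```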
Reply to Max S. ==> Thanks, Max. I wanted to include the precision issue in the original essay (actually had it in and removed it) because my “kindergarten examples” don’t demonstrate that aspect very well.
The British Met office data is absolute #### (see Climategate, ClimateAudit), as is all of the above as usual. Refer to Steven Goddard, Paul Homewood, Mahorasy, Climatedepot, Climateaudit, etc., etc., etc. … You are giving far, far, far too much attention to fraudsters who should be in jail by now. Same goes for nearly all the current WMO/UN “Science” publication agents promoting this scam. When will you wake up?
This is utter bull droppings. Utter lunacy. Utter deception.
There is no way at present to find an average temperature of the entire planet to one tenth of a degree Celsius. No way at all. (And that is beside the point that the average temperature is mostly meaningless anyway.)
Mark: Well said. The way I look at it is that the GAT (whatever that is) is a propagandist’s wet-dream expression that he needs to get accepted into mainstream thought. An example of this is the 70 mph speed limit: it is only a number. When it came out in the ’60s it was far more dangerous to drive at that speed in the cars of the day (mine topped out at 65!) than in modern cars. However, safety campaigners will tell you that driving at 80 mph is dangerous because the speed limit is 70 mph. It is meaningless. Like GAT.
Actually, according to my uncle a former professional race car driver and past Safety Officer at Indianapolis Speedway, it is the response time of the average driver that is the issue. He tells me that once you are at 80 mph closing time is faster than human reaction time.
What’s all this averaging of temperatures anyhow?
How does it relate to anything in the real world? Average propensity to melt, perhaps?
It certainly has less to do with heat content when changes of state become involved.
They don’t. They average temperature anomalies off a common time frame baseline. That is mathematically legitimate for determining average trends. It does not give an average absolute temperature. Only an average rate of change.
Which has, BTW, been fiddled first by homogenization. See essay When Data Isn’t.
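For readers unfamiliar with the mechanics, here is a rough sketch of the anomaly-off-a-baseline arithmetic described above; the station values are invented for illustration, and only the 1961-1990 baseline convention comes from the post:

```python
# Express a station series as anomalies relative to its own 1961-1990 mean.
def to_anomalies(yearly_temps, base_start=1961, base_end=1990):
    """Subtract the series' baseline-period mean from every year's value."""
    baseline = [t for yr, t in yearly_temps.items() if base_start <= yr <= base_end]
    base_mean = sum(baseline) / len(baseline)
    return {yr: round(t - base_mean, 2) for yr, t in yearly_temps.items()}

station = {1961: 14.1, 1975: 14.3, 1990: 14.5, 2014: 14.9}  # hypothetical deg C
print(to_anomalies(station))  # {1961: -0.2, 1975: 0.0, 1990: 0.2, 2014: 0.6}
```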
Whilst I appreciate the Met Office using a + or – 0.1 deg C error, I have a problem because whilst the underlying data remains as suspect and modified as it does, all this serves to do is to try and add scientific integrity to their pronouncements:
We are referred to “Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set.”
wherein it states:
“The land record has now been updated to include many additional station records and (b)re-homogenized station data. This new land air temperature data set is known as CRUTEM4.”
We know that GISS has reduced temperature records for Patagonia, changing a long term (60 year) cooling trend into a warming and it has now just come out that they have done the same in Iceland to get rid of the ‘sea-ice years’ of the 1970s. They have done the same elsewhere removing 1930s and 1940s temperatures which exceeded current ones.
Producing claims based not just on homogenised data but on data which has then again been Re-Homogenised – without all changes being detailed and valid argument (and I mean valid as opposed to specious ‘argument’) provided to justify this – is nonsensical and cannot be relied upon.
Slightly OT, but dealing with efforts to understand what data are telling you. I spend a lot of time reading census and economic data. When doing so, it is important to understand what the reports are about.
You may be reading about financial data in nominal or real terms. Incomes could be expressed as median or mean. That could be skewed significantly when there is income inequality. Also, reports can cover family incomes, household incomes and wages. They all have specific meanings.
When data are presented, I try to understand the subtext of who is writing the report and why is it being written. Climate data and economic data are both subject to this admonition……. everyone has his hustle.
“Does deriving a mean* of a data set reduce the measurement error?
Short Answer: No, it does not.
I am sure some of you will not agree with this.”
We’re not only concerned with ‘measurement error’, i.e. was the temperature at each location correctly recorded, but (assuming the temperatures were accurately recorded) does the average of these measurements represent a true estimate of this abstract called ‘global surface temperature’?
Now it’s obvious that the actual temperature at the “surface” will vary greatly depending on the thermometer’s location, elevation, time of day, season, weather conditions etc. So any set of recorded measurements will tend to vary according to these environmental parameters.
For each thermometer X, the measurements can be viewed as samples of a random variable. So the question should be “what is the expected value of X”? If the probability distribution of X is strictly uniform, then the expected value would be the arithmetic mean, as you demonstrated.
Some of the error components may be randomly distributed (i.e. uniform); then the expected value of such a component is zero.
But other error components represent bias and are not randomly distributed. Their expected value is the sum of n samples, each weighted by its probability (which for a uniform distribution would be 1/n), which in general will not be zero.
So “averaging” will reduce errors caused by random variance, but will not eliminate errors due to bias.
Also, it is a common fallacy to view a set of unbiased thermometer readings as ‘true’ temperatures. But the readings depend on their environment. If we took another set of “unbiased” readings at different locations and times, would we get the same expected values? Of course not, because the total squared error of our estimates has two components, bias² + variance (ignoring non-deterministic noise).
So even if we were guaranteed that our thermometers were unbiased, we would still use many random samples to estimate regional and global temperatures because averaging reduces variance component of the estimation error.
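A toy simulation makes the variance-versus-bias distinction concrete — a sketch only, with the “true” temperature, the +0.3 C bias and the 0.5 C scatter all invented for illustration:

```python
# Averaging many readings shrinks the random (variance) part of the error,
# but a fixed calibration bias survives the averaging untouched.
import random

random.seed(1)
true_temp = 20.0   # hypothetical "true" temperature, deg C
n = 10_000

# Thermometer A: unbiased but noisy (random error, sd = 0.5 C)
a = [true_temp + random.gauss(0.0, 0.5) for _ in range(n)]
# Thermometer B: the same noise plus a fixed +0.3 C calibration bias
b = [true_temp + 0.3 + random.gauss(0.0, 0.5) for _ in range(n)]

print(f"mean of A: {sum(a) / n:.2f}   (close to 20.0 -- random error averages away)")
print(f"mean of B: {sum(b) / n:.2f}   (stuck near 20.3 -- the bias does not)")
```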
Johanus
There cannot be a determinable “error” for a parameter which is not defined. And there is no agreed definition of global temperature.
Each of the teams which determines global temperature uses its own definition which it changes almost every month so its values of global temperature change almost every month. If it were possible to estimate the errors of global temperature reported last month then those estimated errors would change for the values of global temperature reported this month.
Richard
Excellent point Richard. I’m still waiting for one of these organizations to publish the “specifications” page from the Earth’s Owner/Operator Manual that says what the “average” temperature of the planet is supposed to be at this specific point in its 3,500,000,000 year history, or how one is supposed to measure it. For some reason none of them ever answer when I ask “What’s the temperature supposed to be?” I can’t imagine why…
I’ll disagree slightly Richard.
There ‘is’ an error range for ‘global temperature change’. The fact that the definition is changed willy nilly by the alleged data caretakers does not eliminate the error; but it does make it near impossible to correctly identify and assign the error.
Still, that error range exists, but remains unknown. Since the temperature caretakers believe they can calculate global temperature change, they should be able to calculate the error; all they need is real expertise in deriving that error (an engineer with a very strong stomach).
A database and/or a data method that changes past results is abuse.
Corrections do not eliminate error ranges; they become a factor for calculating a representative, usually larger, error range.
e.g. Temperatures were transmitted late and missed entry
– Temperatures were transcribed or entered incorrectly
– Which means that someone is verifying some individual temperatures
Homogenization adjustments are opinions without absolute data verification. Yes, NOAA, MetO, BOM are guessing when they do mass ‘homogenizations’. The result is garbage for accuracy or science.
Infilling is a spatial fantasy guess with as much validity as a carnival freak sideshow. I believe the main purpose is ‘make work’.
ATheoK
I dispute that we have a slight disagreement.
I wrote
And you say
It seems to me that we agree.
Richard
Then we are in agreement Richard. Good commentary, as usual.
Reply to the richardscourtney // ATheoK comment thread ==> You two are, unfortunately, probably entirely correct. The other issue is, even if someone was to formulate a scientifically defensible definition of “Global Surface Air Temperature at 2 meters above ground level” AND a scientifically defensible definition of “Global Sea Surface Temperature” — what justification could be offered for reducing both to anomalies, then averaging the anomalies together weighted for surface area? One is a “skin temperature” of a liquid, one is the ambient temperature of a gas at a particular altitude near the ground.
Somewhat like determining Average Room Temperature by some hodge-podge combination of air and floor surface or air and wall surface temperatures.
+1 Johanus; well stated.
Reply to Johanus ==> Let me address this point only –> “For each thermometer X, the measurements can be viewed as samples of a random variable.”
We are not taking random variables, but measurements of actual temperatures. The range +/-0.1°C presented by the Met Office derives (through a lot of complicated steps) from situations like this: The actual real world temperature being measured in Outpost, OK is 70.8°F. Our weather co-op station volunteer accurately records this as 71°F. The obvious 0.2°F difference between actuality and the report is not random variance. It is simply the difference caused by the recording rules, which are “report whole degrees, round up or down as appropriate” — actual temperatures are evenly spread but do not follow a normal distribution centered on thermometer degree marks.
I’m sure you see where this leads.
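A tiny sketch of that rounding situation — the 70.8°F reading and the whole-degree reporting rule are from the example above, while the helper names are hypothetical:

```python
# Once a reading is logged as a whole degree, only the interval it came
# from can be recovered later.
def logged_value(true_reading_f: float) -> int:
    """Report whole degrees, rounding to the nearest degree."""
    return round(true_reading_f)

def as_range(logged_f: int) -> tuple:
    """The range of readings that the logged whole degree represents."""
    return (logged_f - 0.5, logged_f + 0.5)

print(logged_value(70.8))  # 71 -- the 0.8 is gone for good
print(as_range(71))        # (70.5, 71.5), i.e. 71(+/-0.5)
```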
Kip, you said:
“We are not taking random variables, but measurements of actual temperatures….The actual real world temperature being measured in Outpost, OK is 70.8°F.”
You seem to believe in the “thermometers-show-‘true’-temperature” fallacy. I’m surprised, because you clearly don’t believe the “one-true-global-temperature” fallacy.
‘temperature’ is a mathematical abstraction which cannot be precisely measured without applying mathematical models, such as expansion of mercury in a calibrated tube, or current flowing in a calibrated thermocouple, or thickness of calibrated ice layers etc. These are all ‘proxies’ in the sense that there is no device that measures arbitrary ‘temperature’ directly. And all of these proxies tend to be wrong, more or less. Some are more ‘accurate’ than others, and some are actually useful for the purpose of making us believe we can measure temperatures ‘directly’.
So how can we measure the ‘error’ of some mathematical abstraction that doesn’t really exist, except in our minds?
Good question. It is possible because there are natural ‘calibration points’, such as freezing and boiling points, which are uniquely determined by kinetic energy, which in turn has a precise relationship (more or less) to ‘temperature’, which is defined abstractly as the average kinetic energy at thermodynamic equilibrium. So arbitrary temperatures can be modeled as linear interpolations/extrapolations around these fixed points, which also allows us to estimate temperatures as mathematically continuous values. Temperature ‘errors’ occur whenever a thermometer does not match the value expected from the calibration points (or values interpolated from these points).
So back to your example, let’s say we need to know the answer to the question “What is the current temperature in Outpost, OK?” Some might say it is 70.8F because someone observed that temperature on a thermometer there. So is that really the ‘actual real world’ temperature in Outpost, as you stated?
No. Imagine, for the sake of argument, that your thermometer is surrounded by other thermometers, independently engineered and operated, and spaced at intervals, say on the order of 1 kilometer, but all located within Outpost OK. Will they all produce exactly the same simultaneous values?
Most likely they will not (with this likelihood increasing with the size of Outpost). So, without knowing the ‘true’ temperature (if such exists), how should we interpret these differences?
A statistical approach makes most sense here, viewing each thermometer’s reading as a kind of mathematical value which is subject to variations due to physical processes and/or random chance. So, some of the thermometer readings may actually represent true variances in the actual kinetic energy of air molecules (“variance”), while others may have been improperly calibrated or interpolated and thus have a permanent offset (“bias”). Or some thermometers may have simply been misread, which could be either bias (deliberate “finger on the scale”) or variance (purely random transcription error).
So here’s my answer to the question “What is the current temperature in XYZ?”. Assume XYZ is an arbitrary surface location with finite extent in surface area and time, so XYZ’s “temperature” is the expected value estimated from measurements (“samples”) taken at one instant of time (more or less) from one or more thermometers in XYZ which claim to represent the ‘true’ average atmospheric kinetic energy in XYZ (more or less).
We can’t know the ‘true’ kinetic energy temperature Tk at each point, but we can still reason about these values mathematically and represent them as sampled random variables, because the Tk are well posed in physics (mechanical thermodynamics). Let SE be the ‘squared error’, i.e. the sum of the squared differences between our thermometer readings and the ‘true’ Tk. Then using variance-bias theory we can decompose it into three components: SE = bias² + variance + noise.
‘bias’ is defined mathematically as the expected difference between the Tk and the samples (thermometer readings). When bias=0 we say the estimator is unbiased. ‘variance’ is the sum of the squared deviations of the samples from the mean sample value, and ‘noise’ is, in effect, the natural uncertainty of Tk itself, which is unknown. (noise is usually modeled as a random Gaussian, because it is convenient to do so. That does not mean it is Gaussian in nature).
So we should be somewhat skeptical about thermometers because their readings are estimates of the ‘true’ kinetic temperature, based on more or less reliable proxies/models. But we can, with some confidence obtained from statistical theory, make useful explanations and predictions about the world around us, using these readings.
… and using the mean of a set of nearby recorded temperatures is a better estimator of Tk than an arbitrary reading within that set.
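For what it is worth, the SE = bias² + variance (+ noise) decomposition quoted above is easy to check numerically; this sketch uses invented numbers (a notional true value, a +0.3 bias, 0.5-sd scatter) and ignores the separate noise term:

```python
# Numerical check that the mean squared error of biased, noisy readings
# splits into bias^2 + variance.
import random

random.seed(2)
true_val, bias, sd, n = 20.0, 0.3, 0.5, 100_000
readings = [true_val + bias + random.gauss(0.0, sd) for _ in range(n)]

mean_r = sum(readings) / n
mse = sum((r - true_val) ** 2 for r in readings) / n
variance = sum((r - mean_r) ** 2 for r in readings) / n

print(f"mean squared error : {mse:.3f}")
print(f"bias^2 + variance  : {(mean_r - true_val) ** 2 + variance:.3f}")  # matches
```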
Reply to Johanus ==> Well, that’s thorough, alright! Thanks for the in-depth analysis.
I’m not sure it helps our co-op weather station volunteer with his doubts about the accuracy of his recorded temperatures when he knows he has to round either up or down. Luckily, he is not charged with the responsibility of determining the real actual temperature of his city or even of his backyard — only with writing in his log the readings he finds on his thermometers.
Here, we acknowledge that when he writes “71” in the log, the thermometer showed some reading between 70.5 and 71.5, but what that reading was is lost forever. Thus in the future when some climatologist uses this datum, it must be used as the range it represented when written, which is 71(+/-0.5).
Thanks for taking the time to share with us.
I doubt that any current co-op reports are generated from manual observations of glass-bulb thermometers. There are many thousands of co-ops using relatively inexpensive (~$250 and up) stations with automated Internet reporting, such as the ones made by Davis Instruments, which are surprisingly robust and reliable, and mostly operated by dedicated amateur observers:
http://wxqa.com/stations.html
https://www.google.com/search?q=amateur+weather+station+equipment
As for the historical co-op data (Form 1009’s etc) their temperatures were rounded to the nearest degree F, probably because that was the effective intrinsic accuracy of measurements made by eyeball from the old glass-bulb instruments.
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/
Reply to Johanus ==> The illustrative Co-op Weather Station volunteer is just that – a cartoon used to represent the vast majority of past temperature records, and the one that most easily illustrates the point.
All temperature reports are similar in nature….there is the instrument reading which is rounded off to some agreed upon precision (maybe twice — once inside the instrument itself and once more in the data repository) and represents, in reality, a range of values “71.23(+/-0.005)” or some such.
My kindergarten examples apply to these exactly the same.
Thanks for the links.
An OME error of +/-0.1C becomes rather meaningless when the measurement instruments are located at sites which qualify as CRN4+ on average.
we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996
That’s a complete unknown on a human history scale, and a flat-out falsehood on a geological time scale.
The meme that it’s hotter now than “ever before” is deliberate deception. The worst sort of political propaganda.
+1
The geologic record shows so much variation, so much change, at times quite rapid change, and the mechanisms are so poorly and incompletely understood, that a claim of “any year prior to 1996” is transparently false on its face.
I might be tempted to congratulate the Met Office for its candor if, instead of admitting to some tiny uncertainty in a mostly meaningless statistics exercise, they would STOP intentionally conflating uncertainty in their subjective statistical map of “global temperature” with true climatic uncertainty.
I know I’m just a dumb old construction worker, but I have a question.
How can one justify an error bar of +/-0.1˚C for 2014 when the error bar for 1961 may have been +/-1.0˚C or +/-0.5˚C, without a lot of finagling with the 1961 data and introducing more error?
Reply to old construction worker ==> The measurement error (accuracy of measurement) is surely better in 2014 than in 1961, thus we would expect the Uncertainty Range to be smaller in 2014. Remember, they still used human-read glass hi/low thermometers for measurement in 1961 and, in the US, only reported whole degrees F.
Likewise for your field, we expect that your floors or ceilings or foundations are more accurately level now that you can use laser-light levels to draw nearly perfectly level lines at the desired ceiling line (for instance) — in my experience, they beat the heck out of bubble string levels or water tube levels.
The uncertainty of ±0.1°C sounds a great deal like the margin of error you can sometimes dig out of NCDC. So, is this the measurement uncertainty or the use of MOE to estimate the standard deviation?
Does climate science ever compare the difference of means using any statistical test? Or, is just eyeballing the rarely included “error bars” good enough?
At least the Met discussed uncertainty instead of proudly announcing that a 0.02°C difference was significant with a margin of error between 0.05°C and 0.1°C.
Reply to Bob Greene ==> I am of the opinion that the Met Office means what it says with their “The accuracy with which we can measure the global average temperature” statement — right or wrong quantitatively, I think they try to portray “measurement accuracy” and not the various statistical constructs.
The journal Nature requires that charts and graphs depicting “error bars” have an explanation of what those marks mean — as the same marks are used for many different concepts — some of them less useful and more confusing than others.
“Measurement accuracy” or the ability to read the temperature in 0.1° increments? Measurement accuracy would be a component of the overall variability in most measurements. Climate science seems to take the most rudimentary error estimates and may or may not use even that. Pretty haphazard and sloppy way to be in the forefront of plotting the global economy.
Reply to Bob Greene ==> Well, of course, you are right when you say “Measurement accuracy would be a component of the overall variability in most measurements.”
Met Office UK has based their rather rule-of-thumb-like 0.1°C on two complex papers, links for which are given at the end of the original essay above. These papers attempt to find a quantitative answer to a very, very complex question.
I’m really surprised your example [1.7(+/-0.1)] isn’t expressed as 1.8(-0.2), in keeping with today’s way of expressing global temperature.
What is the point of discussing an accuracy of +/- 0.1DegC when the data is being manipulated by more than 1DegC in the first place? Makes a mockery of claiming to measure to that degree of accuracy.
http://www.telegraph.co.uk/news/earth/environment/globalwarming/11395516/The-fiddling-with-temperature-data-is-the-biggest-science-scandal-ever.html
So – it’s back to bold political statements and weather reactionism for 2015. “I Got You, Babe” is still playing on the global temperature radio station this morning. OK, Campers. Rise and shine. It’s cold out there. But yes, thanks to Met for going all “publical” with their “scientifical” honesty. Strange days when we have to give attaboys to government scientists for doing what they’re rudimentarily expected to do.
Hey Coach ==> You give your boys an attaboy when they do better…so do I.
I am not so impressed with the idea that they know the accuracy to 0.1 C at all. They may have the measure of a particular station even more accurate than the 0.1C at any time. The question though is whether that translates into a valid meaning and error level for a 5×5 deg grid used for the grand average.
I commented (Dr Spencer’s site) to a Cowtan and Way reference (5 Jan, 2015) and repeat here:
“Apropos to your question, why not take a look at two T’s I have experienced today. I live in Perth, Australia. One Perth station hit just over 43C today max. Another in a suburb called Swanbourne peaked at just over 42 but has been 10C lower than the Perth station.
They are about 10kms apart!! It is 19:30 WST now and the difference is still 7C. So what is the temp for this small area????? Yet you insist on incestuous 1200km infilling as being acceptable methodology. Nonsense!
Check for yourself; over the whole day
http://www.weatherzone.com.au/station.jsp?lt=site&lc=9225&list=ob
http://www.weatherzone.com.au/station.jsp?lt=site&lc=9215&list=ob ”
NB:
1. these two references will lead you to the latest day’s hourly readings for 24 hours so will not show the Jan 5, 2015 readings. You may need to go to the original BOM data.
2. Perth has a number of stations within a 15 km radius. Although not as dramatic as that Jan day there are significant variations which would swamp any 0.1C idea of error level. I’m bemused that anyone could suggest a T for any day for Perth which has some real meaning and could be considered accurate to 0.1C.
Reply to tonyM ==> In-filling and homogenization are very problematic….
So are simple things such as finding a daily mean …. use the Hi/Low? Use the hourly readings? Use the minute-by-minute readings? All different answers by huge amounts.
Watch for my post on this topic — but please don’t hold your breath.
Addition to above ==> Don’t forget the infamous Time of Observation….
One trick is to admit a small crime to hide a bigger one.
Has not the big problem always been the adjustments to the raw data?
OK, give them credit for admitting to the existence of the mouse but when will they admit to the existence of the elephant?
Eugene WR Gallun
+1 Eugene!
Kip,
I liked the article but can see that some have decided that data from thermometers located near the surface will never be trusted………..unless, maybe if they showed a cooling trend.
Satellite temperatures were cooler and did not show 2014 as the hottest year, which may be part of this. There are numerous issues for thermometer data and complicating factors (for satellites too), but I think this article addresses one of those issues well.
Regardless of whether you agree or not, even the warmest data is coming in cooler than global climate model projections and shows no dangerous warming.
As soon as the hysterical New York Times front-page article came out a few weeks ago, I posted the following on my blog — https://luysii.wordpress.com — which essentially said the same thing:
The New York Times and NOAA flunk Chem 101
As soon as budding freshman chemists get into their first lab they are taught about significant figures. Thus 3/7 = .4 (not .428571 which is true numerically but not experimentally) Data should never be numerically reported with more significant figures than given by the actual measurement.
This brings us to yesterday’s front page story (with the map colored in red), “2014 Breaks Heat Record, Challenging Global Warming Skeptics”. Well it did, if you believe that a .02 degree centigrade difference in global mean temperature is significant. The inconvenient fact that the change was this small was not mentioned until the 10th paragraph. It was also noted there that .02 C is within experimental error. Do you have a thermometer that measures temperatures that exactly? Most don’t, and I doubt that NOAA does either. Amusingly, the eastern USA was the one area which didn’t show the rise. Do you think that measurements here are less accurate than in Africa, South America or Eurasia? Could it be the other way around?
It is far more correct to say that global warming has essentially stopped for the past 14 years, as mean global temperature has been basically the same during that time. This is not to say that we aren’t in a warm spell. Global warming skeptics (myself included) are not saying that CO2 isn’t a greenhouse gas, and they are not denying that it has been warm. However, I am extremely skeptical of models predicting a steady rise in temperature that have failed to predict the current decade-and-a-half stasis in global mean temperature. Why should such models be trusted to predict the future when they haven’t successfully predicted the present?
It reminds me of the central dogma of molecular biology years ago “DNA makes RNA makes Protein”, and the statements that man and chimpanzee would be regarded as the same species given the similarity of their proteins. We were far from knowing all the players in the cell and the organism back then, and we may be equally far from knowing all the climate players and how they interact now.
I fully appreciate Kip Hansen’s point that the Met is trying to craft a meaningful statement about their statistics. I still feel that they have missed the mark: “The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.” One must realize that “accuracy” relates solely to the ability to find the actual mean temperature, in the same way an accurate archer hits the bull’s-eye. The sparse global sampling of temperature makes it impossible to know the global mean temperature to +/-0.1 deg C. Though the Met acknowledges this sampling problem, they should instead have said: “The accuracy with which we measure temperature at [however many] temperature stations distributed around the globe is ….” This is important because that single average temperature they publish can change as new stations are added or as existing stations are removed, regardless of the actual change in global temperature. I suppose someone has the data of all individual stations, and could show how the changes in sampling density over time have altered the mean. Of course then the heat-island effects and urbanization effects would quite possibly exacerbate the global mean temperature rise, and complicate such analysis.
Precision of measurement is quite different from accuracy, and rather speaks to the ability to reliably reproduce a measurement (or process, such as an archer making additional shots in sequence). In a global sampling scheme, each of the measurement stations has its own characteristic precision. Climatologists are wont to treat this as a random variable, a risky proposition with their scientific credibility at stake. Of course scientific thermometers have a quoted precision, but they are being used in a measurement process, not in a controlled standards laboratory.
Here is a more mechanical example: Consider a wind gauge of the type with spinning hemisphere cups. In a lab, the device would be tested with a known flow rate of dry air at several different flow points. The precision would then be calculated and reported with the instrument. But in use, the instrument is measuring a continuous variable. The inertia of the spinning component may cause the instrument to indicate faster wind speeds during periods of rapid wind subsidence. This means that there will be a bias in the measurements that is not reflected in the advertised precision, and treating the measurements as a random variable is not correct. This might be a very small effect, but it would make it impossible to quantify minute differences in daily mean wind speed.
Thanks to Mr. Hansen, we now have an excellent explanation of the precision of means, and its significance for “global” temperature records.
Excellent point, I constantly tell my Project Managers and Sales types that they cannot count on the specifications listed in the brochure for a piece of gear to be realized in the field. They are merely starting points measured from a known set of conditions… i.e an “ideal” lab.
Reply to Matt ==> Read the HADCRUT4 paper — interestingly, they run a 100-ensemble error/accuracy model in their determination of the 0.1°C uncertainty range estimate.