Guest Post by Kip Hansen
This post does not attempt to answer questions – instead, it asks them. I hope to draw on the expertise and training of the readers here, many of whom are climate scientists (both professional and amateur), statisticians, researchers in various scientific and medical fields, engineers, and members of many other highly trained and educated professions.
The NY Times, and thousands of other news outlets, covered both the loud proclamations that 2014 was “the warmest year ever” and the denouncements of those proclamations. Some, like the NY Times Opinion blog, Dot Earth, unashamedly covered both.
Dr. David Whitehouse, via The GWPF, counters in his post at WUWT – UK Met Office says 2014 was NOT the hottest year ever due to ‘uncertainty ranges’ of the data — with the information from the UK Met Office:
“The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average. Nominally this ranks 2014 as the joint warmest year in the record, tied with 2010, but the uncertainty ranges mean it’s not possible to definitively say which of several recent years was the warmest.” And at the bottom of the page: “*0.1° C is the 95% uncertainty range.”
The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-0.1°C uncertainty range:
The journal Nature has long had a policy of insisting that papers containing figures with error bars describe what the error bars represent, so I thought it would be good in this case to see exactly what the Met Office means by “uncertainty range”.
In its FAQ, the Met Office says:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe. However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius. The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer. However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.” (emphasis mine)
I applaud the Met Office for its openness and frankness in this simple statement.
Now, to the question, which derives from this illustration:
(Right-click on the image and select “View Image” if you need to see more clearly.)
This graph is created from data taken directly from the UK Met Office, “untouched by human hands” (no numbers were hand-copied, re-typed, rounded off, kriged, or otherwise modified). I have greyed out the CRUTEM4 land-only values, leaving them barely visible for reference. Links to the publicly available datasets are given on the graph. I have added some text and two graphic elements:
a. In light blue, Uncertainty Range bars for the 2014 value, extending back over the whole time period.
b. A ribbon of light peachy yellow, the width of the Uncertainty Range for this metric, overlaid in such a way as to cover the maximum number of values on the graph.
Here is the question:
What does this illustration mean scientifically?
More precisely — If the numbers were in your specialty – engineering, medicine, geology, chemistry, statistics, mathematics, physics – and were results of a series of measurements over time, what would it mean to you that:
a. Eleven of the 18 mean values lie within the Uncertainty Range bars of the most current mean value, 2014?
b. All but three values (1996, 1999, 2000) can be overlaid by a ribbon the width of the Uncertainty Range for the metric being measured?
Let’s have answers and observations from as many different fields of endeavor as possible.
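For readers who would like to check the two conditions above for themselves, here is a minimal sketch in Python. The anomaly values in it are placeholders for illustration only – substitute the actual HadCRUT4 annual anomalies from the datasets linked on the graph.

```python
# Minimal sketch of how to check conditions (a) and (b) above.
# NOTE: the anomaly values below are PLACEHOLDERS for illustration only --
# substitute the actual HadCRUT4 annual anomalies from the datasets linked on the graph.

UNCERTAINTY = 0.1  # +/- 0.1 degC, the Met Office 95% uncertainty range

anomalies = {  # year: hypothetical annual mean anomaly (degC)
    1997: 0.39, 1998: 0.53, 1999: 0.31, 2000: 0.29, 2001: 0.44, 2002: 0.50,
    2003: 0.51, 2004: 0.45, 2005: 0.54, 2006: 0.50, 2007: 0.48, 2008: 0.39,
    2009: 0.50, 2010: 0.56, 2011: 0.42, 2012: 0.47, 2013: 0.50, 2014: 0.56,
}

# Condition (a): how many annual means lie within +/- UNCERTAINTY of the 2014 value?
ref = anomalies[2014]
within = [year for year, anom in anomalies.items() if abs(anom - ref) <= UNCERTAINTY]
print(f"{len(within)} of {len(anomalies)} values lie within +/-{UNCERTAINTY} degC of 2014")

# Condition (b): slide a ribbon as wide as the full Uncertainty Range (0.2 degC)
# over the values and report the largest number it can cover at once.
ribbon = 2 * UNCERTAINTY
values = sorted(anomalies.values())
covered = max(sum(1 for v in values if lo <= v <= lo + ribbon) for lo in values)
print(f"A ribbon {ribbon:.1f} degC wide can cover at most {covered} of {len(values)} values")
```

The ribbon test simply slides an interval 0.2°C wide (the full width of the ±0.1°C range) across the values and reports the largest number it can cover at once.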
# # # # #
Author’s Comment Policy: I have no vested opinion on this matter – and no particular expertise myself. (Oh, I do have an opinion, but it is not very well informed.) I’d like to hear yours, particularly those of you with research experience in other fields.
This is not a discussion of “Was 2014 the warmest year?” or any of its derivatives. Simple repetitions of the various Articles of Faith from either of the two opposing Churches of Global Warming (for and against) will not add much to this discussion and are best left for elsewhere.
As Judith Curry would say: This is a technical thread — it is meant to be a discussion about scientific methods of recognizing what uncertainty ranges, error bars, and CIs can and do tell us about the results of research. Please try to restrict your comments to this issue, thank you.
# # # # #


I believe they are overconfident in asserting a 0.1C accuracy for recent annual global temperature anomalies. My guess would be at least 0.2C to 0.3C and possibly as much as 0.5C or more in recent years and possibly as much as 1.0C or more for the oldest years in the data set. I suspect the largest sources of uncertainty are from very poor spatial coverage, representativeness of measurements, changes in station locations, and “homogenization” that may add uncertainty rather than reducing it. Siting is critical for representative measurements and the USCRN is helping to address this problem, but only for a very small portion of the globe.
“…The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average…”
Well I have a number of problems with this statement.
To begin with; the base period of 1961-1990. This conveniently includes that period in the 1970s when the climate crisis was the impending ice age, with wild suggestions to salt the arctic ice with black soot to fend off the ice age. Global starvation was predicted by the very same bunch of control freaks who are now trying to stop the impending global frying.
Well, also, half of that base period comes before the age of satellite data gathering, which I believe started circa 1979, and which is nearly coincident with the launching of the first oceanic buoys able to make simultaneous measurements of near-surface (-1 m) ocean water temperatures and near-surface (+3 m) oceanic air temperatures, circa 1980. In 2001 this buoy data (covering about 20 years) showed that water and air temperatures were not the same and were not correlated. Why would anyone even imagine they would be either of those things?
So I don’t believe any “global” climate temperatures prior to 1980. And why stop the comparison period 25 years before the present?
Why not use the average of ALL of the credible data you have. Otherwise the base period numbers are just rampant cherry picking.
So I don’t give any credibility to any HADCRUD prior to 1980, or anything they might deduce later on referenced to that early ocean rubbish data.
And finally, I don’t think any of their sampling strategies are legitimate, being quite contrary to well-established sampled-data theory (you wouldn’t be able to read this if it weren’t).
G
[snip – Epiphron Elpis is yet another David Appell sockpuppet.]
Reply to oz4caster ==> I think that the Uncertainty Range of 0.1°C may be too small even for most recent measurements. But for this discussion, I will let that slip by — it is a near miracle that they admit such an Uncertainty Range at all.
I am working (longer term) on a piece that explores actual Original Measurement Error in world temps over time, and what that may mean for the Global Averages.
For instance, I understand that BEST’s kriging results are maximally accurate only to 0.49°C.
Good luck with the study.
SWAG: Pre 1995 +/- 1.0 Deg C, Post 1995 (Land Sites) +/- 0.5 Deg C and Satellite +/-0.25 Deg C.
And we are having a “serious” discussion of an incremental increase of 0.15 Deg C for 2014 and an increase of 0.45 Deg C over a spread of 36 years while the IPCC models are predicting what?
http://phys.org/news/2015-01-peer-reviewed-pocket-calculator-climate-exposes-errors.html
Well, you belong to a Church. Fine with me.
Just don’t ask scientists to play Church games.
Yes, some scientists do go to church; in fact, many scientists were ministers, monks, priests, etc., but they respected the separation of church and science.
A related item:
IPCC says the climate may have cooled since 1998!
Here is the rationale:
The claimed error is +/- 0.1 degree. But the warming is only 0.05 degree per decade, so the actual warming is between -0.05 (cooling) to +0.15 degree/decade.
In other words since 1998, the climate may have cooled by 0.05 degree/decade or warmed by 0.15 degree, or anything between those two limits.
Here is how the IPCC stated it:
“Due to this natural variability, trends based on short records are very sensitive to the beginning and end dates and do not in general reflect long-term climate trends. As one example, the rate of warming over the past 15 years (1998–2012; 0.05 [–0.05 to 0.15] C per decade), which begins with a strong El Niño, is smaller than the rate calculated since 1951 (1951–2012; 0.12 [0.08 to 0.14] C per decade). {1.1.1, Box 1.1} “
from pg 5 of : https://www.ipcc.ch/…/asse…/ar5/syr/SYR_AR5_SPMcorr1.pdf
(Of course I don’t believe that +/-0.1 degree, but this is about using their numbers.)
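As a rough illustration of the IPCC’s point that short-record trends are very sensitive to their endpoints and carry wide uncertainty ranges, here is a small sketch using a synthetic series (an assumed 0.12 °C/decade underlying trend plus noise), standing in for – not reproducing – the observational record.

```python
# Sketch: why a 15-year trend carries a much wider uncertainty range than a
# 60-year trend. The series is SYNTHETIC (an assumed 0.12 degC/decade trend
# plus noise); it stands in for, and does not reproduce, the observational record.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1951, 2013)                       # 1951-2012
series = 0.012 * (years - years[0]) + rng.normal(0.0, 0.1, years.size)

def ols_trend(x, y):
    """Return the OLS slope and its standard error."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    se = np.sqrt(resid.var(ddof=2) / np.sum((x - x.mean()) ** 2))
    return slope, se

for start in (1951, 1998):
    sel = years >= start
    slope, se = ols_trend(years[sel].astype(float), series[sel])
    print(f"{start}-2012: {slope*10:+.2f} +/- {2*se*10:.2f} degC/decade (approx. 95%)")
```

The short-window trend comes back with an uncertainty several times larger than the long-window one, which is the point the quoted passage is making.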
Will you take the analysis of someone who deals in world commodity markets? The price has peaked, get out now.
Reply to Joseph Murphy ==> Astute, thank you! (I don’t have a strong position in AGW — so I think I’m safe.)
based on your recommendations, I’m writing puts on CAGW
The first thing I would ask about such a graph is if the computed average is even relevant. Since temperature does not vary linearly with power (W/m²) it is possible to arrive at different spatial temperature distributions that have identical average temperatures, but very different energy balances. For example, two points with temperatures of 280K and 320K would have an average temperature of 300K and an equilibrium radiance of 471.5 W/m². But two points each at 300K would also have an average temperature of 300K, but an equilibrium radiance of 459.3 W/m².
So, with that in mind, the error bars not only render meaningless any conclusion about whether the temperature trend is positive; the error range of the equilibrium energy balance is larger still, due to the non-linear relationship between the two. Since AGW is founded upon the premise that increasing CO2 changes the energy balance of the earth, attempting to quantify the manner in which it does so by averaging a parameter that has no direct relationship to energy balance renders the graph itself meaningless in terms of statistical accuracy, and of physics as well.
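The arithmetic in that example is easy to verify. A minimal sketch using the Stefan-Boltzmann law (illustrative only; nothing here depends on any particular dataset):

```python
# Check of the arithmetic above: equal average temperatures, unequal mean
# emission, because blackbody radiation scales as T^4 (Stefan-Boltzmann law).
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def mean_radiance(temps_kelvin):
    """Mean blackbody emission (W/m^2) over a set of surface temperatures."""
    return sum(SIGMA * t ** 4 for t in temps_kelvin) / len(temps_kelvin)

case_a = (280.0, 320.0)   # average temperature 300 K
case_b = (300.0, 300.0)   # average temperature 300 K

print(f"Case A: mean T = {sum(case_a)/2:.0f} K, mean radiance = {mean_radiance(case_a):.1f} W/m^2")
print(f"Case B: mean T = {sum(case_b)/2:.0f} K, mean radiance = {mean_radiance(case_b):.1f} W/m^2")
# ~471.5 vs ~459.3 W/m^2: same average temperature, ~12 W/m^2 difference in emission.
```

Same 300 K average, roughly 12 W/m² difference in mean emission, which is the non-linearity being described.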
“The first thing I would ask about such a graph is if the computed average is even relevant.”
In point of fact it is not actually an average of temperatures at all.
Although most people who follow the climate debates don’t get this (in fact most guys who produce these averages don’t get it).
What is the global temperature average if it is not really an average?
Mathematically, it is a prediction. It is a prediction of what you would measure at unvisited locations.
“Station observations are commonly used to predict climatic variables on raster grids (unvisited locations), where the statistical term “prediction” is used here to refer to “spatial interpolation” or “spatio-temporal interpolation” and should not be confused with “forecasting.” In-depth reviews of interpolation methods used in meteorology and climatology have recently been presented by Price et al. [2000], Jarvis and Stuart [2001], Tveito et al. [2006], and Stahl et al. [2006]. The literature shows that the most common interpolation techniques used in meteorology and climatology are as follows: nearest neighbor methods, splines, regression, and kriging, but also neural networks and machine learning techniques.”
Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution
Milan Kilibarda, Tomislav Hengl, Gerard B. M. Heuvelink, Benedikt Gräler, Edzer Pebesma, Melita Perčec Tadić, and Branislav Bajat
So when you read that the global average for Dec 2014 is 15.34 C, that means the following:
If you randomly sample the globe with a perfect thermometer, an estimate of 15.34 will minimize your error.
Pick 1000 random places where you don’t have a thermometer; the prediction of 15.34 minimizes the error.
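A hedged sketch of the underlying statistical point: for squared error, the single number that minimizes the expected prediction error at randomly sampled locations is the mean of the field. The “temperature field” below is synthetic and purely illustrative; this is not BEST’s or anyone else’s method, just the bare statistical fact being described.

```python
# Sketch: for squared error, the constant prediction that minimizes the expected
# error at randomly sampled locations is the mean of the field.
# The "temperature field" is SYNTHETIC and purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
field = rng.normal(15.0, 8.0, size=100_000)             # stand-in temperatures at many locations
samples = rng.choice(field, size=1_000, replace=False)  # 1000 random "unvisited" points

candidates = np.linspace(10.0, 20.0, 201)               # candidate constant predictions
mse = np.array([np.mean((samples - c) ** 2) for c in candidates])
best = candidates[np.argmin(mse)]

print(f"field mean = {field.mean():.2f}, MSE-minimizing constant prediction = {best:.2f}")
# The two agree (up to sampling noise): predicting the mean minimizes squared error.
```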
Steven,
Thank you for that description. To be as fair as possible to BEST, it seemed they offered a reasonable evaluation of their analysis of 2014 being one of the 5 warmest, with some confidence. Kip Hansen asked that we stay on the discussion of the “uncertainty ranges, etc.”, so to honor that as fully as possible, would you be willing to discuss how the decision comes about to title the work product as “The Average Temperature of 2014 from Berkeley Earth” if indeed it’s predictive in nature and not intended to be what the title suggests? In other words, why not title it “The Predictive Average…”? http://static.berkeleyearth.org/memos/Global-Warming-2014-Berkeley-Earth-Newsletter.pdf?/2014
The Met Office seemed to provide a more reasonable description of their work and confidence levels. NOAA and NASA, not so much: even the most cursory view of their confidence levels, set against their headlines, suggests either a clear plan to mislead or AGW propaganda lacking good scientific commentary.
Back before the new math, it used to be the case that the arithmetic mean was also a least squares best estimate, which would minimize randomly distributed errors. You may believe that methods that involve kriging, neural networks and machine learning techniques will produce a result that minimizes errors, but don’t expect me to believe that crap.
In point of fact it is not actually an average of temperatures at all.
Although most people who follow the climate debates don’t get this (in fact most guys who produce these averages don’t get it).
I understand and agree to a certain extent with your point, though I find your assertion that even the guys who do the calculations don’t understand what it is they are calculating kind of amusing. That said, your point doesn’t change mine. My point is not about how you calculate an average for a given point in time, but what the change in that average implies. Call it an average, call it a prediction of a randomized measurement; as that measurement changes over time, due to the non-linear relationship between temperature and power, the computed change is even less meaningful than the error bars would suggest. The change in the value cannot represent the change in energy balance because simple physics requires that cold temperature regimes (night, winter, high latitude, high altitude) are over-represented and high temperature regimes (day, summer, low latitude, low altitude) are under-represented.
The raw value of the prediction as you have illustrated it is one thing, the change in that value another thing. That change isn’t directly related to the metric of interest (change in energy balance), no matter how you define it.
So let’s take Antarctica as a place where we have very sparse data. You are claiming that an average temperature of 15.34C minimizes the estimation error. Highly unlikely, particularly in the Antarctic winter. It is absurd to suggest that the temperatures at the South Pole and Death Valley can be considered as random numbers drawn from the same distribution with the same mean and the same variance. This is why people work with changes in temperature and not the actual temperatures themselves.
@bones : The fact is that using data infilling through the aforementioned approaches is not really new math. These approaches have been used for a considerable length of time. However, the problem here is that many people apply them without understanding the implications. Many of these approaches CANNOT be shown to minimize error – except under very specific circumstances. For example, ordinary Kriging is only an unbiased estimator if the process is stationary. What is worse is that there really is no justification for treating climate data as a stochastic process at all.
In other disciplines, when we use these techniques for data analysis, we provide examples where they work and leave it to the user to decide about its appropriateness for their problem. But we make NO claims about the optimality of the approach. Because we know that we cannot do so. However, I have read FAR too many climate science papers that use advanced approaches solely for the purpose of justifying a claim about the underlying PROCESS that generated the data! If few claims can be made about the statistical characteristics of the DATA itself when using these approaches, almost nothing can be said about the PROCESS from the data.
@ Steven Mosher & davidmhoffer
Just trying to get a handle on your points and the difference between them.
Steven Mosher – what you are saying is similar to a case where you had two stations one in the tropics averaging 30° and one near the poles averaging 0°. Then the best estimate of the global average temperature would be 15° as that would minimise the error between the measurements?
davidmhoffer – From an energy balance point of view, the best estimate for global average temperature would be the temperature of a sphere at uniform temperature having the same net energy balance?
On that basis, sampling theory will return a more realistic value than the current practice of adjusting stations to appear static.
Simply assume that every station reading is a one-off; that the station itself may move or otherwise change between one reading and the next, and that any attempt at adjustments to create a continuous station record will simply introduce unknown error.
There is no need. Since you are predicting the value at unknown points, a sample based on known points will suffice, while eliminating the possibility of introduced errors. All that is required is a sampling algorithm that matches the spatial and temporal distribution of the earth’s surface.
All that is required is a sampling algorithm
===========
Pollsters would be turning in their graves if we did sampling the way climate science does. In effect climate science takes individual people and, instead of sampling them, tries to build a continuous record of their views over time. Every time their views jump sharply, they assume the person has moved, changed jobs, etc., so they adjust the person’s views. Then they add all these adjusted views together to predict who will win the next election.
Reply to Steve Mosher ==> Does BEST have an estimate of what the expected error is that is being minimized? Is it +/- 0.1°C? +/- 1°C? more? less?
ferdberple
February 2, 2015 at 7:08 am
Fred,
I went about this in a different way: I use the station’s previous day’s reading as the baseline for creating an anomaly. Then I look at the rate of change at that station in small to large areas. No infilling, no homogenizing other than averaging the rate of change for an area. For annual averages I make sure an included station has data for most of the year.
You can read about it here
http://www.science20.com/virtual_worlds
code and lots of surface data
https://sourceforge.net/projects/gsod-rpts/files/Reports/
Hi Steven,
Lots of doubts have been expressed about the ‘infilling’ algorithms or, as you express it, predictions for locations where there is no measurement.
I have no idea whether this is even possible but couldn’t you randomly exclude locations and see how close the predictions are to measurements for those locations?
Would that not also feed into your algorithms to improve the predictive skill?
This is an honest question with no hidden agenda, I am genuinely curious.
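What is being suggested here is essentially cross-validation, which is a standard check for spatial interpolation. Here is a minimal leave-one-out sketch using inverse-distance weighting as a simple stand-in for kriging-style methods; the station locations and temperatures are synthetic, for illustration only.

```python
# Sketch: leave-one-out cross-validation of a spatial interpolator. Each station
# is withheld in turn and predicted from the others. Inverse-distance weighting
# is used only as a simple stand-in for kriging-style methods; the station
# locations and temperatures are SYNTHETIC.
import numpy as np

rng = np.random.default_rng(2)
n = 60
lon = rng.uniform(0.0, 40.0, n)
lat = rng.uniform(40.0, 60.0, n)
temp = 25.0 - 0.6 * (lat - 40.0) + rng.normal(0.0, 0.5, n)   # synthetic field

def idw_predict(x, y, xs, ys, vals, power=2.0):
    """Inverse-distance-weighted prediction at (x, y) from the known stations."""
    d = np.hypot(xs - x, ys - y)
    w = 1.0 / np.maximum(d, 1e-6) ** power
    return np.sum(w * vals) / np.sum(w)

errors = []
for i in range(n):
    keep = np.arange(n) != i           # withhold station i
    errors.append(idw_predict(lon[i], lat[i], lon[keep], lat[keep], temp[keep]) - temp[i])

errors = np.array(errors)
print(f"leave-one-out RMSE = {np.sqrt(np.mean(errors ** 2)):.2f} degC, "
      f"mean bias = {errors.mean():+.2f} degC")
```

Each station is withheld in turn and predicted from its neighbours; the spread of the resulting errors is one empirical handle on how good the ‘infilling’ really is.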
I agree with you David. Averaging a quantity that is non-linearly related to anything else is just asking for trouble.
And the error in this case acts to reduce the calculated rate of radiation from earth’s surface from what it really is, thus bolstering the notion of warming. The hottest tropical deserts in summer daytime, post-noon, radiate at about 12 times the rate of the coldest spots on earth at their coldest. Cold places do very little to cool the earth.
g
Reply to GES ==> While I agree fully with the idea that averaging a non-linearly produced quantity is asking for trouble, I am trying to get viewpoints on what it means scientifically when a series of measurements over time can all be covered by the Uncertainty Range for the metric.
What do you think on that issue?
David, I raised a similar point on the Talkshop recently:
““But in the climate rising temperature restores the balance between incoming and outgoing radiation. Warming acts against the feedbacks. It damps them down.”
Do you mean the T^4 in SB laws? I really wonder whether the climate models take into account the enhanced T^4 effect during the SH summer. I think they use the 1/R^2 sun distance to adjust the 1AU value for TSI, but this higher power input that raises the SH summer temps then has a knock-on effect on the rate of outward radiation. Although the increase in outward radiation is fairly close to being linear for small increments in temp (1 degree more for SH summer than NH summer?) it isn’t exactly linear, especially being a fourth power.
If this isn’t being accounted for then it’s where some of the missing heat is going- out into space during the SH summer.”
MSimon replies to this with a few additional thoughts a few comments later.
According to the Met Office:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe…”
I would further qualify that by suggesting that there can be no error because there is no real entity being measured. There is no such thing as a real temperature anomaly, no real measuring-station temperature anomaly, and certainly no real global average temperature anomaly to be measured. The “temperature anomaly” is a comparison to a modelled past baseline temperature at each location – it is a convenient fiction.
So we need to establish what exactly is the REAL parameter before we can estimate uncertainties.
It seems to me that they expect a meaningful average based on poorly collected and therefore meaningless data. I think they are saying the correct answer can be found by taking the average of the incorrect answers. Sounds more like Voodoo than science.
I read somewhere that the measurement error of surface thermometers used in the NOAA sites was about +/- 0.1 degrees C. I don’t understand how HADCRUT could not have a greater error range. What am I missing?
Bob, taking multiple measurements tends to reduce error. Think of a single weather station. At any given moment the instantaneous temperature measurement will be off by +/- 0.1 degrees, but those errors can generally be expected to fall within a normal distribution unless the damn thing is situated next to an air conditioner. The more measurements recorded by that single station throughout the day, the lower the expected error when computing the daily mean. Even if the station only reports a min and a max on a daily basis, one can still roll the median values up to a monthly mean with a lower expected error than the instrument itself is capable of.
When it comes to doing annual means for the entire globe, sample size still helps, but it’s not as clear cut as the single-station case, not least because the number of samples and their spatial distribution aren’t constant throughout the entire record. So that’s the interesting and … difficult … part of this discussion. While an individual data product might claim a given error range in the hundredths of degrees, it’s child’s play to compare two different products and find short term discrepancies in the tenths. Frex, the largest annual discrepancy between HADCRUT4 and GISTemp is 0.13 degrees and the standard deviation of the annual residuals from 1880-2014 is 0.05.
Good explanation. A simpler one that gets the point across about averaging errors is rolling a dice.
We know that the average value when you throw a dice is 3.5 – however the “reading” from the dice could be anywhere between 1 and 6 so the “error” on an individual “reading” (throw of the dice) is +/- 2.5. However the more times you throw the dice and average the “readings” the closer the result will be to the true “reading” of 3.5. In other words the more times you sample the more the random errors cancel each other out. The “average of the errors” with an infinite number of throws will be zero. The more throws of the dice the lower “average of the errors” so the more confidence you can have in the result.
So if you are using the same thermometer with the same inherent random error of +/-0.1C to measure the temperature in 100 different places, the chances of all the measurements being +0.1C (throwing a six one hundred times) is extremely low. The random errors are “throws of the dice” and when averaged over all 100 instruments those errors will cancel out and tend towards zero.
This only applies to multiple separate thermometers. An individual thermometer may have a systematic error meaning all its readings are +0.1C too high. This is another reason to use temperature anomalies, which dispense with the absolute temperature and just measure the change.
The misleading bit in all the press hype is that it is not a “global average temperature” at all, it is a global average change in temperature. How useful it is to average a change from -25c to -24c in the Arctic and a change from +24c to +25c in the tropics I will leave others to judge!
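The dice analogy is easy to simulate. A short sketch (illustrative only) showing that the typical error of the average of n fair rolls shrinks roughly as 1/√n:

```python
# Sketch of the dice analogy: the average of n independent rolls converges on the
# true mean (3.5), and the typical error of that average shrinks roughly as 1/sqrt(n).
import random

random.seed(0)
TRUE_MEAN = 3.5
TRIALS = 500

for n in (10, 100, 1_000, 10_000):
    total_err = 0.0
    for _ in range(TRIALS):                       # repeat the experiment TRIALS times
        rolls = [random.randint(1, 6) for _ in range(n)]
        total_err += abs(sum(rolls) / n - TRUE_MEAN)
    print(f"n = {n:>6} rolls: typical error of the average ~ {total_err / TRIALS:.3f}")
```

The caveat in the comment above still applies: this cancellation only works for random errors; a fixed offset in the die (or the thermometer) never averages away.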
TLM, you can’t average the value of a categoric (or nominal) variable. Your example is invalid.
Reply to Brandon Gates ==> Averaging (taking a mean) only reduces original measurement error for multiple measurements of the same thing at the same time. 100 thermometers in my yard, polled at exactly noon, with the results averaged, will give me a more accurate temperature.
Creating “means” for multiple measurements of a thing at different times does not reduce Original Measurement Error, it only disguises it. One still has to deal with the Original Measurement error itself in the end. This is another of my personal projects — it will be about as popular as my “Trends do not predict future values” post — in other words, viciously attacked by all comers, despite being true.
“The more measurements recorded by that single station throughout the day, the lower the expected error when computing the daily mean. Even if the station only reports a min and a max on a daily basis, one can still roll the median values up to a monthly mean with a lower expected error than the instrument itself is capable of.”
Nope! This statistical operation does not lower the error of the instrument itself. That error is a physical characteristic of the instrument. Statistical operations performed on data collected from an instrument do not reach back through time and space to correct physical error sources in that instrument!
Those statistical operations merely improve the PRECISION of the measurement. Precision and accuracy are independent characteristics. Precision refers to how fine the unit divisions are that we are able to record. Accuracy refers to how close those unit divisions are to true. Performing averaging on data values to reduce apparent noise in data collected over time from a thermometer or other measurement instrument can improve its precision. However, that improvement is only true and useful if the character of the noise is well understood and that improvement has been validated to achieve a correct value.
Regardless of how accurate and precise we can make current temperature observations, our century-old observation accuracy remains no better than about 1 degree Celsius. When the starting point on your trend line has an accuracy of plus or minus 1 degree Celsius, claiming an accuracy for the slope of that trend better than 1 degree is bogus.
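The precision/accuracy distinction above can be made concrete with a toy simulation: averaging many readings from a hypothetical instrument shrinks the random scatter but leaves a fixed calibration offset completely untouched. The numbers are invented for illustration.

```python
# Sketch: averaging improves precision (random scatter shrinks) but not accuracy
# (a fixed calibration offset survives any amount of averaging). Toy numbers only.
import numpy as np

rng = np.random.default_rng(3)
TRUE_TEMP = 20.00    # degC, the "real" temperature
BIAS = 0.30          # degC, a fixed calibration error in this hypothetical instrument
SIGMA = 0.20         # degC, random reading-to-reading scatter

for n in (1, 10, 100, 1_000):
    readings = TRUE_TEMP + BIAS + rng.normal(0.0, SIGMA, n)
    mean = readings.mean()
    print(f"n = {n:>5}: mean reading = {mean:6.3f}  error vs. truth = {mean - TRUE_TEMP:+.3f}")
# The scatter of the mean shrinks as n grows, but the ~+0.30 degC offset remains.
```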
Kip Hansen,
As TLM already pointed out, the ultimate goal of this exercise is to arrive at a gridded mean anomaly product on monthly and annual time frames. Not high noon on June 20th, 2014 in Topeka, Kansas. One needn’t homogenize for the answer to the latter, just look it up. Thing is, that tells you bupkis about what 30+ year global trends are doing. Climate and weather are apples and oranges in much the same way that the precision of a single measurement is a completely different animal from error estimates for tens of thousands of observations.
Of course not. The only thing which reduces original measurement error is better instrumentation. Since we can’t go back and do it over with the latest in high-precision thermometers, we’re pretty much stuck. None of that changes the reality that more samples leads to better estimates.
Ya. Error bars. They get smaller as n gets bigger. That’s why weather station data tell you how many observations were used to calculate the daily summary statistics.
GaryW,
No kidding. And for cripes sake, the local real temperature often fluctuates from minute to minute more than the precision of the damn instrument. Modern ones anyway. Best we normally get is what, hourly data? Do we need to go to picoseconds to keep you guys happy? It’ll cost you.
See again: HADCRUT4 is not intended to tell you how cold it was last night in Mankato, Minnesota to plus or minus a gnat’s nose hair. The best it will do is give you an estimate for a grid square of the monthly min, max and mean. Since the subject of this thread is the global mean anomaly, you need to be thinking about the number of grids, the number of thermometers in each grid, the number of days in a month, and the number of hours in a day. Think law of large numbers, and again review the concept that climate is the statistics of weather over decades-long spans of time, not how warm it was three point two oh six seconds ago to three decimal places.
Gates says:
…not how warm it was three point two oh six seconds ago to three decimal places.
But that is exactly the kind of argument we always see from the warmist side. Even if you accept the astounding accuracy claimed, which records global T to within tenths and hundredths of a degree, the planet’s temperature has fluctuated by only 0.7º – 0.8ºC over the past century and a half.
That is nothing! Skeptics are constantly amazed that such a big deal is made over such a tiny wiggle.
Reply to Brandon Gates ==> “Error bars. They get smaller as n gets bigger.” That is true only in statistics and for Confidence Intervals. Original Measurement Error cannot be reduced by division or averaging when the measurement is of different things at different times. If OME is +/- 1°C for the individual measurements (of different things at different times) then in the end, you have your metric mean +/- 1°C. You can’t make it go away through arithmetic.
The Met Office is surprisingly candid on this point, admitting forthrightly:
This is what they consider the Original Measurement Error, NOT a statistical Confidence Interval, but a statement of the accuracy of measurement. They state it as a Maximum Measurement Accuracy, which is the obverse side of the Original Measurement Error coin. Even the entire global data set of two entirely different metrics (Global Air Temperature at 2 meters and Global Sea Surface Temperature) cannot erase the Original Measurement Error.
Kip Hansen commented
This is a question I’ve had for a long time. NCDC’s GSoD data set is said to be to 1 dp, so 70.1F +/- 0.1F for instance was yesterday’s temp. On the same station today, it’s 71.1F (+/- 0.1F). But, if I compare them it’s 70.1 (70.0-70.2) – 71.1 (71.0-71.2) so the difference is 1 +/- 0.2 because they add, right?
But what happens if I do the same for a 3rd day at that same station, 72.1F (+/-0.1)? When calculating the differences A−B and then B−C, B can’t simultaneously be both +0.1 and -0.1, i.e. A−C has to maintain X +/- 0.2 no matter how many subsequent measurements we string together, as long as they are continuous, correct?
I believe when you average anomalies, they are baselined against another average of the same measurements with +/-0.1 precision (do I have this correct – precision as opposed to accuracy)?
So, when I average a large number of the differences as I described above together, doesn’t my precision increase? But to what (honestly I’m not sure)?
But I also don’t think my accuracy has increased beyond +/-0.1 at best, if it’s really not +/-0.2 or worse.
I commented:
I didn’t finish this, got side tracked.
When you calculate the anomaly on a baseline of averaged measurements, don’t the errors add?
So today’s 71.1 +/-0.1 is compared to the 30 year average of measurements collected to 1 dp, so wouldn’t each anomaly be X.x +/- 0.2?
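One way to frame the question above: for the difference of two readings (which is what an anomaly is), worst-case interval arithmetic simply adds the two half-widths, while the usual statistical rule for independent random errors adds them in quadrature. A small sketch of both conventions, with illustrative numbers:

```python
# Sketch: two conventions for the uncertainty of a difference (e.g. an anomaly).
# Worst-case interval arithmetic adds the half-widths; the statistical rule for
# independent random errors adds them in quadrature. Illustrative numbers only.
import math

a, ua = 71.1, 0.1   # today's reading and its uncertainty (degF)
b, ub = 70.1, 0.1   # yesterday's reading and its uncertainty (degF)

diff = a - b
worst_case = ua + ub              # both errors could push in the same direction
independent = math.hypot(ua, ub)  # sqrt(ua^2 + ub^2), if the errors are independent

print(f"difference = {diff:.1f} degF")
print(f"worst-case bound:         +/- {worst_case:.2f} degF")
print(f"independent (quadrature): +/- {independent:.2f} degF")
```

Which convention is appropriate depends on whether the two errors can reasonably be treated as independent – which is much of what this thread is arguing about.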
dbstealey,
That’s exactly why this warmie cringes when “the hottest year evah” [1] is uttered by his fellows.
I think technically we’re talking about precision, as has already been pointed out. Especially since we’re dealing with anomaly calculations, which moots what any individual absolute readings are.
Considering that 3-4 degrees lower and we’d be in the neighborhood of an ice age again, 0.8 C is up to a quarter of the way there … in the opposite direction. The Eemian interglacial was 2 degrees higher than the Holocene. Sea levels were some 6-8 meters higher. 0.8 C is a puny 40% of 2 degrees. Not even worthy of being called a pimple on a midget’s bottom.
Alarmunists [2] note with amusement that contrarians are fascinated with the puny 0.25 degree discrepancy between CMIP5 and observations. What is The Pause but a tiny wiggle in the grand scheme of things?
We could have endless amounts of mirth discussing who’s trying to have it both ways here, yes?
————————
[1] 38% chance according to one press release. I’ve already forgotten which. GISS I think.
[2] We weren’t supposed to talk religion on this thread according to its author. Yeah right, like that was ever going to happen.
Reply to Mi Cro ==> On additive errors in subsequent measurements:

                                Today            Yesterday
Measurement                     71.1 (+/- 0.1)   70.1 (+/- 0.1)
Range                           71.0 – 71.2      70.0 – 70.2

Average of maximums (both errors UP):    (71.2 + 70.2) / 2 = 70.7
Average of the measurements:             (71.1 + 70.1) / 2 = 70.6
Average of minimums (both errors DOWN):  (71.0 + 70.0) / 2 = 70.5

Result: 70.6 (+/- 0.1)
Average the actual measurements, average the ups (the maximums, if both errors are up), and average the downs (the minimums, if both errors are down). The answer is the mean with the original measurement error still in place.
Kip Hansen,
Yup. Statistics is how we quantify estimates of measurement error (and other errors). One way to improve estimates based on measurements is to gather a bunch of them and take a mean. This is basic, standard, old-as-the-hills statistical practice. The specifics of this particular application are:
http://www.metoffice.gov.uk/hadobs/crutem3/HadCRUT3_accepted.pdf
Measurement error (ϵob) The random error in a single thermometer reading is about 0.2 °C (1σ) [Folland et al., 2001]; the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most 0.2/√60 = 0.03 °C and this will be uncorrelated with the value for any other station or the value for any other month. There will be a difference between the true mean monthly temperature (i.e. from 1 minute averages) and the average calculated by each station from measurements made less often; but this difference will also be present in the station normal and will cancel in the anomaly. So this doesn’t contribute to the measurement error. If a station changes the way mean monthly temperature is calculated it will produce an inhomogeneity in the station temperature series, and uncertainties due to such changes will form part of the homogenisation adjustment error.
0.2 °C single measurement error (best case!) improves to 0.03 °C IN AGGREGATE … nearly a whole order of magnitude. The HADCRUT4 paper you linked to contains similar language, and heavily references Brohan (2006).
The estimated error of a mean is not original measurement error; the latter, nothing will ever change. The former can be estimated. By the law of large numbers and the central limit theorem, a sufficiently large sample will easily have a smaller estimated error of the mean than one single measurement. The arithmetic for that is laid out in the text I quoted above. I don’t know how much more clear I can be on the distinctness between these two things. Apples cannot be conflated with oranges.
TLM,
Which is a very elegant illustration of the concept, which I greatly appreciated reading. Unfortunately I note from the very first reply that it’s anything but a sure-fire way to get the point across.
Reply to Brandon Gates ==> We will have to leave this issue for another time as it is not resolving.
We are talking past one another in some way.
I suggest that we might refer to someone like Wm Briggs for the statistical explanation of why OME must be included in the statement of results, even for mathematical and statistical means. If OME is +/- 0.1°C, then whatever you derive from any number of these data (ten or ten million), you must state MyMean (+/- 0.1°C).
I do understand that you believe this is not the case — well, you and some others — but you are talking, I think, about something like the precision of a derived mean. Derive all you like, at whatever precision you like, but at the end, you must add the +/- 0.1°C of the OME/Original Measurement error/Uncertainty Range for the metric and method to have a scientifically true statement.
The Dept of Commerce directive (National Weather Service Instruction 10-1302) for air temperature measurement specifies a standard for minimum and maximum temperatures between -20 and 115 degrees Fahrenheit of +/- 1 degree Fahrenheit (2 degrees outside that range). So a single daily min/max temperature has a 90% confidence accuracy spec of +/- 2 degrees F at best.
The actual uncertainty (as opposed to the single reading spec) would be determined statistically.
The precision, as opposed to accuracy, is a specific to the construction of the thermometer. A precision of +/- 0.1 degree (F or C) would be unremarkable.
Reply to Leo G ==> I am interested in this subject. Can you give a link for National Weather Service Instruction 10-1302?
Thank you. –kh
Kip, The full title for the directive:-
NATIONAL WEATHER SERVICE INSTRUCTION 10-1302
NOVEMBER 14, 2014
Operations and Services Surface Observing Program (Land), NDSPD 10-13
REQUIREMENTS AND STANDARDS FOR NWS CLIMATE OBSERVATIONS:
it is available at the U.S. National Weather Service (NOAA) website:-
http://www.nws.noaa.gov/directives/sym/pd01013002curr.pdf
Reply to Leo G ==> Thank you, sir!
If this were a process control chart the yellow band would be the control limits for the process.
It isn’t a process control chart.
I’d say the yellow band is misleading unless it is clearly identified.
Reply to Greg Locock ==> You are right of course, but there is a limited amount of data that can be typed onto the chart. Fully described in the text as:
That seemed a bit much for the graphic. – kh
Your third chart is interesting. Granting the Met Office their claimed +/- 0.1 deg C error bars, the HadCRUT4 averaged temps from 1997 to 2014 are all statistically indistinguishable (they fall within the error bars, save for 1999 and 2000, which fall slightly below). A nice demonstration of the ongoing Pause or Hiatus in warming.
And I agree with those who suspect that the +/- 0.1 deg C error bars are optimistic.
Cheers — Pete Tillman
Professional geologist, amateur climatologist
Reply to pdtillman ==> Thank you for your “geologist’s” insight!
If I were measuring a parameter in my lab that plotted like that my conclusion would be that nothing had changed between the ordinate and abscissa.
Except time of course.
Reply to Scott Scarborough ==> Thank you for your input — you say that the ordinates (in this case, average global temperatures) do not significantly change over the time period shown. (Which mirrors my conclusion as well.)
What is your field of research?
From the text:
“… above the long-term (1961-1990) average. ”
This makes no sense. The 30 year period in the sense of “climate normals” is usually the most recent – ending in zero – set of 30. That is fine with me when used by the local paper or TV, and those folks now use 1981 – 2010.
Because the average (mean) of the more recent set will be higher than the out-of-date set, a person might get the idea that they are not being honest.
Further, for this sort of research, why does the long-term average not include all the data up through the period of interest? That would include 2014.
And finally, it seems to me assumptions about randomness and distribution type are being violated – but that is above my pay grade.
Reply to John F. Hultquist ==> Yes yes yes….we see this type of thing all the time. Different folks use different forks ( in this case, 30-year time periods).
The intro graph at the top is from Climate.gov and uses the 20th Century (1901-2000) average!
There are a couple of points to make before anyone attempts to answer that question.
The first is that the uncertainty range as shown is simply a conventional artifact of the distribution of the errors. In fact what you have is a mean and a probability distribution for the estimate of global average temp based (solely) on measurement errors. It is perhaps more useful to imagine a z axis that shows the probability density when working with these estimates.
Second, the answer is going to be all about the question. If the question is what the odds are that a prior year exceeded 2014, there is a series of probability calculations to be done, comparing each year’s probability density with 2014’s. The diagram (and in particular a)) doesn’t help here. If the question is what the strength of evidence is that we are in fact measuring the same quantity (each year’s sample having the same distribution of errors), then b) is suggestive, but little more.
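For the first kind of question, the calculation is straightforward if each annual value is treated as a normal distribution centred on its reported anomaly, with a one-sigma of roughly 0.05 °C (from the ±0.1 °C 95% range). A minimal sketch: the 0.56 values follow the Met Office statement quoted above, while the other anomalies are illustrative placeholders.

```python
# Sketch: probability that year A was warmer than year B, treating each annual
# anomaly as Gaussian with a 95% uncertainty of +/-0.1 degC (sigma ~ 0.05 degC).
# The 0.56 values follow the Met Office statement; the others are illustrative.
from math import erf, sqrt

SIGMA = 0.1 / 1.96   # one-sigma implied by a +/-0.1 degC 95% range

def prob_a_warmer(anom_a, anom_b, sigma=SIGMA):
    """P(A > B) for independent normal estimates with the same sigma."""
    z = (anom_a - anom_b) / (sigma * sqrt(2.0))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"2014 (0.56) vs 2010 (0.56): P = {prob_a_warmer(0.56, 0.56):.2f}")  # a coin flip
print(f"2014 (0.56) vs 2004 (0.45): P = {prob_a_warmer(0.56, 0.45):.2f}")
print(f"2010 (0.56) vs 1989 (0.16): P = {prob_a_warmer(0.56, 0.16):.2f}")  # near certainty
```

This is the sense in which 2014 and 2010 are “tied”: the data cannot distinguish them, while 2010 versus 1989 is effectively certain.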
HAS,
“In fact what you have is a mean and a probability distribution for the estimate of global average temp based (solely) on measurement errors.”
I don’t think it is. It is based on Brohan 2006. Measurement error is a small part, because of the large number of readings in the average. The main component is a spatial sampling uncertainty, based on the finite number of points sampled. IOW, the range of values you might get if you could repeat the measurements in different places.
Nick
It is a semantic point. In order to measure the global average temp you need to interpolate values where you don’t have readings. This IMHO is part of the process of measuring global average temps.
I note that Mr Mosher is dining out on a similar point above, but elsewhere I’ve seen him run the line that everything is an inference from reality, from direct measurement to arcane model outputs.
Reply to Nick Stokes and HAS ==> The Met Office states clearly that their Uncertainty Range is based on two papers:
Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set
Colin P. Morice, John J. Kennedy, Nick A. Rayner, and Phil D. Jones
and
Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 2: biases and homogenisation
J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
Both available as linked, online, free.
Thanks, Kip. But both papers refer to Brohan 2006 for the description of the uncertainty model.
Reply to Nick Stokes ==> You might re-check that. The SST paper doesn’t even reference Brohan (2006 or 2009). The HADCRUT4 paper explicitly states that it abandons the Brohan 2006 method and instead “the method used to present these uncertainties has been revised. HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties.”
Links to the papers are above.
Errata: … The SST paper doesn’t even reference Brohan (2006), only 2009.
Kip,
“The HADCRUT4 paper explicitly states it abandons the Brohan 2006 method”
That is the relevant paper – your post is about HADCRUT4. And they have modified the method. But I was talking about the uncertainty model. And of that they say:
“The models of random measurement error and sampling error used in this analysis are exactly as described in Brohan et al. [2006].”
And that is where you will find the full description.
Reply to Nick Stokes ==> So, I think we agree now … HADCRUT4, the data set being discussed here, does not use the Brohan 2006 model, but instead:
“The uncertainty model of Brohan et al. [2006] allowed conservative bounds on monthly and annual temperature averages to be formed. However, it did not provide the means to easily place bounds on uncertainty in statistics that are sensitive to low frequency uncertainties, such as those arising from step changes in land station records or changes in the makeup of the SST observation network. This limitation arose because the uncertainty model did not describe biases that persist over finite periods of time, nor complex spatial patterns of interdependent errors.
To allow sensitivity analyses of the effect of possible pervasive low frequency biases in the observational near-surface temperature record, the method used to present these uncertainties has been revised. HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties. This approach follows the use of the ensemble method to represent observational uncertainty in the HadSST3 [Kennedy et al., 2011a, 2011b] ensemble data set.”
I would assert that the measurement error is only useful in determining the variance of the measurement, not the process. Therefore, before claiming that there is any change in the underlying process, one needs to also take into account the variance of the process. Clearly the process variance is MUCH greater than the measurement error.
What matters here is the process, not the measurement. Until the variance in the process can be accounted for, these temperature measurements, however small the measurement variance, CANNOT be used as evidence of a warming trend that is attributable to increasing CO2.
Climate scientists, almost universally, have this data analysis process backwards. They use this data (with small measurement error) as EVIDENCE for their claim when in fact the uncertainty (variance) in (their assumptions about) the process is such that they cannot even make the claim – let alone point to evidence.
Jeff I am with you on this.
“Climate scientists, almost universally, have this data analysis process backwards. They use this data (with small measurement error) as EVIDENCE for their claim when in fact the uncertainty (variance) in (their assumptions about) the process is such that they cannot even make the claim – let alone point to evidence.”
The problem I have with what I like to call “the physical evidence” is that the average is a mixture of data and interpolation.
Suppose we made no interpolations at all – just used the available data. There would be a calculated average temp with some small degree of uncertainty. If the temperature changed, one could make a claim that the places where the temperature rose were, per unit area, higher or lower than some other parts of the total group. Well, that is OK; however, the final number and any change in it over time still have value because they are based on measurements. Changes are real.
Area-weighting is obviously attractive but the temperature varies within an area because it is the topography (etc) that has strong effects. Interpolation of data trying to account for topography is risky and increases overall uncertainty so it should be avoided.
If the average temperature were calculated without area weighting, the result would be inaccurate, but as precise as possible because it involved no modelling, which is to say, no guessing what the data should be.
We don’t really care what the actual temperature is because no one experiences it. We are constantly in flux. Our experience is a daily rise and fall between 2-3 degree ‘limits’. Now suppose that average temperature changed. The interesting thing is the change, not the magnitude, and not any modelling based on raw data applied to unmeasured areas. My point is that a comparative calculated temperature is more valuable and repeatable as it contains no modelling.
Repeating 1000 data readings at the same place would be valuable. Estimating what the numbers would be at 1000 points that were not measured cannot possibly be as accurate as using 1000 real measurements.
Trying to extend the temperatures confidently to other unmeasured places is a stretch when the values of the changes are so small. I am interested to see the rise in any set of values – as many as possible and as untouched as possible. I am not immediately concerned with the rate of rise for the whole (which requires measuring the whole). I want to see a measured rise across as wide a spectrum of conditions as possible. If there is a measured rise, we have something to talk about. Rejigging past temperatures, area weighting and interpolation are useful for forecasting what the numbers should be in the unmeasured areas, but that is not as helpful as a comparative contrast. It doesn’t have to be ‘certified’ but it has to be as comparatively precise and accurate as possible. It is Delta T that matters, not T. The best Delta T comes from measurements, not modelled numbers.
Exactly. The assumption of anthropogenic warming has already been made by most of climate science. Therefore they assume that they have already accounted for all sources of variance other than measurement error through the use of their models. As a result we have the situation that we have today in climate science – the models are correct and it is the data that is wrong.
Crispin in Waterloo
February 1, 2015 at 8:29 pm
This is my problem too. Due to the sparsity of actual measurements, the so-called Global Average Temperature is based more on the interpolation and homogenization algorithms than on the measurements themselves. Averaging only removes errors if the errors are random, not systematic, and using one set of algorithms inherently leads to systematic errors. So a validation approach should be taken.
Run the interpolation and homogenization algorithms for every station that you have accurate measurements for, as if that station were not there. Then compare the values generated to the actual measured values. This will result in errors of various sizes. Assuming that the error bars are meant to show the potential errors from these algorithms, then the largest of these errors, both positive and negative – worldwide – become the error bar values of the ‘Global Average Temperature’. This would be a far more defensible, real-world approach.
Those coming up with ‘better methods’ for assessing error should always validate their mathematical models against real-world data. They are climate _scientists_, after all.
Reply to JeffF and Crispin ==> All good points…for another essay though.
Appreciate your input.
Nonetheless, if these were your data points from an acceptable method and process, what would their lying within the Uncertainty Range band mean in your fields?
OK, so IF we had a justifiable reason to assume that the data was distributed in a certain way, we could employ a statistical test on the RAW DATA to determine whether or not any data point was statistically significantly different from any other data point. That is really just basic hypothesis testing. I could therefore make a justifiable claim that, for example, 2014 was statistically significantly warmer than 2012, etc. – but ONLY under the aforementioned assumptions. Depending on the distribution, I would use a different test. The focus, in my field, therefore, is in the validity of the CLAIM. Statistics is a formally defined mathematical discipline and as such, statistical claims are no different than the results from calculus or linear algebra, in my opinion.
Now, IF you start monkeying with the data or adding in additional assumptions, you are likely inserting uncertainty, not reducing it. What I would tend to do there is to assign a (lower) confidence to my “adjusted” data, to reflect the fact that I am no longer certain that the data reflects the actual underlying process – because I am just “guessing” at, or modeling, the process. At NO point would I assume that I was more confident in my adjusted data than the confidence I have in the raw data. That point is KEY. When I adjust the data, I am guessing at how to do it. Therefore my confidence decreases.
As a final note, a claim that 2014 is warmest because I find, by whatever approach, that 2014 is the MOST LIKELY to be the warmest, when all of the differences are within the margin of error is, statistically, unjustifiable. If I state my assumptions and find that 2014 is not statistically significantly warmer than other years, then I CANNOT KNOW whether it is the warmest or not. My statistical test is just not good enough. End of story.
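That last point – that “most likely to be warmest” is not the same as “statistically distinguishable as warmest” – can be made concrete with a quick Monte Carlo. The anomalies below are illustrative placeholders, and the ±0.1 °C figure is treated as a 95% range; this is a sketch of the idea, not a reproduction of any agency’s calculation.

```python
# Sketch: even if one year has the highest reported anomaly, its probability of
# actually being the warmest can be well below 100% once uncertainties overlap.
# Anomaly values are illustrative placeholders; +/-0.1 degC is taken as a 95% range.
import numpy as np

rng = np.random.default_rng(4)
SIGMA = 0.1 / 1.96                                   # one-sigma from the 95% range

anomalies = {2005: 0.54, 2010: 0.56, 2013: 0.50, 2014: 0.56}   # hypothetical values
years = list(anomalies)
means = np.array([anomalies[y] for y in years])

draws = rng.normal(means, SIGMA, size=(100_000, len(years)))   # resample each year
winners = np.array(years)[np.argmax(draws, axis=1)]            # warmest year per draw

for y in years:
    print(f"P({y} was actually the warmest) ~ {np.mean(winners == y):.2f}")
```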
As with many others, I think the claim of 0.1C error bars is unfounded and unbelievable. There are just too many sources of error. Further, even if currently that is true, there’s not a chance it is true for data pre-1940 or so. Does anyone know how the Met Office calculated this error bar?
Reply to Patrick B ==> See this reply above.
Given that the uncertainty is “+/- 0.1°C” (exactly?), we must assume that it is a rough guesstimate.
I have less expertise than you so I can’t add much other than to say I love the approach.
Reply to Gunga Din ==> Thank you.. – kh
I’ve stated it before and I’ll reiterate it here: that accuracy level is impossible and frankly irrational.
Normally the first chore is to calculate the error possibilities for every datum/datum source.
e.g. A temperature station capturing temperature / weather data,
– the equipment installed, the related conditions surrounding the station,
– the quality and training of the staff,
– method and consistency of data entry,
– any transcription of the data
Not forgetting the time of day and frequency. These error statistics are cumulative; while initially centered on individual temperature readings, for many/most stations these error stats remain for long periods of time.
As a quick translation, government climatologists have been actively analyzing and re-analyzing temperature records adding in corrections for many reasons.
These corrections do not eliminate or minimize the error! Instead they are a somewhat definitive identification of error ranges; including these ranges as temperature adjustments effectively doubles the error ranges whenever they are made, because of the assumptions involved.
When adding temperature ranges from different stations, a more accurate station does not improve the error range for less accurate stations though it might decrease overall average station deviation.
Whenever errors and deviations are calculated for information gained through a process, errors are calculated for every process component;
e.g. data capture,
– data storage,
– data lookup,
– data processing,
– data transmission followed by storage, lookup, processing,
– data analysis,
…
And no! These stages are not error proof. Error rates for every stage in a process are multiplied against error rates in all other process stages for a cumulative process error. This is the process engineers use to determine accuracy, effectiveness and efficiency for industrial processes.
If station placement or upkeep introduces a two-to-three-degree error at certain times of the day, the error in the final temperature anomaly can be no less than that two-to-three-degree error. As Willis and Steven McIntyre have pointed out, the Met Office reaching a 0.1C precision level does not mean that they’ve reached a 0.1C accuracy level; only that they’ve introduced enough numbers to overwhelm lower-precision numbers.
Bluntly, the Met Office and others are ignoring overall accuracy and error rates while claiming their pseudo-precision as accuracy or error levels.
Their error bars should include ranges for assumed temperature adjustments, observed station deficiencies (all non-certified stations should be listed as unimproved stations until proven otherwise), and so on. Other processes should either have identified error rates or an error assumption based on sampling.
Yes. It is important to separate precision from accuracy. You can have low variation while all your measurements are still far from the real value. See http://www.mathsisfun.com/accuracy-precision.html
The trend is more important than the actual value of the global average temperature, which is difficult to calculate from the existing data. I would like to see trends for individual weather stations and then calculate an average global trend.
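To make that concrete, here is a minimal sketch of a “trend per station, then average” calculation. The three toy station records are invented; a real analysis would read actual station files, weight by area, and deal with gaps, moves and inhomogeneities.

```python
import numpy as np

def station_trend(years, temps):
    """Ordinary least-squares trend for one station, in degrees C per decade."""
    slope_per_year = np.polyfit(np.asarray(years, float),
                                np.asarray(temps, float), 1)[0]
    return slope_per_year * 10.0

# Invented toy anomaly records (degrees C) for three stations.
years = np.arange(1980, 2015)
stations = {
    "A": np.linspace(-0.2, 0.4, years.size),
    "B": np.linspace(0.0, 0.3, years.size),
    "C": np.linspace(-0.1, 0.2, years.size),
}

trends = {name: station_trend(years, t) for name, t in stations.items()}
print(trends)

# A naive unweighted mean of the station trends; not an area-weighted
# global trend, just the shape of the calculation.
print("average trend: %.3f C/decade" % np.mean(list(trends.values())))
```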
ATheoK, I agree.
Global temperature is a meaningless number. If we fear heating or cooling, the things to track are the sectors of the planet where Ice Ages create glaciers. If these select areas are cooling, the glaciers and snow cover increase.
This is why global warmists loved to talk about the melting glaciers and ice sheets until it became obvious that these are now growing rapidly at both the North and South Poles. So now they talk about ‘global temperatures’.
Measuring this with a very fine scale so the tiniest of mathematical ‘rises’ of 0.001 are ‘detected’ is pure fraud. The ‘error bars’ are meaningless since we have no ‘global temperature’ but rather, a mosaic of temperature zones that never move up or down entirely in tandem.
We do know that ice ages happen again and again, that they are ten times longer than interglacials, and that they all, without exception, end very abruptly, like a light being turned on (that light being the sun).
ATheoK,
This childish ignorance of accuracy and precision concepts extends through a lot of climate science.
There are related questions about GCMs and ensembles. An individual modeller might make several runs; the error bars should lie outside all of them. For a CMIP, all modelling centres should submit these (wide?) error bars so that an estimate of overall accuracy can be attempted. If instead they take an ensemble average, as they do, with error estimates based on the statistics of precision, then the most meaningful count is of the number of wrong steps they have incorporated into the fictitious result.
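A minimal sketch of that precision-versus-accuracy point for ensembles, using invented numbers: averaging more runs shrinks the quoted spread of the ensemble mean, but a bias shared by every run is untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 0.0      # the quantity being estimated (hypothetical)
shared_bias = 0.3     # an error common to every run (hypothetical)
n_runs = 30

# Each run = truth + shared bias + independent run-to-run noise.
runs = true_value + shared_bias + rng.normal(0.0, 0.2, n_runs)

ensemble_mean = runs.mean()
std_error = runs.std(ddof=1) / np.sqrt(n_runs)   # "precision" of the mean

print(f"ensemble mean = {ensemble_mean:+.3f} +/- {std_error:.3f}")
print(f"actual error  = {ensemble_mean - true_value:+.3f}")

# The quoted +/- shrinks like 1/sqrt(N); the shared bias of 0.3 does not.
# A tight ensemble spread is a statement about precision, not accuracy.
```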
I cut my teeth on similar concepts related to ore-grade estimates at developing mines, from sparse drill-hole assays. The significant difference is that it is easy to lose, or fail to make, large amounts of money if you let your heart rule over reality.
If there is uncertainty, correct procedures for uncertainty/error estimates will usually be found in the tomes from the Bureau of Weights and Measures in France (the BIPM).
I have never seen that authority quoted in a climate-science paper.
Before the temperature adjustments, the 1930s were the warmest. Who disagrees with that?
Me. And I get weary of people who won’t bother to check which data set they are talking about.
I don’t know – maybe this one for starters:
Or maybe take a look at some of this Raw Data before adjustments:
http://www.breitbart.com/london/2015/01/30/forget-climategate-this-global-warming-scandal-is-much-bigger/
Take a good look…
“I don’t know”
Well, you could try saying what it is. It sure doesn’t look like HADCRUT 4.
Nick Stokes, what is the total increase in land temperatures caused by the adjustments? Reported increase (1900 to 1914) minus Raw temperature increase (1900 to 1914).
What is the similar figure for the Ocean SSTs?
If the process of continual adjustment has been going on for years now (and shows no signs of ever stopping), how can the error margins be anything but a large figure approaching 0.7°C or so? Every week, they are fixing last week’s errors.
Nick’s own analysis of the adjustments made to the US temperature record shows that they have added over 0.5°C (0.9°F) to the trend (which is about the size of the total US warming trend – i.e., it is all adjustments).
http://s30.postimg.org/8e6xa6moh/Nick_Stokes_US_Adjustments.png
Bill Illis,
“Nick Stokes, what is the total increase in land temperatures caused by the adjustments?”
Zeke analyzed this some years ago in posts at the Blackboard. Here is the most comprehensive. He and I, and some others, did indices using unadjusted GHCN data, and he compared them with the majors. Trends showed no systematic difference, and neither do the graphs. Here is land/ocean. Here is SST, though there is no adjustment difference there. And here is land only.
Bill Illis,
“Reported increase (1900 to 1914) minus Raw temperature increase (1900 to 1914).”
I presume you mean 2014? You can get quantitative answers here. It’s an active graph where you can change and stretch the scales and choose datasets. You can also put it into trendback mode, where it plots the trend from the x-axis date to the present (2014). If you select the indices based on unadjusted GHCN (TempLS mesh and grid) and, say, HADCRUT and NOAAlo (adjusted), you’ll see they are all in the range 0.7 to 0.8 °C/century. Adjustment does tend to increase the trend, but by less than 0.1 °C/century. In fact, HADCRUT and TempLS track almost exactly back to 1910. GISSlo has a slightly higher trend. Here is a screenshot:
http://www.moyhu.org.s3.amazonaws.com/pics/adjustdiff.png
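For readers who want to reproduce the “trendback” idea offline, here is a minimal sketch under stated assumptions: it computes an ordinary least-squares trend from each start year through 2014 for whatever annual series you feed it. The demo series below is synthetic; this is not Nick’s code and not the Moyhu implementation.

```python
import numpy as np

def trendback(years, anomalies, end_year=2014):
    """OLS trend (degrees C per century) from each start year through end_year."""
    years = np.asarray(years, float)
    anomalies = np.asarray(anomalies, float)
    out = {}
    for start in years[years <= end_year - 10]:          # require a minimum span
        mask = (years >= start) & (years <= end_year)
        slope_per_year = np.polyfit(years[mask], anomalies[mask], 1)[0]
        out[int(start)] = slope_per_year * 100.0
    return out

# Synthetic demo series; in practice feed in HADCRUT4 annual values or an
# unadjusted-GHCN index and compare the resulting trendback curves.
yrs = np.arange(1900, 2015)
demo = 0.007 * (yrs - 1900) + np.random.default_rng(1).normal(0, 0.1, yrs.size)

print("trend 1910-2014: %.2f C/century" % trendback(yrs, demo)[1910])
```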
Mosher,
Can you simply acknowledge anything?
The graph you show would be interpreted as 1998, 2005, 2010 and 2014 being statistically in a dead heat. There is nothing else that can be said from a scientific viewpoint. For anyone who wants 2014 to be “the Hottest ever,” it’s all just Feynman’s Cargo Cult Science.
Joel O’Bryan
PhD, UMass Med School, BioMedical Science
BS, USAF Academy, Civil Engineering
Footnote: I have long realized that scientific truth in such areas matters not to people who align with a “Progressive” ideology. That includes professors with whom I have worked in the past.
Dead heat is a good description. According to Trenberth it got drowned in the deep, blue sea ;-}
Reply to Joel O’Bryan ==> Thanks for your professional biomedical viewpoint. You draw this conclusion from the top set of blue lines, which form the Uncertainty Range for the 2014 value? But you wouldn’t include 2006, 2007 or 2004? If not, why not?
Statistical confidence intervals deal with sampling error and assume ceteris paribus for all other potential sources of error. Obviously, all other things are not equal, and many other sources of error may be at work, which makes relying on the very small changes being discussed for any decision-making purpose ridiculous.
Reply to Jim G ==> The Met Office Uncertainty Range is not strictly and only a statistical confidence interval. See their FAQ and the two papers quantifying this value linked in the FAQ answer.
Thanks, read it. Not really impressed, as it is another model to estimate uncertainty, and it goes so far as to caution the reader to check against other sources. From what I was able to discern, it does not cover the myriad other sources of error that could be involved in producing these minuscule temperature anomalies.
Jim G commented on
So, we’re now augmenting the models of surface temperature with the output of GCMs that are validated against the output of the surface-temperature model.
That sounds like a splendid idea!!!
I noticed up above in warrenlb’s post http://wattsupwiththat.com/2015/02/01/uncertainty-ranges-error-bars-and-cis/?replytocom=1851270#respond
that we have the error down under 0.01°C – see, it’s working already!
You are all just jealous you didn’t come up with it!
JimG,
The ceteris paribus violation that I find fascinating in global temperature sets is in the Time of Observation (TOB) corrections. Example: a temperature max read at 0900 hrs probably reflects the previous day’s peak. That is OK at the Equator, where the term “previous day” has climatic meaning. It is not the case at the Poles, where nights and days are half a year long. The concept has varying accuracy as you move from Equator to Pole.
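To illustrate the mechanism with a toy example (synthetic hourly temperatures and invented numbers, not any station’s actual observing practice): with a max/min thermometer read at 09:00, the maximum written down on a given morning can be dominated by the previous afternoon’s peak.

```python
import numpy as np

rng = np.random.default_rng(42)
hours = np.arange(24 * 6)                                  # six days, hourly
diurnal = 10 * np.sin(2 * np.pi * (hours - 9) / 24)        # peak mid-afternoon
temps = 15 + diurnal + rng.normal(0, 0.5, hours.size)
temps[24 * 2:24 * 3] += 8                                  # day 3 is unusually hot

day = 4                                                    # calendar day 4 (not hot)
civil = slice((day - 1) * 24, day * 24)                    # midnight-to-midnight window
obs_0900 = slice((day - 2) * 24 + 9, (day - 1) * 24 + 9)   # 09:00 day 3 -> 09:00 day 4

print("true max over calendar day 4:            %.1f C" % temps[civil].max())
print("max on the thermometer at 09:00, day 4:  %.1f C" % temps[obs_0900].max())

# The 09:00 reading is dominated by day 3's hot afternoon, so the value
# recorded that morning reflects the previous day's peak -- the kind of
# mismatch that time-of-observation adjustments try to account for, and
# whose size depends on the local diurnal (or, at the Poles, seasonal) cycle.
```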
The entire concept of tiny temperature anomalies having any usefulness, given the data observations available, is ludicrous on its face. The only thing even more ridiculous is making multi-billion-dollar tax expenditures, and imposing huge negative impacts upon national economies, based upon these same observations – and, of course, upon the adjustments made to them.
Jim G commented on
It’s criminal.
If this were just a bunch of naive scientists fooling themselves with simulators, I would just laugh and poke fun at them, but when they decided to use the simulators to save the planet by crashing modern society into a brick wall, fulfilling the greens’ fantasy of stopping human development, I take that personally.
These statistics leave a lot to be desired. First, the measurements come from a number of different sensors (i.e., land and ocean sensors) and cover not one but thousands of different locations, each with its own response to short- and long-term oceanic/atmospheric weather patterns. So, for me, I can’t really say much about what the graph says. The data pool is so fraught with inconsistencies and lack of variable control as to be useless.
Pam, a friend of mine many years ago worked on a cattle ranch in Alberta, and he described to me a Chinook in February during which he could ride from 15°F to 40°F in a few dozen steps and weave in and out of the warm and cold air. There, the accuracy of any measurement at a single spot would be about ±12°F.
I’ve seen more than that. In the Arizona desert, and many other places, temp inversions of > 20° F are common. Pilots that fly frost control missions over sensitive crops (which I did for years) deal with this all the time. The ground observer reports a temp of -1 C so you fly over him at about 50 feet altitude and a few seconds later his temp is +4 C. I have seen the temp from the runway to the top of the control tower at various airports vary by > 10° C many times at night. Point being, what is the “real” temp there?
I should point out that the surface temp increase in frost control has nothing to do with engine heat. That’s trivial. It’s the downwash from the aircraft breaking up the inversion. The wingtip vortex tends to settle and spread out, dragging warmer air from above the inversion down to the ground and spreading it out.
The larger point, as it applies to this discussion, is that if little ol’ me in a 4,000-pound airplane can raise the temperature of a 640-acre farm field by several degrees C, what effect has the massive increase in air traffic – including heavy jet traffic late at night and pre-dawn at airports with official stations – had on the temperature records worldwide? I strongly suspect that a LOT of airport-based “Official Low Temps” were higher than they otherwise would have been had the air traffic not stirred up the inversion layers pre-dawn.
So we can all blame UPS and FedEx for it… [grin]
Reply to Gary Pearse and Bill Murphy ==> Thank you both for your real-world experience with very local temperature variance. I have experienced the same in the ocean while snorkeling and scuba diving.
Love the frost control flights story!
Bill Murphy, that makes me wonder what happens to measured air temperatures downwind of wind turbines.
Glad you asked – you have scientific curiosity, which seems to be lacking among those thousands of “climate scientists” of the 97 percent. Before I get down to specifics, let me remind you of the advice that Ernest Rutherford, father of the nuclear atom, gave to his colleagues: “If you find that your work requires statistics, you should have been doing something else.” Next, let’s look at your CRU & Hadley Centre graph. It is worthless. HadCRUT, NCDC, and GISS have collaborated to falsify the temperature record since 1979. I discovered that this was done to the eighties and nineties temperatures when I was writing my book. I even put a warning about it into the preface, but nothing happened and nobody had any comments. After the book went to press I discovered that all three had used common computer processing and, unbeknownst to them, the computer left identical footprints on all three data-sets. These comprise sharp upward spikes at the beginnings of years. Comparing their temperatures to satellite temperatures reveals that they gave the eighties and nineties an upward slope amounting to 0.1 degrees Celsius in 18 years. Satellite data do not show this warming. ENSO was also active at the time and created five El Nino peaks, with La Nina valleys in between. To determine the mean global temperature in such a situation you have to draw a straight line from an El Nino peak to the bottom of a neighboring La Nina valley and mark its center with a dot. These dots define the global mean temperature for the corresponding calendar date. I did this for all the El Ninos in that wave train and found that the dots lined up in a horizontal straight line. This proves the eighties and the nineties were another no-warming zone, equivalent to the current hiatus and equally as long. This means that the total no-warming time since the beginning of observations in 1979 is 36 years, three-quarters of the time that the IPCC has officially existed. There really was a quick step warming that started in 1999, raised global temperature by a third of a degree Celsius in three years, and then stopped. This is the only warming we have had since 1979. In the presence of the fake warming it is hard to find, but you can easily find it in satellite records. It is responsible for all twenty-first-century temperatures being higher than those of the twentieth century, with the exception of the super El Nino. But the fake-warming triple alliance was not content to stop with the eighties and nineties and continued their manufactured warming into the twenty-first century. This is obvious from the graph that you show. In satellite records the twenty-first century is flat (with the exception of the La Nina of 2008 and the El Nino of 2010, which cancel one another). The ground-based three, on the other hand, exhibit the same temperature rise they showed in the eighties and nineties. This raises the right-hand end of the graph enough to make 2014 the warmest year according to their calculation. Or does it? In their graph the El Nino of 2010 is higher than the super El Nino of 1998, which is impossible. The two El Nino peaks are poorly resolved, and it is obvious that only thanks to the continued use of fake warming is this reversal of temperature values possible. The warmest-year winner is thus the super El Nino of 1998, not that lowly 2014 as advertised.
Dr. Richard Feynman said (wrote) in his famous “Cargo Cult Science” 1974 CalTech commencement address the following:
I think that Dr. Feynman’s statement from 1974 is quite prescient for what we are seeing today in climate science, i.e., the new Cargo Cult Science. Mainstream NASA GISS, NOAA, US DoE, UK CRU, and Australian BoM scientists bask in the temporary limelight of the “fame and excitement” of their latest alarmist rhetoric – rhetoric that pleases their political paymasters and green coterie, while their integrity slips away month by month, year by year, down the drain.
Today’s climate scientists are lost in a wilderness of their own deceit, many in search of grants – rent-seekers chasing a temporary fame and excitement. They double down now on their dishonesty, hoping that nature will somehow prove them right, but each passing year makes their deceptions even more visible. They collaborate with data keepers to adulterate the data records to further the deception and preserve their reputations. History will not be kind to the climate-science charlatans who infest NOAA, GISS, etc.
Note: for those who are unfamiliar with Dr. Feynman’s Cargo Cult Science analogy, I offer this explanation from the book “The Pleasure of Finding Things Out: The Best Short Works of Richard P. Feynman,” by (of course) Richard Feynman, Helix Books, Perseus Publishers, Cambridge, Mass., 1999:
The climate is not adhering to their precepts and forms of investigation, precisely because they have failed to follow the scientific method and to allow for alternative hypotheses – indeed, for the validity of the null hypothesis: that the variability in temperature we are seeing is mostly natural in causation. Yet GISS, NCDC, the DoE/LLNL modelers, the UK’s CRU, and all the other believers they have convinced to follow them wait for the airplanes to land.
That’s one large paragraph…
Well, since you asked for opinions from all fields I may as well have a go at this.
The stated “accuracy” of the temperature record is in reality a precision and is in fact useless as a basis for comparing years against each other, especially over many decades.
Let’s say we have two yardsticks (meter sticks for folks outside the USA): one is made of wood and one is made of Invar (an iron/nickel alloy that is dimensionally very stable when the temperature changes). Each has 1,000 divisions scribed along its edge, which yields measurements as small as 1 millimeter. This is the precision of the yardstick. You can use either one (wood or metal) to measure a length and report it to within ±1/2 mm (assuming your eyesight is good enough to see which scribe line is closest to what you are measuring).
Now, the problem is that the accuracy of these two measuring instruments is vastly different: the wooden one will shrink and swell as the humidity changes, perhaps by 5 percent (50 millimeters), while the Invar one will be very stable under temperature and humidity changes. Note that both instruments are fit for certain purposes; a wooden yardstick is probably fine if you are cutting cloth to sew into a dress with tolerances of plus or minus 10 millimeters (a little less than half an inch), but you sure would not want to use one for constructing an aircraft.
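A minimal sketch of that comparison with made-up numbers: both rules read with the same scatter (precision), but a hypothetical humidity-driven bias in the wooden rule pulls its mean away from the true length (accuracy), and no amount of averaging fixes it.

```python
import numpy as np

rng = np.random.default_rng(7)
true_length = 500.0          # mm, the length actually being measured

read_noise = 0.5             # mm of eyeball scatter, same for both rules
wood_bias = -7.0             # mm of hypothetical humidity-driven bias
invar_bias = 0.0             # the stable rule, assumed unbiased here

wood = true_length + wood_bias + rng.normal(0, read_noise, 20)
invar = true_length + invar_bias + rng.normal(0, read_noise, 20)

for name, readings in [("wood", wood), ("invar", invar)]:
    print(f"{name:5s}: mean {readings.mean():7.2f} mm, "
          f"spread {readings.std(ddof=1):.2f} mm")

# Both show the same small spread (precision); only the Invar mean sits
# near 500 mm (accuracy). More wooden readings will not remove the bias.
```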
One way to reconcile this would be to calibrate each instrument against a standard instrument before each use. Of course that is costly and requires that the place where the instrument is used has the same temperature and humidity as the place where it is calibrated, not convenient at all.
The other way to do this is to have very precise and stable instruments that can be calibrated for accuracy on a periodic basis. In industry, measuring instruments that are “mission critical” (i.e., if they are wrong there could be loss of life or limb, etc.) are generally under a strict recalibration procedure. In this type of system every instrument goes through a periodic recalibration. For example, voltage meters are sent to a calibration facility every year and recalibrated. In the old days there were usually some adjustable resistors (potentiometers) inside the meter; the meter would be calibrated against a single standard for that company and the resistors adjusted, and then a seal would be put on the instrument to show whether anybody had changed its calibration. Modern voltage meters usually contain a microprocessor and some memory where calibration factors are stored. By changing the values of these calibration factors, the reported measured value can be made to match the standard.
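For what it’s worth, here is a minimal sketch of the kind of stored calibration factors being described – a simple two-point (offset and gain) correction. The reference voltages and raw readings are invented for illustration; real calibration procedures are defined by the instrument maker and the standards lab.

```python
# Two-point (gain + offset) calibration of the kind stored in an
# instrument's memory. All numbers below are invented for illustration.

ref_low, ref_high = 0.000, 10.000    # volts applied by the standard
raw_low, raw_high = 0.035, 10.112    # what the uncalibrated meter reported

gain = (ref_high - ref_low) / (raw_high - raw_low)
offset = ref_low - gain * raw_low

def calibrated(raw_reading):
    """Apply the stored calibration factors to a raw reading."""
    return gain * raw_reading + offset

print(round(calibrated(5.07), 3))    # a raw mid-scale reading, corrected

# The factors are only trustworthy until the next scheduled recalibration,
# which is exactly the traceability point being made in this thread.
```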
Manufacturing processes for critical items almost always start with an instruction to list the “cal status” of all instruments used in the tests. No professional would sign off on test data for mission-critical items with test instruments that are “out of cal.” This is called traceability, and you can be sure that if an airplane crashes they will go back through all the manufacturing and maintenance records to make sure somebody did not adjust the auto-pilot (for example) using a voltage meter that was “out of cal” (not just a hypothetical example – it has happened).
The whole problem with the temperature data sets is that this critical calibration step has always been missing. These things were installed with very little in the way of calibration and traceability involved. They were mostly used at airports, in the beginning, to make predictions about how the weather might change during an upcoming flight. The temperature sensor network was never designed to be accurate enough to determine the “hottest year.”
Until and unless the climate-science community can provide full traceability – i.e., how often were all these sensors calibrated, including in-situ errors (AC units installed after the sensor was installed) – we can safely assume that the “accuracy” of these measurements is probably ten or one hundred times worse than the stated precision. Therefore the temperature data are probably good to only ±1 degree (F) at best; over the length of the record (150 years or so) they are probably good to only 2 or 3 degrees, and that is being generous.
Arguing about one year being 0.01 degrees “warmer” than all the rest of the temperature record is silly.
Cheers, KevinK.
+1
And when you look at Anthony’s revelatory work on the stations in AC exhausts, asphalt parking lots, etc that are used by climatologists in the US, knowing that these are the best in the world, I would say your 2 or 3 degrees is about right.
Reply to Kevin K ==> Very good input. You may be interested to know that temperatures in the 1960s were officially reported (in the USA) in single-degree Fahrenheit increments: 60, 64, 72, etc. There is no finer resolution, not even half degrees. Thermometers were likewise “standardized” but never calibrated, as I understand it.
I am working, as mentioned before, on investigating this aspect of Original Measurement Error. A lot of this is covered in the HADCRUT4 papers.
Another point about instrument calibration issues: in critical applications, it is not adequate simply to use an instrument with an unexpired calibration sticker. The instrument in question will be checked for accuracy when it is turned in for its next calibration check. If it is found out of spec, a notice will be issued to the users and any critical measurements will be re-checked with another calibrated instrument. The out-of-spec instrument will then be re-adjusted to be in spec.
Gary, yeah, been there, done that. I helped calibrate an expensive satellite (many tens of millions of dollars) using supposedly calibrated test instruments.
Then, AFTER it was launched an audit found one test instrument that was “out of cal” (by a month). Boy that got lots of attention from the customer; “You sold us a multimillion dollar instrument that was not fully calibrated……” and “We want our money back….”
Yeah, a real feces hitting the rotating approximately planar surfaces moment…. I had to go back through years of test data and demonstrate that the “calibration error” was still small enough that the total system still met the specifications.
You can be sure I triple check all those calibration stickers now…
Calibration, it’s a good thing.
Cheers, KevinK.
Well said Kevin.
You should also pay attention to the advice of Ernest Rutherford. Mathematical masturbation, graphical or otherwise, is much too prevalent on both sides of the climate debate. Looking at the actual temperature observations, even those that have been manipulated, irrespective of source, instead of at the anomalies, shows the irrelevance of much of the statistical discussion.
My comment was meant for Mr. Arrak.
And so just what was it that Ernest Rutherford gave as advice we should pay attention to ??
One quote often attributed to Ernest Rutherford is:
“If your experiment needs statistics, you ought to have done a better experiment.”
That is quite apt, in the context of climate ‘science’
My interpretation of Rutherford’s advice is: don’t waste your time on sophisticated error analysis, but invest it in thinking about better instrumentation and better methods of data reduction. Electronic thermometers are better than mercury thermometers. ARGO buoys are better than measurements from ships, although they can’t entirely replace them. Temperature measurements by satellite are ideal, but they don’t measure the surface temperature. Weather models, which are used for forecasting, can also be used to bring both techniques together and to fill in missing areas. The future is the main issue, not the past.