Uncertainty Ranges, Error Bars, and CIs

Guest Post by Kip Hansen


This post does not attempt to answer questions – instead, it asks them. I hope to draw on the expertise and training of the readers here, many of whom are climate scientists (both professional and amateur), statisticians, researchers in various scientific and medical fields, engineers, and members of many other highly trained and educated professions.

The NY Times, and thousands of other news outlets, covered both the loud proclamations that 2014 was “the warmest year ever” and the denouncements of those proclamations. Some, like the NY Times Opinion blog, Dot Earth, unashamedly covered both.

Dr. David Whitehouse, via The GWPF, counters in his post at WUWT – UK Met Office says 2014 was NOT the hottest year ever due to ‘uncertainty ranges’ of the data — with the information from the UK Met Office:

“The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average. Nominally this ranks 2014 as the joint warmest year in the record, tied with 2010, but the uncertainty ranges mean it’s not possible to definitively say which of several recent years was the warmest.” And at the bottom of the page: “*0.1° C is the 95% uncertainty range.”

The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-0.1°C uncertainty range:

[Image: HADCRUT4 annual averages with bars representing the +/-0.1°C uncertainty range]

The journal Nature has long had a policy of insisting that papers containing figures with error bars describe what the error bars represent, so I thought it would be good in this case to see exactly what the Met Office means by “uncertainty range”.

In its FAQ, the Met Office says:

“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe. However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius. The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer. However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.” (emphasis mine)

I applaud the Met Office for its openness and frankness in this simple statement.
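
The comparison the Met Office describes can be sketched numerically. The snippet below is a minimal illustration, not the Met Office's method: it assumes the errors of the two years are independent, so that two ±0.1°C ranges combine in quadrature, and it uses only the differences quoted in the FAQ (about 0.01°C and about 0.4°C). The function names are hypothetical. If the errors of different years are correlated, the combined range would differ.

```python
import math

def combined_half_width(u_a: float, u_b: float) -> float:
    """Half-width of the range on a difference of two values.

    Assumption (not from the Met Office): the two errors are independent,
    so their ranges add in quadrature.
    """
    return math.sqrt(u_a ** 2 + u_b ** 2)

def is_distinguishable(difference: float, u_a: float = 0.1, u_b: float = 0.1) -> bool:
    """True when the difference exceeds the combined uncertainty range."""
    return abs(difference) > combined_half_width(u_a, u_b)

# Differences quoted in the Met Office FAQ, in degrees Celsius.
for label, diff in (("1998 vs 2010", 0.01), ("1989 vs 2010", 0.4)):
    verdict = "distinguishable" if is_distinguishable(diff) else "not distinguishable"
    print(f"{label}: difference {diff:.2f} C, "
          f"range +/-{combined_half_width(0.1, 0.1):.2f} C -> {verdict}")
```

Under these assumptions a hundredth of a degree sits far inside the combined range, while four tenths of a degree sits well outside it, which matches the FAQ's wording.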

Now, to the question, which derives from this illustration:

[Image: graph of UK Met Office annual values with the 2014 Uncertainty Range bars and ribbon described below]


This graph is created from data directly from the UK Met Office, “untouched by human hands” (no numbers were hand-copied, re-typed, rounded-off, kriged, or otherwise modified). I have greyed-out the CRUTEM4 land-only values, leaving them barely visible for reference. Links to the publicly available datasets are given on the graph. I have added some text and two graphic elements:

a. In light blue, Uncertainty Range bars for the 2014 value, extending back over the whole time period.

b. A ribbon of light peachy yellow, the width of the Uncertainty Range for this metric, overlaid in such a way as to cover the maximum number of values on the graph.

Here is the question:

What does this illustration mean scientifically?

More precisely — If the numbers were in your specialty – engineering, medicine, geology, chemistry, statistics, mathematics, physics – and were results of a series of measurements over time, what would it mean to you that:

a. Eleven of the 18 mean values lie within the Uncertainty Range bars of the most current mean value, 2014?

b. All but three values (1996, 1999, 2000) can be overlaid by a ribbon the width of the Uncertainty Range for the metric being measured?

Let’s have answers and observations from as many different fields of endeavor as possible.

# # # # #

Author’s Comment Policy: I have no vested opinion on this matter – and no particular expertise myself. (Oh, I do have an opinion, but it is not very well informed.) I’d like to hear yours, particularly those with research experience in other fields.

This is not a discussion of “Was 2014 the warmest year?” or any of its derivatives. Simple repetitions of the various Articles of Faith from either of the two opposing Churches of Global Warming (for and against) will not add much to this discussion and are best left for elsewhere.

As Judith Curry would say: This is a technical thread — it is meant to be a discussion about scientific methods of recognizing what uncertainty ranges, error bars, and CIs can and do tell us about the results of research. Please try to restrict your comments to this issue, thank you.

# # # # #


252 thoughts on “Uncertainty Ranges, Error Bars, and CIs”

  1. I believe they are overconfident in asserting a 0.1C accuracy for recent annual global temperature anomalies. My guess would be at least 0.2C to 0.3C and possibly as much as 0.5C or more in recent years and possibly as much as 1.0C or more for the oldest years in the data set. I suspect the largest sources of uncertainty are from very poor spatial coverage, representativeness of measurements, changes in station locations, and “homogenization” that may add uncertainty rather than reducing it. Siting is critical for representative measurements and the USCRN is helping to address this problem, but only for a very small portion of the globe.

    • “The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average…”
      Well I have a number of problems with this statement.
      To begin with; the base period of 1961-1990. This conveniently includes that period in the 1970s when the climate crisis was the impending ice age, with wild suggestions to salt the arctic ice with black soot to fend off the ice age. Global starvation was predicted by the very same bunch of control freaks who are now trying to stop the impending global frying.
      Well, also half of that base period comes before the age of satellite data gathering, which I believe started circa 1979, nearly coincident with the launching of the first oceanic buoys that were able to measure ocean near surface (-1m) water Temperatures and near surface (+3m) oceanic air Temperatures simultaneously, circa 1980. In 2001 this buoy data (for about 20 years) showed water and air Temperatures were not the same and were not correlated. Why would anyone even imagine they would be either of those things?
      So I don’t believe any “global” climate Temperatures prior to 1980. And why stop the comparison date 25 years before the present?
      Why not use the average of ALL of the credible data you have? Otherwise the base period numbers are just rampant cherry picking.
      So I don’t give any credibility to any HADCRUD prior to 1980, or anything they might deduce later on referenced to that early ocean rubbish data.
      And finally I don’t think any of their sampling strategies are legitimate, being quite contrary to sampled data theory that is well established. (you wouldn’t be able to be reading this if it wasn’t).
      G

    • Reply to oz4caster ==> I think that the Uncertainty Range of 0.1°C may be too small even for most recent measurements. But for this discussion, I will let that slip by — it is a near miracle that they admit such an Uncertainty Range at all.
      I am working (longer term) on a piece that explores actual Original Measurement Error in world temps over time, and what that may mean for the Global Averages.
      For instance, I understand that BEST’s kriging results are maximally accurate only to 0.49°C.

  2. Well, you belong to a Church. Fine with me.
    Just don’t ask scientists to play Church games.
    Yes, some scientists do go to church; in fact many scientists were ministers, monks, priests, etc., but they respected the separation of church and science.

  3. A related item:
    IPCC says the climate may have cooled since 1998!
    Here is the rationale:
    The claimed error is +/- 0.1 degree. But the warming is only 0.05 degree per decade, so the actual warming is between -0.05 (cooling) to +0.15 degree/decade.
    In other words since 1998, the climate may have cooled by 0.05 degree/decade or warmed by 0.15 degree, or anything between those two limits.
    Here is how the IPCC stated it:
    “Due to this natural variability, trends based on short records are very sensitive to the beginning and end dates and do not in general reflect long-term climate trends. As one example, the rate of warming over the past 15 years (1998–2012; 0.05 [–0.05 to 0.15] C per decade), which begins with a strong El Niño, is smaller than the rate calculated since 1951 (1951–2012; 0.12 [0.08 to 0.14] C per decade). {1.1.1, Box 1.1} “
    from pg 5 of : https://www.ipcc.ch/…/asse…/ar5/syr/SYR_AR5_SPMcorr1.pdf
    (Of course I don’t believe that +/-0.1 degree, but this is about using their numbers.)

  4. Will you take the analysis of someone who deals in world commodity markets? The price has peaked, get out now.

  5. The first thing I would ask about such a graph is if the computed average is even relevant. Since temperature does not vary linearly with power (w/m2) it is possible to arrive at different spatial temperature distributions that have identical average temperatures, but very different energy balances. For example, two points with temperatures of 280K and 320K would have an average temperature of 300K and an equilibrium radiance of 471.5 w/m2. But two points each at 300K would also have an average temperature of 300K, but an equilibrium radiance of 459.3 w/m2.
    So, with that in mind, the error bars not only render any conclusion about temperature trend being positive or not meaningless, the error range of the equilibrium energy balance is much larger due to the non linear relationship between the two. Since AGW is founded upon the premise that increasing CO2 changes the energy balance of the earth, attempting to quantify the manner in which it does so by averaging a parameter that has no direct relationship to energy balance renders the graph itself meaningless in terms of statistical accuracy and physics as well.
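
The arithmetic in the comment above can be checked with the Stefan-Boltzmann law. This is a minimal sketch that assumes ideal blackbody emission (emissivity of 1), which the comment implies but does not state. The function names are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def radiance(t_kelvin: float) -> float:
    """Blackbody emission in W/m^2 at the given temperature (emissivity assumed 1)."""
    return SIGMA * t_kelvin ** 4

def mean(values):
    return sum(values) / len(values)

def effective_temperature(temps):
    """Uniform temperature that would emit the same mean radiance as `temps`."""
    return (mean([radiance(t) for t in temps]) / SIGMA) ** 0.25

uneven = [280.0, 320.0]   # the two-point example from the comment
uniform = [300.0, 300.0]

for name, temps in (("uneven", uneven), ("uniform", uniform)):
    print(f"{name}: mean T = {mean(temps):.1f} K, "
          f"mean radiance = {mean([radiance(t) for t in temps]):.1f} W/m^2, "
          f"effective T = {effective_temperature(temps):.2f} K")
```

Both cases share a 300 K arithmetic mean, yet the uneven case emits roughly 12 W/m² more, so its radiance-equivalent temperature is above 300 K.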

    • “The first thing I would ask about such a graph is if the computed average is even relevant.”
      In point of fact it is not actually an average of temperatures at all.
      Although most people who follow the climate debates don’t get this (in fact most guys who produce these averages don’t get it).
      What is the global temperature average if it is not really an average?
      Mathematically, it is a prediction. It is a prediction of what you would measure at unvisited locations.
      “Station observations are commonly used to predict climatic variables on raster grids (unvisited locations), where the statistical term “prediction” is used here to refer to “spatial interpolation” or “spatio-temporal interpolation” and should not be confused with “forecasting.” In-depth reviews of interpolation methods used in meteorology and climatology have recently been presented by Price et al. [2000], Jarvis and Stuart [2001], Tveito et al. [2006], and Stahl et al. [2006]. The literature shows that the most common interpolation techniques used in meteorology and climatology are as follows: nearest neighbor methods, splines, regression, and kriging, but also neural networks and machine learning techniques.”
      Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution
      Milan Kilibarda, Tomislav Hengl, Gerard B. M. Heuvelink, Benedikt Gräler, Edzer Pebesma, Melita Perčec Tadić and Branislav Bajat
      So when you read that the global average for Dec 2014 is 15.34 C, that means the following:
      If you randomly sample the globe with a perfect thermometer, an estimate of 15.34 will minimize your error.
      Pick 1000 random places where you don’t have a thermometer. The prediction of 15.34 minimizes the error.
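
The statistical property behind "minimizes the error" can be shown with a short sketch. It assumes "error" means mean squared error, which the comment does not specify, and it uses synthetic numbers rather than any real temperature data. It illustrates only why a mean is the best single-number guess under that measure, not how Berkeley Earth or anyone else computes a global value.

```python
import random
import statistics

def mean_squared_error(guess: float, samples) -> float:
    """Average squared miss if `guess` is used as the prediction for every sample."""
    return sum((s - guess) ** 2 for s in samples) / len(samples)

rng = random.Random(0)
# Synthetic stand-in for readings at 1000 random places; purely illustrative.
samples = [rng.uniform(-30.0, 35.0) for _ in range(1000)]

best = statistics.fmean(samples)
candidates = [best - 5.0, best - 1.0, best, best + 1.0, best + 5.0]
errors = {guess: mean_squared_error(guess, samples) for guess in candidates}

for guess, err in errors.items():
    print(f"guess {guess:7.2f} -> mean squared error {err:8.2f}")
```

The smallest error belongs to the sample mean itself, and every other guess does worse by the square of its distance from the mean. The size of that minimum error is a separate question that the single number does not answer.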

      • Steven,
        Thank you for that description. To be as fair as possible to BEST, it seemed they offered a reasonable evaluation of their analysis of 2014 being one of the 5 warmest with some confidence. Kip Hansen asked that we stay on the discussion of the “uncertainty ranges, etc.”, so to honor that as fully as possible, would you be willing to discuss how the decision comes about to title the work product “The Average Temperature of 2014 from Berkeley Earth” if indeed it’s predictive in nature and not intended to be what the title suggests? In other words, why not title it The Predictive Average… http://static.berkeleyearth.org/memos/Global-Warming-2014-Berkeley-Earth-Newsletter.pdf?/2014
        MET seemed to provide a more reasonable description of their work and confidence levels. NOAA and NASA not so much when taking even the most cursory view of their confidence levels leading to either a clear plan to mislead when compared with their headlines, or to provide AGW propaganda while lacking good scientific commentary.

      • Back before the new math, it used to be the case that the arithmetic mean was also a least squares best estimate, which would minimize randomly distributed errors. You may believe that methods that involve kriging, neural networks and machine learning techniques will produce a result that minimizes errors, but don’t expect me to believe that crap.

      • “In point of fact it is not actually an average of temperatures at all.
        Although most people who follow the climate debates dont get this ( in fact most guys who produce these averages dont get it )”

        I understand and agree to a certain extent with your point, though I find your assertion that even the guys who do the calculations don’t understand what it is they are calculating kind of amusing. That said, your point doesn’t change mine. My point is not about how you calculate an average for a given point in time, but what the change in that average implies. Call it an average, call it a prediction of a randomized measurement, as that measurement changes over time, due to the non linear relationship between temperature and power, the computed change is even less meaningful than the error bars would suggest. The change in the value cannot represent the change in energy balance because simple physics requires that cold temperature regimes (night, winter, high latitude, high altitude) are over represented and high temperature regimes (day, summer, low latitude, low altitude) are under represented.
        The raw value of the prediction as you have illustrated it is one thing, the change in that value another thing. That change isn’t directly related to the metric of interest (change in energy balance), no matter how you define it.

      • So let’s take Antarctica as a place where we have very sparse data. You are claiming that an average temperature of 15.34C minimizes the estimation error. Highly unlikely, particularly in the Antarctic winter. It is absurd to suggest that the temperatures at the South Pole and Death Valley can be considered as random numbers drawn from the same distribution with the same mean and the same variances. This is why people work with changes in temperature and not the actual temperatures themselves.

      • @bones : The fact is that using data infilling through the aforementioned approaches is not really new math. These approaches have been used for a considerable length of time. However, the problem here is that many people apply them without understanding the implications. Many of these approaches CANNOT be shown to minimize error – except under very specific circumstances. For example, ordinary Kriging is only an unbiased estimator if the process is stationary. What is worse is that there really is no justification for treating climate data as a stochastic process at all.
        In other disciplines, when we use these techniques for data analysis, we provide examples where they work and leave it to the user to decide about its appropriateness for their problem. But we make NO claims about the optimality of the approach. Because we know that we cannot do so. However, I have read FAR too many climate science papers that use advanced approaches solely for the purpose of justifying a claim about the underlying PROCESS that generated the data! If few claims can be made about the statistical characteristics of the DATA itself when using these approaches, almost nothing can be said about the PROCESS from the data.

      • @ Steven Mosher & davidmhoffer
        Just trying to get a handle on your points and the difference between them.
        Steven Mosher – what you are saying is similar to a case where you had two stations one in the tropics averaging 30° and one near the poles averaging 0°. Then the best estimate of the global average temperature would be 15° as that would minimise the error between the measurements?
        davidmhoffer – From an energy balance point of view, the best estimate for global average temperature would be the temperature of a sphere at uniform temperature having the same net energy balance?

      • “mathematically, it is a prediction. It is a prediction of what you would measure at unvisted locations.”

        On that basis, sampling theory will return a more realistic value than the current practice of adjusting stations to appear static.
        Simply assume that every station reading is one-off. That the station itself may move or otherwise change between one reading and the next, and any attempt at adjustments to create a continuous station record will simply introduce unknown error.
        There is no need. Since you are predicting the value at unknown points, a sample based on known points will suffice, while eliminating the possibility of introduced errors. All that is required is a sampling algorithm that matches the spatial and temporal distribution of the earth’s surface.

      • “All that is required is a sampling algorithm”
        ===========
        Pollsters would be turning in their graves if we did sampling the way climate science does. In effect climate science takes individual people, and instead of sampling them, tries to build a continuous record of their views over time. Every time their views jump sharply, they assume the person has moved, changed jobs, etc, so they adjust the person’s views. Then they add all these adjusted views together to predict who will win the next election.

      • Reply to Steve Mosher ==> Does BEST have an estimate of what the expected error is that is being minimized? Is it +/- 0.1°C? +/- 1°C? more? less?

      • ferdberple
        February 2, 2015 at 7:08 am

        “On that basis, sampling theory will return a more realistic value than the current practice of adjusting stations to appear static.
        Simply assume that every station reading is one-off. That the station itself may move or otherwise change between one reading and the next, and any attempt at adjustments to create a continuous station record will simply introduce unknown error.
        There is no need. Since you are predicting the value at unknown points, a sample based on known points will suffice, while eliminating the possibility of introduced errors. All that is required is a sampling algorithm that matches the spatial and temporal distribution of the earth’s surface.”

        Fred,
        I went about this in a different way: I use the station’s previous day’s reading as the baseline for creating an anomaly. Then I look at the rate of change at that station in small to large areas. No infilling, no homogenizing other than averaging the rate of change for an area. For annual averages I make sure an included station has data for most of the year.
        You can read about it here
        http://www.science20.com/virtual_worlds
        code and lots of surface data
        https://sourceforge.net/projects/gsod-rpts/files/Reports/

      • Hi Steven,
        Lots of doubts have been expressed about the ‘infilling’ algorithms, or, as you express it, predictions for a location that doesn’t exist.
        I have no idea whether this is even possible but couldn’t you randomly exclude locations and see how close the predictions are to measurements for those locations?
        Would that not also feed into your algorithms to improve the predictive skill?
        This is an honest question with no hidden agenda, I am genuinely curious.
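
The check proposed in the question above (withhold a station, predict it from the others, compare) is commonly called leave-one-out cross-validation. Below is a minimal sketch of the idea on a synthetic field, using simple inverse-distance weighting as a stand-in interpolator. Nothing here reflects how BEST or any other group actually interpolates or validates; the field, the station layout, and all function names are hypothetical.

```python
import math
import random

def idw_predict(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def leave_one_out_errors(stations):
    """Predict each station from all the others; return prediction minus measurement."""
    errors = []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        errors.append(idw_predict((x, y), others) - value)
    return errors

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def synthetic_field(x, y):
    """Made-up smooth 'temperature' surface used only for this illustration."""
    return 15.0 + 10.0 * math.sin(x) + 5.0 * math.cos(y)

rng = random.Random(1)
stations = []
for _ in range(200):
    x, y = rng.uniform(0.0, 6.0), rng.uniform(0.0, 6.0)
    stations.append((x, y, synthetic_field(x, y) + rng.gauss(0.0, 0.1)))

loo_rmse = rmse(leave_one_out_errors(stations))

# Naive baseline: predict every withheld station with the mean of the others.
values = [v for _, _, v in stations]
baseline_errors = [(sum(values) - v) / (len(values) - 1) - v for v in values]
baseline_rmse = rmse(baseline_errors)

print(f"leave-one-out RMSE (IDW): {loo_rmse:.2f}")
print(f"baseline RMSE (mean of others): {baseline_rmse:.2f}")
```

Comparing the withheld-station error against a naive baseline gives a rough measure of interpolation skill. On real data the result would depend heavily on station density and on how smooth the true field is.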

    • I agree with you David. Averaging a quantity that is non linearly related to anything else is just asking for trouble.
      And the error in this case acts to reduce the calculated rate of radiation from earth’s surface from what it really is, thus bolstering the notion of warming. The hottest tropical deserts in summer daytime, post-noon, radiate at about 12 times the rate for the coldest spots on earth, at their coldest. Cold places do very little to cool the earth.
      g

      • Reply to GES ==> While I agree fully with the idea that “Averaging a quantity that is non linearly” produced is asking for trouble, I am trying to get viewpoints on what a series of measurements over time that can all be covered with the Uncertainty Range for the metric means scientifically.
        What do you think on that issue?

    • David, I raised a similar point on the Talkshop recently:
      ““But in the climate rising temperature restores the balance between incoming and outgoing radiation. Warming acts against the feedbacks. It damps them down.”
      Do you mean the T^4 in SB laws? I really wonder whether the climate models take into account the enhanced T^4 effect during the SH summer. I think they use the 1/R^2 sun distance to adjust the 1AU value for TSI, but this higher power input that raises the SH summer temps then has a knock-on effect on the rate of outward radiation. Although the increase in outward radiation is fairly close to being linear for small increments in temp (1 degree more for SH summer than NH summer?) it isn’t exactly linear, especially being a fourth power.
      If this isn’t being accounted for then it’s where some of the missing heat is going- out into space during the SH summer.”
      MSimon replies to this with a few additional thoughts a few comments later.
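
How close to linear the fourth-power law is for small temperature increments can be checked directly. This sketch compares the exact change in blackbody emission with the first-order estimate 4σT³ΔT. The 288 K base temperature is an assumed illustrative value, not a figure from the comment, and the function names are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def emitted_flux(t_kelvin: float) -> float:
    """Blackbody emission in W/m^2 (emissivity assumed 1)."""
    return SIGMA * t_kelvin ** 4

def linearized_change(t_kelvin: float, delta_t: float) -> float:
    """First-order estimate of the flux change: 4 * sigma * T^3 * delta_t."""
    return 4.0 * SIGMA * t_kelvin ** 3 * delta_t

def relative_shortfall(t_kelvin: float, delta_t: float) -> float:
    """Fraction of the exact flux change that the linear estimate misses."""
    exact = emitted_flux(t_kelvin + delta_t) - emitted_flux(t_kelvin)
    return (exact - linearized_change(t_kelvin, delta_t)) / exact

BASE = 288.0  # assumed illustrative surface temperature, kelvin
for delta in (1.0, 5.0, 20.0):
    print(f"delta T = {delta:4.1f} K: linear estimate misses "
          f"{100.0 * relative_shortfall(BASE, delta):.1f}% of the exact change")
```

For a one-degree increment the linear estimate is off by well under one percent; the gap grows with the size of the increment.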

  6. According to the Met Office:
    “It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe…”
    I would further qualify that by suggesting that there can be no error because there is no real entity to be measured. There is no such thing as a real temperature anomaly, no measuring station real temperature anomaly, and certainly no real global average temperature anomaly to be measured. The “temperature anomaly” is a comparison to a modelled past baseline temperature at each location; it is a convenient fiction.
    So we need to establish what exactly is the REAL parameter before we can estimate uncertainties.

    • It seems to me that they expect a meaningful average based on poorly collected and therefore meaningless data. I think they are saying the correct answer can be found by taking the average of the incorrect answers. Sounds more like Voodoo than science.

  7. I read somewhere that the measurement error of surface thermometers used in the NOAA sites was about +/- 0.1 degrees C. I don’t understand how HADCRUT could not have a greater error range. What am I missing?

    • Bob, taking multiple measurements tends to reduce error. Think of a single weather station. At any given moment the instantaneous temperature measurement will be off +/- 0.1 degrees, but those errors can generally be expected to fall within a normal distribution unless the damn thing is situated next to an air conditioner. The more measurements recorded by that single station throughout the day, the lower the expected error when computing the daily mean. Even if the station only reports a min and a max on a daily basis, one can still roll the median values up to a monthly mean with a lower expected error than the instrument itself is capable of.
      When it comes to doing annual means for the entire globe, sample size still helps, but it’s not as clear cut as the single-station case, not least because the number of samples and their spatial distribution aren’t constant throughout the entire record. So that’s the interesting and … difficult … part of this discussion. While an individual data product might claim a given error range in the hundredths of degrees, it’s child’s play to compare two different products and find short term discrepancies in the tenths. For example, the largest annual discrepancy between HADCRUT4 and GISTemp is 0.13 degrees and the standard deviation of the annual residuals from 1880-2014 is 0.05.

      • Good explanation. A simpler one that gets the point across about averaging errors is rolling a dice.
        We know that the average value when you throw a dice is 3.5 – however the “reading” from the dice could be anywhere between 1 and 6 so the “error” on an individual “reading” (throw of the dice) is +/- 2.5. However the more times you throw the dice and average the “readings” the closer the result will be to the true “reading” of 3.5. In other words the more times you sample the more the random errors cancel each other out. The “average of the errors” with an infinite number of throws will be zero. The more throws of the dice the lower “average of the errors” so the more confidence you can have in the result.
        So if you are using the same thermometer with the same inherent random error of +/-0.1C to measure the temperature in 100 different places, the chance of all the measurements being +0.1C (throwing a six one hundred times) is extremely low. The random errors are “throws of the dice” and when averaged over all 100 instruments those errors will cancel out and tend towards zero.
        This only applies to multiple separate thermometers. An individual thermometer may have a systematic error, meaning all its readings are +0.1C. This is another reason to use temperature anomalies, which dispense with the absolute temperature and just measure the change.
        The misleading bit in all the press hype is that it is not a “global average temperature” at all, it is a global average change in temperature. How useful it is to average a change from -25c to -24c in the Arctic and a change from +24c to +25c in the tropics I will leave others to judge!
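
The dice argument above is easy to simulate. The sketch below throws a fair six-sided die, averages the throws, and reports how much those averages scatter around 3.5 as the number of throws grows. The trial counts and function names are arbitrary choices for illustration, and the simulation covers only purely random, independent errors.

```python
import random
import statistics

def mean_of_throws(n_throws: int, rng: random.Random) -> float:
    """Average face value over `n_throws` throws of a fair six-sided die."""
    return statistics.fmean([rng.randint(1, 6) for _ in range(n_throws)])

def spread_of_means(n_throws: int, trials: int, rng: random.Random) -> float:
    """Standard deviation of the averages across repeated experiments."""
    return statistics.pstdev([mean_of_throws(n_throws, rng) for _ in range(trials)])

rng = random.Random(42)
for n in (1, 10, 100, 1000):
    print(f"{n:5d} throws per average -> spread of averages "
          f"{spread_of_means(n, 200, rng):.3f}")
```

The spread shrinks roughly in proportion to one over the square root of the number of throws, which is the behaviour the comment describes for random errors.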

      • Reply to Brandon Gates ==> Averaging (getting a mean) only reduces original measurement error for multiple measurements of the same thing at the same time. 100 thermometers in my yard, polled at exactly noon, with the results averaged will give me a more accurate temperature.
        Creating “means” for multiple measurements of a thing at different times does not reduce Original Measurement Error, it only disguises it. One still has to deal with the Original Measurement error itself in the end. This is another of my personal projects — it will be about as popular as my “Trends do not predict future values” post — in other words, viciously attacked by all comers, despite being true.

      • “The more measurements recorded by that single station throughout the day, the lower the expected error when computing the daily mean. Even if the station only reports a min and a max on a daily basis, one can still roll the median values up to a monthly mean with a lower expected error than the instrument itself is capable of.”
        Nope! This statistical operation does not lower the error of the instrument itself. That error is a physical characteristic of the instrument. Statistical operations performed on data collected from an instrument do not reach back through time and space to correct physical error sources in that instrument!
        Those statistical operations merely improve the PRECISION of the measurement. Precision and accuracy are independent characteristics. Precision refers to how fine the unit divisions are that we are able to record. Accuracy refers to how close those unit divisions are to true. Performing averaging on data values to reduce apparent noise in data collected over time from a thermometer or other measurement instrument can improve its precision. However, that improvement is only true and useful if the character of the noise is well understood and that improvement has been validated to achieve a correct value.
        Regardless of how accurate and precise we can make current temperature observations, our century old observation accuracy remains no better than about 1 degree Celsius. When the starting point on your trend line has an accuracy of plus or minus 1 degree Celsius, claiming an accuracy for the slope of that trend better than 1 degree is bogus.
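
The distinction being argued in this sub-thread, between random scatter and a fixed instrument offset, can be illustrated with a small simulation. It is a sketch under simple assumptions: one constant true value, Gaussian random noise, and a constant bias shared by every reading. The numbers and function names are hypothetical and do not model any real thermometer network.

```python
import random
import statistics

def mean_error(n_readings: int, random_sd: float, bias: float,
               rng: random.Random, true_value: float = 20.0) -> float:
    """Error of the averaged reading relative to the (assumed known) true value.

    Each reading = true value + fixed bias + Gaussian noise with sd `random_sd`.
    """
    readings = [true_value + bias + rng.gauss(0.0, random_sd)
                for _ in range(n_readings)]
    return statistics.fmean(readings) - true_value

rng = random.Random(7)
for n in (1, 100, 10_000):
    unbiased = mean_error(n, random_sd=0.1, bias=0.0, rng=rng)
    biased = mean_error(n, random_sd=0.1, bias=0.5, rng=rng)
    print(f"n = {n:6d}: error of mean with no bias {unbiased:+.4f}, "
          f"with +0.5 bias {biased:+.4f}")
```

In this setup averaging drives the random part of the error toward zero, but the mean of the biased readings stays close to the bias no matter how many readings are taken. Which of the two situations better describes a given temperature record is the point in dispute.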

      • Kip Hansen,

        “Averaging (getting a mean) only reduces original measurement error for multiple measurement of the same thing at the same time. 100 thermometers in my yard, polled at exactly noon, with the results averaged will give me a more accurate temperature.”

        As TLM already pointed out, the ultimate goal of this exercise is to arrive at a gridded mean anomaly product on monthly and annual time frames. Not high noon on June 20th, 2014 in Topeka Kansas. One needn’t homogenize for the answer to the latter, just look it up. Thing is, that tells you butkus about what 30+ year global trends are doing. Climate and weather are apples and oranges in much the same way that precision of a single measurement is a completely different animal from error estimates for tens of thousands of observations.

        “Creating “means” for multiple measurements of a thing at different times does not reduce Original Measurement Error, it only disguises it.”

        Of course not. The only thing which reduces original measurement error is better instrumentation. Since we can’t go back and do it over with the latest in high-precision thermometers, we’re pretty much stuck. None of that changes the reality that more samples lead to better estimates.

        “One still has to deal with the Original Measurement error itself in the end.”

        Ya. Error bars. They get smaller as n gets bigger. That’s why weather station data tell you how many observations were used to calculate the daily summary statistics.
        GaryW,

        “Statistical operations performed on data collected from an instrument do not reach back through time and space to correct physical error sources in that instrument!”

        No kidding. And for cripes sake, the local real temperature often fluctuates from minute to minute more than the precision of the damn instrument. Modern ones anyway. Best we normally get is what, hourly data? Do we need to go to picoseconds to keep you guys happy? It’ll cost you.

        “When the starting point on your trend line has an accuracy of plus or minus 1 degree Celsius, claiming the an accuracy for the slope of that trend better than 1 degree is bogus.”

        See again: HADCRUT4 is not intended to tell you how cold it was last night in Mankato, Minnesota to plus or minus a gnat’s nose hair. Best it will do is give you an estimate for a grid square of the monthly min, max and mean. Since the subject of this thread is the global mean anomaly, you need to be thinking about the number of grids, the number of thermometers in each grid, the number of days in a month, and the number of hours in a day. Think law of large numbers, and again review the concept that climate is the statistics of weather over decades-long spans of time, not how warm it was three point two oh six seconds ago to three decimal places.

      • Gates says:
        …not how warm it was three point two oh six seconds ago to three decimal places.
        But that is exactly the kind of argument we always see from the warmist side. Even if you accept the astounding accuracy claimed, which records global T to within tenths and hundredths of a degree, the planet’s temperature has fluctuated by only 0.7º – 0.8ºC over the past century and a half.
        That is nothing! Skeptics are constantly amazed that such a big deal is made over such a tiny wiggle.

      • Reply to Brandon Gates ==> “Error bars. They get smaller as n gets bigger.” That is true only in statistics and for Confidence Intervals. Original Measurement Error cannot be reduced by division or averaging when the measurement is of different things at different times. If OME is +/- 1°C for the individual measurements (of different things at different times) then in the end, you have your metric mean +/- 1°C. You can’t make it go away through arithmetic.
        The Met Office is surprisingly candid on this point, admitting forthrightly:

        “The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”

        This is what they consider the Original Measurement Error, NOT a statistical Confidence Interval, but a statement of accuracy of measurement. They state it as a Maximum Measurement Accuracy which is the obverse side of the Original Measurement Error coin. Even the entire global data set of two entirely different metrics (Global Air Temperature at 2 meters) and Global Sea Surface Temperature can not erase the Original Measurement Error.

        • Kip Hansen commented

          Reply to Brandon Gates ==> “Error bars. They get smaller as n gets bigger.” That is true only in statistics and for Confidence Intervals. Original Measurement Error cannot be reduced by division or averaging when the measurement is of different things at different times. If OME is +/- 1°C for the individual measurements (of different things at different times) then in the end, you have your metric mean +/- 1°C. You can’t make it go away through arithmetic.
          The Met Office is surprisingly candid on this point, admitting forthrightly:
          “The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”
          This is what they consider the Original Measurement Error, NOT a statistical Confidence Interval, but a statement of accuracy of measurement. They state it as a Maximum Measurement Accuracy which is the obverse side of the Original Measurement Error coin. Even the entire global data set of two entirely different metrics (Global Air Temperature at 2 meters) and Global Sea Surface Temperature can not erase the Original Measurement Error.

          This is a question I’ve had for a long time. NCDC’s GSoD data set is said to be to 1 dp, so 70.1F +/- 0.1F for instance was yesterday’s temp. On the same station today, it’s 71.1F (+/- 0.1F). But, if I compare them it’s 70.1 (70.0-70.2) – 71.1 (71.0-71.2) so the difference is 1 +/- 0.2 because they add, right?
          But what happens if I do the same for a 3rd day at that same station, 72.1F (+/-0.1), when calculating the difference AB and then BC, B can’t both simultaneously be both +0.1 and -0.1, ie AC has to maintain X +/- 0.2 no matter how many subsequent measurements we string, as long as they are continuous, correct?
          I believe when you average anomalies that are base lined against another average of the same measurements with +/-0.1 precision (do I have this correct, precision as opposed to accuracy)?
          So, when I average a large number of the differences as I described above together, doesn’t my precision increase? But to what (honestly I’m not sure)?
          But I also don’t think my accuracy has increased beyond +/-0.1 at best, if it’s really not +/-0.2 or worse.

      • I commented:

        I believe when you average anomalies that are base lined against another average of the same measurements with +/-0.1 precision (do I have this correct, precision as opposed to accuracy)?

        I didn’t finish this, got side tracked.
        When you calculate the anomaly on a baseline of averaged measurements, don’t the errors add?
        So today’s 71.1 +/-0.1 is compared to the 30 year average of measurements collected to 1 dp, so wouldn’t each anomaly be X.x +/- 0.2?
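The questions above (do the errors add in a difference, and what happens when differences are chained?) can be sketched numerically. This is only an illustration under stated assumptions: the +/- 0.1 is treated first as a hard worst-case bound (simple interval arithmetic) and then as the standard deviation of an independent random error (combined in quadrature). The function names are made up for this sketch and do not come from any dataset documentation.

```python
import math

def worst_case_difference(a, b, half_width):
    """Interval arithmetic: (b - a) when each reading may be off by up to +/- half_width."""
    return b - a, 2 * half_width

def quadrature_difference(a, b, sigma):
    """Difference of two readings whose errors are independent and random,
    each with standard deviation sigma (an assumption, not a given)."""
    return b - a, math.sqrt(sigma**2 + sigma**2)

def chained_difference(readings, half_width):
    """Sum of consecutive differences. The intermediate readings cancel
    (telescoping sum), so only the first and last readings contribute error."""
    total = sum(later - earlier for earlier, later in zip(readings, readings[1:]))
    return total, 2 * half_width

day_a, day_b, day_c = 70.1, 71.1, 72.1

print(worst_case_difference(day_a, day_b, 0.1))
print(quadrature_difference(day_a, day_b, 0.1))
print(chained_difference([day_a, day_b, day_c], 0.1))
```

Under the worst-case reading, the A-to-C difference keeps a bound of +/- 0.2 however many days are strung together, because the middle readings cancel. Under the independent-random-error assumption, the uncertainty of a single difference comes out nearer 0.14 than 0.2.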

      • dbstealey,

        But that is exactly the kind of argument we always see from the warmist side.

        That’s exactly why this warmie cringes when “the hottest year evah” [1] is uttered by his fellows.

        Even if you accept the astounding accuracy claimed …

        I think technically we’re talking about precision, as has already been pointed out. Especially since we’re dealing with anomaly calculations which moots what any individual absolute readings are.

        … which records global T to within tenths and hundredths of a degree, the planet’s temperature has fluctuated by only 0.7º – 0.8ºC over the past century and a half.

        Considering that 3-4 degrees lower and we’d be in the neighborhood of an ice age again, 0.8 C is up to a quarter the way there … in the opposite direction. The Eemian interglacial was 2 degrees higher than the Holocene. Sea levels were some 6-8 meters higher. 0.8 C is a puny 40% of 2 degrees. Not even worthy of being called a pimple on a midget’s bottom.

        Skeptics are constantly amazed that such a big deal is made over such a tiny wiggle.

        Alarmunists [2] note with amusement that contrarians are fascinated with the puny 0.25 degree discrepancy between CMIP5 and observations. What is The Pause if but a tiny wiggle in the grand scheme of things?
        We could have endless amounts of mirth discussing who’s trying to have it both ways here, yes?
        ————————
        [1] 38% chance according to one press release. I’ve already forgotten which. GISS I think.
        [2] We weren’t supposed to talk religion on this thread according to its author. Yeah right, like that was ever going to happen.

      • Reply to Mi Cro ==> On additive errors in subsequent measurements. 70.1 (70.0-70.2) – 71.1 (71.0-71.2)
        Today: 71.1 (+/- 0.1), range 71.0 to 71.2
        Yesterday: 70.1 (+/- 0.1), range 70.0 to 70.2
        Average of UPs: 70.7 | Average: 70.6 | Average of DOWNs: 70.5
        Result: 70.6 (+/- 0.1)
        Average the actual measurements, average the ups (maximums, if both errors are up), average the downs (minimums, if both are down). The answer is the mean with the original measurement error still in place.
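The table above can be reproduced in a few lines. This sketch takes the comment's premise at face value: every reading could be off by the full +/- 0.1 in the same direction (a worst-case bound), which is different from treating the errors as independent and random. The function name `mean_with_bounds` is invented for illustration.

```python
def mean_with_bounds(readings, half_width):
    """Return (average of DOWNs, average, average of UPs) for a list of readings,
    assuming every reading may be off by the same signed amount."""
    n = len(readings)
    mean = sum(readings) / n
    upper = sum(r + half_width for r in readings) / n
    lower = sum(r - half_width for r in readings) / n
    return lower, mean, upper

lower, mean, upper = mean_with_bounds([71.1, 70.1], 0.1)
print(round(lower, 1), round(mean, 1), round(upper, 1))
```

Under that worst-case premise the half-width of the band stays at 0.1 regardless of how many readings are averaged. Whether the premise fits thermometer data is the point the thread disputes.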

      • Kip Hansen,

        “Error bars. They get smaller as n gets bigger.” That is true only in statistics and for Confidence Intervals.

        Yup. Statistics is how we quantify estimates of measurement, and other, error. One way to improve estimates based on measurements is to gather a bunch of them and take a mean. This is basic, standard, old as the hills statistical practice. The specifics of this particular application are:
        http://www.metoffice.gov.uk/hadobs/crutem3/HadCRUT3_accepted.pdf
        Measurement error (ϵob) The random error in a single thermometer reading is about 0.2 °C (1σ) [Folland et al., 2001]; the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most 0.2/√60 = 0.03 °C and this will be uncorrelated with the value for any other station or the value for any other month. There will be a difference between the true mean monthly temperature (i.e. from 1 minute averages) and the average calculated by each station from measurements made less often; but this difference will also be present in the station normal and will cancel in the anomaly. So this doesn’t contribute to the measurement error. If a station changes the way mean monthly temperature is calculated it will produce an inhomogeneity in the station temperature series, and uncertainties due to such changes will form part of the homogenisation adjustment error.
        0.2 °C single measurement error (best case!) improves to 0.03 °C IN AGGREGATE … nearly a whole order of magnitude. The HADCRUT4 paper you linked to contains similar language, and heavily references Brohan (2006).

        Even the entire global data set of two entirely different metrics (Global Air Temperature at 2 meters) and Global Sea Surface Temperature can not erase the Original Measurement Error.

        The estimated error of a mean is not original measurement error, which nothing will ever change. The former can be estimated. By the law of large numbers and the central limit theorem, a sufficiently large sample will easily have a smaller estimated error of the mean than one single measurement. The arithmetic for that is laid out in the text I quoted above. I don’t know how much more clear I can be on the distinctness between these two things. Apples cannot be conflated with oranges.
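The 0.2/√60 arithmetic in the quoted Brohan passage can be checked with a short simulation. This is a sketch, not the Met Office's code, and it assumes what the quoted passage assumes: each reading carries an independent, zero-mean random error with a standard deviation of 0.2 °C. The true value of 15.0 and the function names are arbitrary choices for the illustration.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean of n independent readings with error sigma."""
    return sigma / math.sqrt(n)

def simulated_spread_of_mean(true_value, sigma, n, trials=5000, seed=0):
    """Repeat the 'month' many times and measure how much the monthly mean scatters."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        readings = [rng.gauss(true_value, sigma) for _ in range(n)]
        means.append(sum(readings) / n)
    return statistics.pstdev(means)

print("formula:   ", round(standard_error(0.2, 60), 3))
print("simulation:", round(simulated_spread_of_mean(15.0, 0.2, 60), 3))
```

The two numbers agree closely only because the simulated errors are random and independent. A shared offset in every reading would pass straight through to the mean untouched.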

      • TLM,

        A simpler one that gets the point across about averaging errors is rolling a dice.

        Which is a very elegant illustration of the concept which I greatly appreciated reading. Unfortunately I note from the very first reply that it’s anything but a sure-fire way to get the point across.

      • Reply to Brandon Gates ==> We will have to leave this issue for another time as it is not resolving.
        We are talking past one another in some way.
        I suggest that we might refer to someone like Wm Briggs for the statistical explanation of why OME must be included in the statement of results, even mathematical and statistical means. If OME is +/- 0.1°C, then whatever you derive from any number of these data (ten or ten million), you must state MyMean(+/-0.1°C).
        I do understand that you believe this is not the case — well, you and some others — but you are talking, I think, about something like the precision of a derived mean. Derive all you like, at whatever precision you like, but at the end, you must add the +/- 0.1°C of the OME/Original Measurement error/Uncertainty Range for the metric and method to have a scientifically true statement.

    • The Dept of Commerce directive (National Weather Service Instruction 10-1302) for air temperature measurement specifies a standard for minimum and maximum temperatures between -20 and 115 degrees Fahrenheit of +/- 1 degree Fahrenheit (2 degrees outside that range). So a single daily minmax temperature has a 90% confidence accuracy spec of +/- 2 degrees F at best.
      The actual uncertainty (as opposed to the single reading spec) would be determined statistically.
      The precision, as opposed to accuracy, is specific to the construction of the thermometer. A precision of +/- 0.1 degree (F or C) would be unremarkable.

  8. If this were a process control chart the yellow band would be the control limits for the process.
    It isn’t a process control chart.
    I’d say the yellow band is misleading unless it is clearly identified.

    • Reply to Greg Locock ==> You are right of course, but there is a limited amount of data that can be typed onto the chart. Fully described in the text as:

      This graph is created from data directly from the UK Met Office, “untouched by human hands” (no numbers were hand-copied, re-typed, rounded-off, kriged, or otherwise modified). I have greyed-out the CRUTEM4 land-only values, leaving them barely visible for reference. Links to the publicly available datasets are given on the graph. I have added some text and two graphic elements:
      a. In light blue, Uncertainty Range bars for the 2014 value, extending back over the whole time period.
      b. A ribbon of light peachy yellow, the width of the Uncertainty Range for this metric, overlaid in such a way as to cover the maximum number of values on the graph.

      That seemed a bit much for the graphic. – kh

  9. Your third chart is interesting. Granting the Met office their claimed +/- 0.1 deg C error bars, the HadCRUT4 averaged temps from 1997 to 2014 are all statistically indistinguishable (they fall within the error bars), save for 1999 and 2000, which fall slightly below. A nice demonstration of the ongoing Pause or Hiatus in warming.
    And I agree with those who suspect that the +/- 0.1 deg C error bars are optimistic.
    Cheers — Pete Tillman
    Professional geologist, amateur climatologist

  10. If I were measuring a parameter in my lab that plotted like that my conclusion would be that nothing had changed between the ordinate and abscissa.

    • Reply to Scott Scarborough ==> Thank you for your input — you say that the ordinates (in this case, average global temperatures) do not significantly change over the time period shown. (Which mirrors my conclusion as well).
      What is your field of research?

  11. From the text:
    “… above the long-term (1961-1990) average.”
    This makes no sense. The 30 year period in the sense of “climate normals” is usually the most recent – ending in zero – set of 30. That is fine with me when used by the local paper or TV, and those folks now use 1981 – 2010.
    Because the average (mean) of the more recent set will be higher than the out-of-date set, a person might get the idea that they are not being honest.
    Further, for this sort of research, why does the long term not include all the data up through the period of interest? That would include 2014.
    And finally, it seems to me assumptions about randomness and distribution type are being violated – but that is above my pay grade.

    • Reply to John F. Hultquist ==> Yes yes yes….we see this type of thing all the time. Different folks use different forks ( in this case, 30-year time periods).
      The Intro graph at the top is from Climate.gov and uses the 20th Century (1901-2000) average!

  12. There are a couple of points to make before anyone attempts to answer that question.
    The first is that the uncertainty range as shown is simply a conventional artifact of the distribution in the errors. In fact what you have is a mean and a probability distribution for the estimate of global average temp based (solely) on measurement errors. It is perhaps more useful to imagine a z axis that shows the probability density when working with these estimates.
    Second the answer is going to be all about the question. If the question is what are the odds that a prior year exceeded 2014 there are a series of probability calculations to be done comparing each year’s probability density with 2014’s. The diagram (and in particular a)) doesn’t help here. If the question is what is the strength of evidence that we are in fact measuring the same quantity (each year’s sample having the same distribution of errors), then b) is suggestive, but little more.
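The "series of probability calculations" mentioned above can be sketched for a single pair of years. The assumptions here are strong and are mine, not the Met Office's: each annual value is treated as Gaussian, the stated +/- 0.1 °C is read as a symmetric 95% interval (so sigma is about 0.1/1.96), and the errors in the two years are taken as independent. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_a_warmer_than_b(mean_a, mean_b, half_width_95_a, half_width_95_b):
    """P(A > B) for two independent Gaussian estimates given their 95% half-widths."""
    sigma_a = half_width_95_a / 1.96
    sigma_b = half_width_95_b / 1.96
    sigma_diff = math.sqrt(sigma_a**2 + sigma_b**2)
    return normal_cdf((mean_a - mean_b) / sigma_diff)

# 2014 and 2010 are nominally tied at 0.56 in the Met Office statement.
print(prob_a_warmer_than_b(0.56, 0.56, 0.1, 0.1))
# A hypothetical year 0.4 degrees cooler, for contrast.
print(prob_a_warmer_than_b(0.56, 0.16, 0.1, 0.1))
```

If part of the uncertainty comes from biases that persist from year to year, the independence assumption fails and these probabilities would change.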

    • HAS,
      “In fact what you have is a mean and a probability distribution for the estimate of global average temp based (solely) on measurement errors.”
      I don’t think it is. It is based on Brohan 2006. Measurement error is a small part, because of the large number of readings in the average. The main component is a spatial sampling uncertainty, based on the finite number of points sampled. IOW, the range of values you might get if you could repeat the measurements in different places.

      • Nick
        It is a semantic point. In order to measure the global average temp you need to interpolate values where you don’t have readings. This IMHO is part of the process of measuring global average temps.
        I note that Mr Mosher is dining out on a similar point above, but elsewhere I’ve seen him run the line that everything is an inference from reality, from direct measurement to arcane model outputs.

      • Reply to Nick Stokes and HAS ==> The Met Office states clearly that their Uncertainty Range is based on two papers:
        Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set
        Colin P. Morice, John J. Kennedy, Nick A. Rayner, and Phil D. Jones
        and
        Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 2: biases and homogenisation
        J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
        Both available as linked, online, free.

      • Reply to Nick Stokes ==> You might re-check that. The SST paper doesn’t even reference Brohan (2006 or 2009). The HADCRUT4 paper explicitly states it abandons the Brohan 2006 method and instead “the method used to present these uncertainties has been revised. HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties.”
        Links to the papers are above.

      • Kip,
        “The HADCRUT4 paper explicitly states it abandons the Brohan 2006 method”
        That is the relevant paper – your post is about HADCRUT4. And they have modified the method. But I was talking about the uncertainty model. And of that they say:
        “The models of random measurement error, ?, and sampling error, ?, used in this analysis are exactly as described in Brohan et al. [2006].”
        And that is where you will find the full description.

      • Reply to Nick Stokes ==> So, I think we agree now … HADCRUT4, the data set being discussed here, does not use the Brohan 2006 model, but instead:
        “The uncertainty model of Brohan et al. [2006] allowed conservative bounds on monthly and annual temperature averages to be formed. However, it did not provide the means to easily place bounds on uncertainty in statistics that are sensitive to low frequency uncertainties, such as those arising from step changes in land station records or changes in the makeup of the SST observation network. This limitation arose because the uncertainty model did not describe biases that persist over finite periods of time, nor complex spatial patterns of interdependent errors.
        To allow sensitivity analyses of the effect of possible pervasive low frequency biases in the observational near-surface temperature record, the method used to present these uncertainties has been revised. HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties. This approach follows the use of the ensemble method to represent observational uncertainty in the HadSST3 [Kennedy et al., 2011a, 2011b] ensemble data set”

  13. I would assert that the measurement error is only useful in determining the variance of the measurement, not the process. Therefore, before claiming that there is any change in the underlying process, one needs to also take into account the variance of the process. Clearly the process variance is MUCH greater than the measurement error.
    What matters here is the process, not the measurement. Until the variance in the process can be accounted for, these temperature measurements, however small the measurement variance, CANNOT be used as evidence of a warming trend that is attributable to increasing CO2.
    Climate scientists, almost universally, have this data analysis process backwards. They use this data (with small measurement error) as EVIDENCE for their claim when in fact the uncertainty (variance) in (their assumptions about) the process is such that they cannot even make the claim – let alone point to evidence.

    • Jeff I am with you on this.
      “Climate scientists, almost universally, have this data analysis process backwards. They use this data (with small measurement error) as EVIDENCE for their claim when in fact the uncertainty (variance) in (their assumptions about) the process is such that they cannot even make the claim – let alone point to evidence.”
      The problem I have with what I like to call “the physical evidence” is that the average is a mixture of data and interpolation.
      Suppose we made no interpolations at all – just used the available data. There would be a calculated average temp with some small degree of uncertainty. If the temperature changed one could make a claim that the places where the temperature rose (increased) were, per unit area, higher or lower than some other parts of the total group. Well, that is OK, however the final number and any change in it over time still has value because it is based on measurements. Changes are real.
      Area-weighting is obviously attractive but the temperature varies within an area because it is the topography (etc) that has strong effects. Interpolation of data trying to account for topography is risky and increases overall uncertainty so it should be avoided.
      If the average temperature were calculated without area weighting, the result would be inaccurate, but as precise as possible because it involved no modelling, which is to say, no guessing what the data should be.
      We don’t really care what the actual temperature is because no one experiences it. We are constantly in flux. Our experience is a daily rise and fall between 2-3 degree ‘limits’. Now suppose that average temperature changed. The interesting thing is the change, not the magnitude, and not any modelling based on raw data applied to unmeasured areas. My point is that a comparative calculated temperature is more valuable and repeatable as it contains no modelling.
      Repeating 1000 data readings at the same place would be valuable. Estimating what the numbers would be at 1000 points that were not measured cannot possibly be as accurate as using 1000 real measurements.
      Trying to extend the temperatures confidently to other unmeasured places is a stretch when the values of the changes are so small. I am interested to see the rise in any set of values – as many as possible and as untouched as possible. I am not immediately concerned with the rate of rise for the whole (which requires measuring the whole). I want to see a measured rise across as wide a spectrum of conditions as possible. If there is a measured rise, we have something to talk about. Rejigging past temperatures, area weighting and interpolation are useful for forecasting what the numbers should be in the unmeasured areas but that is not as helpful as a comparative contract. It doesn’t have to be ‘certified’ but it has to be as comparatively precise and accurate as possible. It is Delta T that matters, not T. The best Delta T comes from measurements, not modelled numbers.

      • Exactly. The assumption of anthropogenic warming has already been made by most of climate science. Therefore they assume that they have already accounted for all sources of variance other than measurement error through the use of their models. As a result we have the situation that we have today in climate science – the models are correct and it is the data that is wrong.

      • Crispin in Waterloo
        February 1, 2015 at 8:29 pm

        The problem I have with what I like to call “the physical evidence” is that the average is a mixture of data and interpolation.

        This is my problem too. Due to the sparsity of actual measurements the so called Global Average Temperature is more based on the interpolation and homogenization algorithms. Averaging only removes errors if the errors are random not systemic and using one set of algorithms inherently leads to systemic errors. So a validation approach should be taken.
        Run the interpolation and homogenization algorithms for every station that you have accurate measurements for as if that station was not there. Then compare the values generated to the actual measured values. This will result in errors of various sizes. Assuming that the error bars show the potential errors from these algorithms, then the largest of these errors both positive and negative – worldwide – become the error bar values of the ‘Global Average Temperature’. This would be a far more real-world defensible approach.
        Those coming up with ‘better methods’ for assessing error should always validate their mathematical models against real world data. They are climate _scientists_ after all.
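The validation procedure described above is usually called leave-one-out cross-validation. The sketch below shows the idea on made-up data: the stations are synthetic, and simple inverse-distance weighting stands in for whatever interpolation the real products use (it is NOT the HadCRUT4 or any other agency's algorithm). All names and numbers are assumptions for illustration.

```python
import math
import random

def idw_estimate(target, others, power=2.0):
    """Inverse-distance-weighted estimate at target (x, y) from (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in others:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # a station exactly at the target location
        w = 1.0 / d**power
        num += w * value
        den += w
    return num / den

def leave_one_out_errors(stations):
    """For each station, estimate it from all the others and return estimate - measured."""
    errors = []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        errors.append(idw_estimate((x, y), others) - value)
    return errors

# Synthetic field: a smooth gradient plus random station noise (all hypothetical).
rng = random.Random(1)
stations = []
for _ in range(50):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    stations.append((x, y, 10.0 + 0.05 * x - 0.03 * y + rng.gauss(0.0, 0.2)))

errors = leave_one_out_errors(stations)
print("largest negative error:", min(errors))
print("largest positive error:", max(errors))
```

Taking the largest errors as the error bar, as the comment proposes, is one choice. A percentile or root-mean-square of the same leave-one-out errors would be a less conservative summary.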

    • Reply to JeffF and Crispin ==> All good points…for another essay though.
      Appreciate your input.
      Nonetheless, if these were your data points from an acceptable method and process, what would their lying within the Uncertainty Range band mean in your fields?

      • OK, so IF we had a justifiable reason to assume that the data was distributed in a certain way, we could employ a statistical test on the RAW DATA to determine whether or not any data point was statistically significantly different from any other data point. That is really just basic hypothesis testing. I could therefore make a justifiable claim that, for example, 2014 was statistically significantly warmer than 2012, etc. – but ONLY under the aforementioned assumptions. Depending on the distribution, I would use a different test. The focus, in my field, therefore, is in the validity of the CLAIM. Statistics is a formally defined mathematical discipline and as such, statistical claims are no different than the results from calculus or linear algebra, in my opinion.
        Now, IF you start monkeying with the data or adding in additional assumptions, you are likely inserting uncertainty, not reducing it. What I would tend to do there is to add confidence to my “adjusted” data, to reflect the fact that I am no longer certain that the data is reflecting the actual underlying process – because I am just “guessing” or modeling the process. At NO point would I assume that I was more confident in my adjusted data than the confidence I have in the raw data. That point is KEY. When I adjust the data, I am guessing at how to do it. Therefore my confidence decreases.
        As a final note, a claim that 2014 is warmest because I find, by whatever approach, that 2014 is the MOST LIKELY to be the warmest, when all of the differences are within the margin of error is, statistically, unjustifiable. If I state my assumptions and find that 2014 is not statistically significantly warmer than other years, then I CANNOT KNOW whether it is the warmest or not. My statistical test is just not good enough. End of story.
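As a rough illustration of the "basic hypothesis testing" described above, here is a two-sided z-test for the difference between two annual values. It assumes normally distributed, independent errors with known standard deviations, which is exactly the kind of distributional assumption the comment says must be justified first. The sigma of 0.05 is a hypothetical stand-in (roughly 0.1/1.96), and 0.16 is an invented contrast value.

```python
import math

def two_sided_p_value(mean_a, mean_b, sigma_a, sigma_b):
    """z statistic and two-sided p-value for H0: the two true values are equal."""
    z = (mean_a - mean_b) / math.sqrt(sigma_a**2 + sigma_b**2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Two nominally tied years: no evidence of a difference.
print(two_sided_p_value(0.56, 0.56, 0.05, 0.05))
# A hypothetical year 0.4 degrees cooler.
print(two_sided_p_value(0.56, 0.16, 0.05, 0.05))
```

A large p-value here does not show the two years are equal. It only says this test cannot tell them apart, which is the comment's closing point.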

  14. As with many others, I think the claim of 0.1C error bars is unfounded and unbelievable. There are just too many sources of error. Further, even if currently that is true, there’s not a chance it is true for data pre-1940 or so. Does anyone know how the Met Office calculated this error bar?

  15. This is not a discussion of “Was 2014 the warmest year?” or any of its derivatives. Simple repetitions of the various Articles of Faith from either of the two opposing Churches of Global Warming (for and against) will not add much to this discussion and are best left for elsewhere.

    I have less expertise than you so I can’t add much other than to say I love the approach.

  16. “…The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius…”

    I’ve stated it before and I’ll reiterate it here; that accuracy level is impossible and frankly irrational.
    Normally the first chore is to calculate the error possibilities for every datum/datum source.
    e.g. A temperature station capturing temperature / weather data,
    – the equipment installed, the related conditions surrounding the station,
    – the quality and training of the staff,
    – method and consistency of data entry,
    – any transcription of the data
    Not forgetting the time of day and frequency. These error statistics are cumulative; while initially centered on individual temperature readings, for many/most stations these error stats remain for long periods of time.
    As a quick translation, government climatologists have been actively analyzing and re-analyzing temperature records adding in corrections for many reasons.
    These corrections do not eliminate nor minimize the error! Instead they are somewhat definitive identification of error ranges; including these ranges as temperature adjustments effectively double error ranges whenever they are made because of assumptions.
    When adding temperature ranges from different stations, a more accurate station does not improve the error range for less accurate stations though it might decrease overall average station deviation.
    Whenever errors and deviations are calculated for information gained through a process, errors are calculated for every process component;
    e.g. data capture,
    – data storage,
    – data lookup,
    – data processing,
    – data transmission followed by storage, lookup, processing,
    – data analysis,

    And no! These stages are not error proof. Error rates for every stage in a process are multiplied against error rates in all other process stages for a cumulative process error. This is the process engineers use to determine accuracy, effectiveness and efficiency for industrial processes.
    If station placement or upkeep introduce a two to three degree error at certain times of the day, the final temperature anomaly cannot be less than that two to three degree error. As Willis and Steven McIntyre have pointed out, Met Office reaching a .1C precision level does not mean that they’ve reached a .1C accuracy level; only that they’ve introduced enough numbers to overwhelm lower precision numbers.
    Bluntly, Met office and others are ignoring overall accuracy and error rates while claiming their pseudo precision as accuracy or error levels.
    Their error bars should include ranges for assumed temperature adjustments and observed station deficiencies (all non-certified stations should be listed as unimproved stations until proven otherwise). Other processes should either have identified error rates or an error assumption based on sampling.

    • Yes. It is important to separate precision from accuracy. You can have low variation and all your measurements are still far from the real value. See http://www.mathsisfun.com/accuracy-precision.html
      The trend is more important than the actual value of the global average temperature, which is difficult to calculate from the existing data. I would like to see trends of individual weather stations and then calculate the average global trend.
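The precision-versus-accuracy distinction above can be made concrete with a simulated instrument. Everything here is hypothetical: a true value of 20.0, a fixed bias of 1.5, and random scatter with a standard deviation of 0.1. The sketch only shows that tight scatter and a large offset can coexist, and that averaging many readings does not remove the offset.

```python
import random
import statistics

def simulate_readings(true_value, bias, sigma, n, seed=0):
    """Readings from an instrument with a fixed bias plus independent random scatter."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, sigma) for _ in range(n)]

readings = simulate_readings(true_value=20.0, bias=1.5, sigma=0.1, n=10000)
mean = statistics.fmean(readings)
spread = statistics.stdev(readings)

print("spread of readings (precision):", round(spread, 3))
print("offset of mean from true value (accuracy):", round(mean - 20.0, 3))
```

If the same bias is present in both a station's readings and its baseline, it cancels when an anomaly is taken. A bias that changes over time does not cancel, and that is the case the homogenisation adjustments are meant to address.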

    • ATheoK, I agree.
      Global temperature is a meaningless number. If we fear heating or cooling, the things to track are the sectors of the planet where Ice Ages create glaciers. If these select areas are cooling, the glaciers and snow cover increases.
      This is why global warmists loved to talk about the melting glaciers and ice sheets until it was obvious, these are now growing rapidly in both the North and South Poles. So now they talk about ‘global temperatures’.
      Measuring this with a very fine scale so the tiniest of mathematical ‘rises’ of 0.001 are ‘detected’ is pure fraud. The ‘error bars’ are meaningless since we have no ‘global temperature’ but rather, a mosaic of temperature zones that never move up or down entirely in tandem.
      We do know all ice ages happen again and again, they are ten times longer than interglacials and that they all, without exception, end very abruptly like a light being turned on (that light being the sun).

    • ATheoK,
      This childish ignorance of accuracy and precision concepts extends through a lot of climate science.
      There are related questions about GCMs and ensembles. An individual modeller might make several runs, the error bars should be outside all of them. For a CMIP, all modelling centres should submit these (wide?) error bars so that an estimate of overall accuracy can be attempted. If they are so stupid as to take an ensemble average, as they do, and error estimates based on the statistics of precision, the most meaningful count is the one about the number of wrong steps they have incorporated into the fictitious result.
      I cut my teeth on similar concepts related to ore grade estimates at developing mines, from sparse drill hole assays. The significant difference is that it is easy to lose, or fail to make, large amounts of money if you let your heart rule over reality.
      If there is uncertainty, correct procedures will usually be found for uncertainty/error estimates in the tomes from the French Bureau of Weights and Measures.
      I have not ever seen that authority quoted in a cli sci paper.

  17. The graph you show would be interpreted as 1998, 2005, 2010, and 2014 being statistically in a dead heat. There is nothing else that can be said from a science viewpoint. For anyone who wants 2014 to be the “Hottest ever,” then it’s all just Feynman’s Cargo Cult Science for them.
    Joel O’Bryan
    PhD, UMass Med School, BioMedical Science
    BS, USAF Academy, Civil Engineering
    Footnote: I have long realized that the scientific truth in areas matters not to people who align with a “Progressive” ideology. That includes Professors whom I have worked with in the past.

    • Reply to Joel O’Bryan ==> Thanks for your professional biomedical viewpoint. You draw this conclusion from the top set of blue lines, which form the Uncertainty Range for the 2014 value? But you wouldn’t include 2006, 2007 or 2004? If not, why not?

  18. Statistical confidence intervals deal with sample size error and assume ceteris paribus for all other potential sources of error. Obviously, all other things are not equal and many other sources of error may be at work making the very small changes being discussed ridiculous for any decision making purposes.

    • Reply to Jim G ==> The Met Office Uncertainty Range is not strictly and only a statistical confidence interval. See their FAQ and the two papers quantifying this value linked in the FAQ answer.

      • Thanks, read it. Not real impressed as it is another model to estimate uncertainty and goes so far as to caution the reader to check against other sources. From what I was able to discern it does not cover the myriads of other sources of error that could be involved in producing these miniscule temp anomalies.

        • Jim G commented on

          Thanks, read it. Not real impressed as it is another model to estimate uncertainty and goes so far as to caution the reader to check against other sources. From what I was able to discern it does not cover the myriads of other sources of error that could be involved in producing these miniscule temp anomalies.

          So, we’re now augmenting the models of surface temp, with the output of gcm’s that are validated against the output of the surface temp model.
          That sounds like a splendid idea!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
          I noticed up above in warrenlb’s post http://wattsupwiththat.com/2015/02/01/uncertainty-ranges-error-bars-and-cis/?replytocom=1851270#respond
          that we have the error down under 0.01C, see it’s working already!
          You are all just jealous you didn’t come up with it!

    • JimG,
      The ceteris paribus violation that I find fascinating in global temperature sets is in the Time of Observation (TOBs) corrections. For example, a temperature max read at 0900 hrs probably reflects the previous day’s peak. That is OK at the Equator, where the term previous day has climate meaning. It is not the case at the Poles, where nights and days are half a year long. The concept has variation in accuracy as you move from Equator to Pole.

      • The entire concept of tiny anomalies in temperature having any usefulness based upon the data observations available is ludicrous on its face. The only thing even more ridiculous is making multi billion tax dollar expenditures and huge negative impacts upon national economies based upon these same observations, and of course, their adjustments of same.

        • Jim G commented on

          The only thing even more ridiculous is making multi billion tax dollar expenditures and huge negative impacts upon national economies based upon these same observations, and of course, their adjustments of same.

          It’s criminal.
          If this was just a bunch of naive scientists fooling themselves with simulators I would just laugh and poke fun at them, but when they decided to use them to save the planet by crashing modern society into a brick wall fulfilling greens fantasy of stopping human development, I take that personal.

  19. These statistics leave a lot to be desired. First, the measures come from a number of different sensors (IE land and ocean sensors), and measure not one but 1000’s of different locations each with its own response to short and long term oceanic/atmospheric weather pattern parameters. So for me, I can’t really say much about what the graph says. The data pool is so fraught with inconsistencies and lack of variable control as to be useless.

    • Pam, a friend of mine many years ago worked on a cattle ranch in Alberta and he described to me a Chinook in February that he could ride from 15F to 40F in a few dozen steps and weave in and out of the warm and cold air. There the accuracy of any measurement at a spot would be + or – ~12F.

      • I’ve seen more than that. In the Arizona desert, and many other places, temp inversions of > 20° F are common. Pilots that fly frost control missions over sensitive crops (which I did for years) deal with this all the time. The ground observer reports a temp of -1 C so you fly over him at about 50 feet altitude and a few seconds later his temp is +4 C. I have seen the temp from the runway to the top of the control tower at various airports vary by > 10° C many times at night. Point being, what is the “real” temp there?
        I should point out that the surface temp increase in frost control has nothing to do with engine heat. That’s trivial. It’s the downwash from the aircraft breaking up the inversion. The wingtip vortex tends to settle and spread out, dragging warmer air from above the inversion down to the ground and spreading it out.
        The larger point as it applies to this discussion is that if little ol’ me in a 4000 pound airplane can raise the temp of a 640 acre farm field by several degrees C, what effect on the temp records worldwide has the massive increase in air traffic and heavy jet traffic late night and pre-dawn at airports with official stations had? I strongly suspect that a LOT of airport based “Official Low Temps” were higher than they otherwise would have been had the air traffic not stirred up the inversion layers pre-dawn.
        So we can all blame UPS and FedEx for it… [grin]

      • Reply to Gary Pearse and Bill Murphy ==> Thank you both for your Real World experience input on very local temperature variance. I have experienced the same in the ocean with snorkeling and scuba diving.
        Love the frost control flights story!

      • @ Bill Murphy, that makes me wonder what happens to measured air temperatures down wind of wind turbines.

  20. Glad you asked – you have scientific curiosity which seems to be lacking among those thousands of “climate scientists” of the 97 percent. Before I get down to specifics, let me remind you of the advice that Ernest Rutherford, father of the nuclear atom, gave to his colleagues. “If you find that your work requires statistics, you should have been doing something else.” Next, let’s look at your CRU & Hadley Centre graph. It is worthless. HadCRUT, NCDC, and GISS have collaborated to falsify the temperature record since 1979. I discovered that this was done to the eighties and nineties temperatures when I was writing my book. I even put a warning about it into the preface but nothing happened and nobody had any comments. After the book went to press I discovered that all three had used common computer processing and, unbeknownst to them, the computer left identical footprints on all three data-sets. These comprise sharp upward spikes at the beginnings of years. Comparing their temperatures to satellite temperatures reveals that they gave the eighties and nineties an upward slope amounting to 0.1 degrees Celsius in 18 years. Satellite data do not show this warming. ENSO was also active at the time and created five El Nino peaks, with La Nina valleys in between. To determine the mean global temperature in such a situation you have to draw a straight line from an El Nino peak to the bottom of a neighboring La Nina valley and mark its center with a dot. These dots define the global mean temperature for the corresponding calendar date. I did this for all the El Ninos in that wave train and found that the dots lined up in a horizontal straight line. This proves the eighties and the nineties were another no-warming zone, equivalent to the current hiatus/warming and equally as long. This means that the total no-warming time since the beginning of observations in 1988 is 36 years, three-quarters of the time that IPCC has officially existed.
    There really was a quick step warming there that started in 1999, in three years raised global temperature by a third of a degree Celsius, and then stopped. This is the only warming we have had since 1979. In the presence of the fake warming it is hard to find but you can easily find it in satellite records. It is responsible for all twenty-first century temperatures being higher than the twentieth century, with the exception of the super El Nino. But the fake warming triple alliance was not content to stop with the eighties and nineties but continued their manufactured warming in the twenty-first century. This is obvious from the graph that you show. In satellite records the twenty-first century is flat (with the exception of the La Nina of 2008 and El Nino of 2010, which cancel one another). The ground-based three, on the other hand, exhibit the same temperature rise they showed in the eighties and nineties. This raises the right hand end of the graph enough to make 2014 the warmest year according to their calculation. Or does it? In their graph the El Nino of 2010 is higher than the super El Nino of 1998, which is impossible. The two El Nino peaks are poorly resolved and it is obvious that only thanks to the continued use of fake warming is this reversal of temperature values possible. The warmest place winner is thus the super El Nino of 1998 and not that lowly 2014 as advertised.

    • Dr. Richard Feynman said (wrote) in his famous “Cargo Cult Science” 1974 CalTech commencement address the following:

      “Nature’s phenomena will agree or they’ll disagree with your theory. And although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work. And it is this kind of integrity, this kind of care not to fool yourself that is missing to a large extent in much of the research in Cargo Cult Science.” (my bold)

      I think that Dr Feynman’s statement from 1974 is quite prescient for what we are seeing today in Climate Science, i.e. the new Cargo Cult Science. Mainstream NASA GISS, NOAA, US DoE, UK CRU, and Aus BoM scientists bask in the temporary limelight of “fame and excitement” of their latest alarmist rhetoric... rhetoric that pleases their political paymasters, and green coterie, while their integrity slips away month-by-month, year-by-year, into the drain.
      Today’s Climate Scientists are lost in a wilderness of their own deceit, many in search of grants… rent-seekers, a temporary fame and excitement. They double down now on their dishonesty, hoping that nature will somehow prove them right, but each year’s passing makes their deceptions even more visible. They collaborate with data keepers to adulterate the data records to further the deception and preserve their reputations. History will not be kind to the Climate Science charlatans that infest NOAA, GISS, etc.

      • Note: to those who are unfamiliar with Dr Feynman’s Cargo Cult Science analogy, I offer this explanation from the book, “The Pleasure of Finding Things Out – The Best Short Work of Dr. Richard Feynman.” By (of course) Dr Richard Feynman, Helix Books, Perseus Publishers, Cambridge, Mass, 1999:

        (From Dr Feynman’s 1974 commencement speech to CalTech):
        “In the South Seas there is a Cargo Cult of people. During the war (WW2), they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they arranged to make things like runways, to put fires along the sides of (those) “runways”, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas – he’s the controller – and they wait for airplanes to land. They’re doing everything right. The form is perfect. It looks like it did before. But it doesn’t work. No airplanes land. So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”

        The climate is not adhering to their precepts and forms of investigation, precisely because they have failed to follow the scientific method and allow for alternative hypotheses, and indeed the validity of the null hypothesis, that the variability in temperature we are seeing is mostly natural in causation. Yet GISS, NCDC, DoE/LLNL modelers, and UK’s CRU, and all the other believers they have convinced to follow them, wait for the airplanes to land.

  21. Well, since you asked for opinions from all fields I may as well have a go at this.
    The stated “accuracy” of the temperature record is in reality a precision and is in fact useless as a basis for comparing years against each other, especially over many decades.
    Let’s say we have two yard sticks (meter sticks for folks outside the USA), one made of wood and one made of Invar (an iron/nickel metal alloy that is very stable size-wise when the temperature changes). Each has 1000 divisions scribed along its edge, which yields measurements as small as 1 millimeter. This is the precision of the yardstick. You can use either one (wood/metal) to measure a length and report the length to within +/- 1/2 mm (assuming your eyesight is good enough to see which scribe line is closest to what you are measuring).
    Now, the problem is that the accuracy of these two example measuring instruments is vastly different, the wooden one will shrink and swell as the humidity changes, perhaps by 5 percent (50 millimeters). But the Invar one will be very stable with temperature and humidity changes. Note that both instruments are fit for certain purposes, a wooden yard stick is probably fine if you are cutting cloth to sew together into a dress with tolerances of plus or minus 10 millimeters (a little less than half an inch) but you sure would not want to use one for constructing an aircraft.
    One way to reconcile this would be to calibrate each instrument against a standard instrument before each use. Of course that is costly and requires that the place where the instrument is used has the same temperature and humidity as the place where it is calibrated, not convenient at all.
    The other way to do this is to have very precise and stable instruments which can be calibrated for accuracy on a periodic basis. In industry, measuring instruments that are “mission critical” (i.e. if it’s wrong there could be loss of life, or limb, etc) are generally under a strict re-calibration procedure. In this type of system every instrument goes through a periodic recalibration. For example, voltage meters are sent to a calibration facility every year and recalibrated. In the old days there were usually some adjustable resistors (potentiometers) inside the meter. The meter would be calibrated against a single standard for that company and the resistors adjusted. Then a seal would be put on the instrument to tell if anybody changed its calibration. Modern voltage meters usually contain a microprocessor and some memory where calibration factors are stored. By changing the values of these calibration factors the reported measured value can be made to match the standard.
    Manufacturing processes for critical items almost always start with an instruction to list the “cal status” of all instruments used in the tests. No professional would sign off on test data for mission critical items with test instruments that are “out of cal”. This is called traceability, and you can be sure if an airplane crashes they will go back through all the manufacturing and maintenance records to make sure somebody did not adjust the auto-pilot (for example) using a voltage meter that was “out of cal” (not just a hypothetical example, it has happened).
    The whole problem with the temperature data sets is that this critical calibration step has always been missing. These things got installed with very little in the way of calibration and traceability involved. They were mostly used at airports in the beginning to make predictions about how the weather might change during an upcoming flight. The temperature sensor network was never designed to be accurate enough to determine the “hottest year”.
    Until and unless the climate science community can provide full traceability, i.e. how often were all these sensors calibrated, including in-situ errors (AC units installed after the sensor was installed), we can safely assume that the “Accuracy” of these measurements is probably ten or one hundred times worse than the stated precision. Therefore the temperature data is probably only good to +/- 1 degree (F) at best; over the length of the record (150 years or so) it is probably only good to 2 or 3 degrees, and that is being generous.
    Arguing about one year being 0.01 degrees “warmer” than all the rest of the temperature record is silly.
    Cheers, KevinK.

    • And when you look at Anthony’s revelatory work on the stations in AC exhausts, asphalt parking lots, etc that are used by climatologists in the US, knowing that these are the best in the world, I would say your 2 or 3 degrees is about right.

    • Reply to Kevin K ==> Very good input. You may be interested to know that temperatures in the 1960s are officially reported (in the USA) in single degree F increments. 60, 64, 72, 02, etc. There is no degree of accuracy, not even 1/2 degrees. Thermometers were likewise “standardized” but not calibrated ever, as I understand it.
      I am working, as mentioned before, on investigating this aspect of Original Measurement Error. A lot of this is covered in the HADCRUT4 papers.

    • Another point about instrument calibration issues. In critical applications, it is not adequate to simply use an instrument with an unexpired calibration sticker. The instrument in question will be checked for accuracy when it is turned in for its next calibration check. If it is found out of spec, a notice will be issued to the users and any critical measurements will be re-checked with another calibrated instrument. The out of spec instrument will then be re-adjusted to be in spec.

      • Gary, yeah, been there done that. I helped calibrate an expensive satellite (many tens of millions of dollars) using supposedly calibrated test instruments.
        Then, AFTER it was launched an audit found one test instrument that was “out of cal” (by a month). Boy that got lots of attention from the customer; “You sold us a multimillion dollar instrument that was not fully calibrated……” and “We want our money back….”
        Yeah, a real feces hitting the rotating approximately planar surfaces moment…. I had to go back through years of test data and demonstrate that the “calibration error” was still small enough that the total system still met the specifications.
        You can be sure I triple check all those calibration stickers now…
        Calibration, it’s a good thing.
        Cheers, KevinK.

  22. You should also pay attention to the advice of Ernest Rutherford. Mathematical masturbation, graphical or otherwise, is much too prevalent on both sides of the climate debate. Looking at the actual temperature observations, even those that have been manipulated, irrespective of source, instead of the anomalies, shows the irrelevance of much of the statistical discussion.

      • One quote often attributed to Ernest Rutherford is:
        “If your experiment needs statistics, you ought to have done a better experiment.”
        That is quite apt, in the context of climate ‘science’

      • My interpretation of Rutherford’s advice is: Don’t waste your time with sophisticated error analysis, but invest it in thinking about better instrumentation and better methods of data reduction. Electronic thermometers are better than mercury thermometers. ARGO buoys are better than measurements by ships, although they can’t replace them. Temperature measurements by satellite are ideal but they don’t measure the surface temperature. Weather models which are used for weather forecast can also be used to bring both techniques together and to fill missing areas. Future is the main issue, not the past.

  23. Go back a step for a more fundamental question. To calculate the average height of persons in a room do we measure only the tallest and shortest – no. Why is the midpoint between the maximum and minimum outliers for a day called an “average”? Next step – anything in modern climate science that’s called an average for a month or year or a grid region, etc is compiled from these “averages” that would not be called averages in any other field.
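
To make the midpoint-versus-average question concrete, here is a minimal sketch comparing the (max + min) / 2 "daily average" with the mean of 24 hourly readings. The hourly values are invented for illustration (a warm afternoon cut short by a cold front); they are an assumption, not station data.

```python
# Hypothetical hourly temperatures (deg C) for one day, hours 00 to 23.
# Invented values: a warm afternoon followed by a sharp late-afternoon drop.
hourly = [8, 7, 7, 6, 6, 6, 7, 9, 12, 15, 17, 19,
          20, 21, 21, 20, 14, 11, 9, 8, 7, 6, 5, 5]

# The conventional "daily mean": midpoint of the two extremes.
midpoint = (max(hourly) + min(hourly)) / 2

# The arithmetic mean of all 24 readings.
mean_hourly = sum(hourly) / len(hourly)

# How far the midpoint sits from the mean of the hourly readings.
difference = midpoint - mean_hourly

print(f"midpoint of max/min : {midpoint:.2f} C")
print(f"mean of 24 readings : {mean_hourly:.2f} C")
print(f"difference          : {difference:.2f} C")
```

With these made-up numbers the two figures differ by almost two degrees. The size and sign of the gap depend entirely on the shape of the day's temperature curve, so nothing here says how large the effect is in real records.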

    • Reply to Gary in Erko ==> See Zeke Hausfather’s explanation of adjustments at Climate Etc.
      The question you ask is “answered” by adjustments — I do not wish to discuss that can of worms (or is it a barrel of monkeys?) here.

  24. A general question about error margins. Suppose we want the average of A +/-a and B +/-b. The average of A & B is (A+B)/n where n=2, but how do we calculate the error margin for this average, or extending that, where there are more than two terms with different error margins?

    • Gary, in some engineering fields the problem of disparate error sources (i.e. one part of the system contributes MOST of the error) is analyzed with “RSS” math. This is “Root Sum Squared” math, each error source is assigned an error value, each value is squared and a sum taken, then the root of the sum is calculated.
      When the error sources are not correlated this technique works quite well. For example if you build a complex optical lens (like the Hubble Telescope) you can measure the “wavefront error” of each element of the lens. This is a measure of how far away each optical surface is from the desired “perfect shape” and is usually measured in microns. If you “RSS” all those error terms together you get a very representative value for the wavefront error of the entire optical system.
      This has worked well for subsystem errors that are not correlated. For example, if you order 100 pieces of lumber cut into 1 foot long pieces (from 100 different sawmills) and then stack them together you will find that the final height is very close to 100 feet.
      Of course, if you order 100 pieces of lumber from the same sawmill the chances are that there is a systemic bias and your final “stack up” will be off by many feet.
      Cheers, KevinK
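
As a rough sketch of the RSS arithmetic described above, the following applies it to the average of A +/- a and B +/- b and to the lumber example. The function names and all numeric values are illustrative assumptions, and the RSS step is only valid if the error terms are uncorrelated.

```python
import math

def rss(errors):
    """Root-sum-square: square each error term, sum, take the square root.
    Only appropriate when the error sources are uncorrelated."""
    return math.sqrt(sum(e * e for e in errors))

def mean_with_uncertainty(values, errors):
    """Mean of the values and the RSS uncertainty of that mean.
    The sum carries uncertainty rss(errors); dividing by n scales it by 1/n."""
    n = len(values)
    return sum(values) / n, rss(errors) / n

# Average of A +/- a and B +/- b (made-up numbers).
mean, err = mean_with_uncertainty([10.0, 12.0], [0.3, 0.4])
print(f"mean = {mean:.2f} +/- {err:.2f}")

# Lumber stack: 100 one-foot boards, each off by about 0.02 ft (assumed).
per_board = 0.02
n_boards = 100

# Different sawmills: independent errors, so they combine by RSS.
uncorrelated_total = rss([per_board] * n_boards)

# Same sawmill: a shared bias adds linearly instead.
correlated_total = per_board * n_boards

print(f"100 sawmills : stack off by about {uncorrelated_total:.2f} ft")
print(f"one sawmill  : stack off by about {correlated_total:.2f} ft")
```

The contrast between the last two numbers is the point of the sawmill story: independent errors grow with the square root of the count, a shared bias grows with the count itself.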

      • Kevin & George – thanks. I’d forgotten this from high school maths. From various graphs I reckon the error margins quoted for ‘world temperature anomalies’ are a more simple product of an estimate of number of gauges in use in various periods. They quote a fixed error margin for maybe 20 years, then a slightly larger margin for the preceding decade or two. They don’t look like a legitimate calculation or even an approximate estimate based on likely errors of each gauge.

      • Reply to Gary in Erko and KevinK ==> The Uncertainty Range used by the Met Office is a VRG (very rough guess) based on two exhaustive papers, whose links I’ve given a couple of times above. See the FAQ link given in the essay, find the “warmest year” answer, the papers are linked there as well.

  25. “The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius.”
    Delusional.

    • Reply to jorgekafkazar ==> Credit should be given to Met Office UK for making any reasonable, clear-cut statement about Uncertainty Range, even if many think that it is “way too small”.
      For example, BEST shows much smaller CIs for current averages.

  26. “Liars figure and figures lie” is an old but accurate statement that could be applied to both sides of the AGW argument. However there should be no “argument” or “consensus” or “belief” when it comes to science…..No? Once again we enter the realm of politics.

  27. Mostly it is all fantasy.
    Proclaiming a significant signal smaller than the noise.
    The uncertainty range offered by these “experts” who claim they can MEASURE the average global temperature, appears to be only that of the statistical massaging of their chosen input.
    The claims of 0.01C differences in these created anomalies is comedy.
    Very low comedy.
    The error range of the recorded temperatures from weather stations, is another matter.
    A 1/10 of a degree error (uncertainty) range is wildly optimistic.
    Or is the choice of words revealing: “0.1C is the 95% uncertainty range”?
    Then there is the change of instrumentation, I read recently a mercury in glass thermometer can take 2 or 3 minutes to stabilize on temperature rise and up to 10 minutes to stabilize on a drop in temperature.
    The electronic resistance thermometer stabilizes either way in 2.5 to 10 seconds.
    If these response times are accurate, we can expect multiple “record” high temperatures since the transition to automatic stations.
    The state of the past data is such that we do not know what trends are afoot, if any.
    Probably the broad indicators like global sea ice are the best we can do.

    • Not only that John, that’s all we have to do. If the job at hand is to determine if we are heading for a significant and dangerous warming or cooling, or an inundation of sea water, there is no need for all these homogenizations and fudge factors and crustal rebound calculations. If the signal is significant, half a dozen thermometers around the equatorial zone, and the North and South Poles would be enough. We could measure sea level rise with an ax handle once every 10 years at high tide. The way they go about it! 0.1C error, 2.5mm a year…..after all the kriging and friging is a total expensive farce.

  28. Global temperature data is heterogeneous. Overall errors cannot be less than measurement errors. Most data prior to recently had a recording accuracy of +/-0.5 deg C. Overall errors cannot be less than this. When UHI and a host of other influences are taken into account, I’d be surprised if most data over the past century was better than +/- 1 deg C.

    • Reply to Tony ==> I believe you are quite correct — “Overall errors cannot be less than measurement errors.” Original measurement error does not magically disappear, nor does averaging reduce it (unless one is making multiple measurements of the same thing at the same time with multiple measuring devices.)

      • Kip wrote;
        ” Original measurement error does not magically disappear, nor does averaging reduce it (unless one is making multiple measurements of the same thing at the same time with multiple measuring devices.)”
        Actually the way it works is; IF you make multiple measurements with a SINGLE instrument at a SINGLE location at a frequency that is significantly different from the frequency content of the noise source you can REDUCE THE NOISE by averaging.
        BUT averaging never increases the accuracy, NEVER, NEVER, NEVER. Try making one hundred measurements with my proverbial “wooden yardstick” then average them all together, they will still likely be less accurate than one measurement with my Invar yardstick,
        Oh, the Invar yardstick is not an imaginary item, surveyors used to use them for very accurate work, turns out surveyors work outdoors where temperatures and humidity vary a bit depending on the weather, whoops of course I meant climate. Perhaps climate change is making surveying less accurate? (add it to the list).
        See here where they discuss “leveling rods” (like a vertical yardstick) made of Invar;
        http://en.wikipedia.org/wiki/Invar
        Cheers, KevinK
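
The wooden-versus-Invar argument can be sketched as a small simulation. Everything numeric here is an assumption for illustration: a true length of 1000 mm, a 50 mm systematic error for the swollen wooden stick (the "perhaps 5 percent" figure mentioned earlier in the thread), and 0.5 mm of random reading noise on both sticks.

```python
import random
import statistics

def measure(true_length, bias, noise_sd, n, rng):
    """Return n readings: true value + fixed bias + independent random noise."""
    return [true_length + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_MM = 1000.0          # assumed true length being measured

# Wooden stick: same reading noise, plus an assumed 50 mm systematic error.
wood = measure(TRUE_MM, bias=50.0, noise_sd=0.5, n=100, rng=rng)
# Invar stick: no systematic error, a single reading.
invar_single = measure(TRUE_MM, bias=0.0, noise_sd=0.5, n=1, rng=rng)[0]

wood_mean = statistics.mean(wood)
wood_scatter = statistics.stdev(wood)   # spread of the individual readings
wood_error = wood_mean - TRUE_MM        # averaging leaves the bias in place
invar_error = invar_single - TRUE_MM    # one reading, noise only

print(f"wood, mean of 100 : error {wood_error:+.2f} mm (scatter {wood_scatter:.2f} mm)")
print(f"invar, 1 reading  : error {invar_error:+.2f} mm")
```

Averaging shrinks the random scatter of the wooden readings but the mean still sits about 50 mm from the truth, while a single Invar reading lands within the reading noise. The sketch assumes the bias is constant across readings; a bias that drifts with humidity would behave differently.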

  29. Not so long ago the AGW’ers were all in a rage about Venice sinking below the Adriatic. A check indicates Venice is alive and well as the Acqua Alta.
    I would hazard that the internet-enabled mercury thermometers of the NCDC and their “High Resolution High Fangled Thermometer Network” will be alerting us to impending calamity for decades, or at least until soon after the next Presidential Election. Ha Ha.
    In a year or so the University of Arizona “Graduate Committee” of the graduate student at UA whose “research” paper neglected more than 90% of the long standing (about 30 years) GPS stations and focused on only those stations deployed in 2005 to current (lacking calibration or even error checking), will not be kind.

  30. In philosophical terms, the 18 data points are “specific instances” of a “general rule” aka “scientific theory.” By itself the illustration provides one without the means for getting from the specific instances of it plus the related “background information” to the general rule. To get to the general rule is, however, the objective of a scientific study.
    To describe the events underlying the general rule is a requirement for getting to it from the specific instances of it plus the background information. Climatologists persistently fail to describe these events. Thus, climatological research persistently fails to produce general rules. In the absence of these rules the climate remains uncontrollable. Misunderstandings fostered by interest groups have led the makers of public policy to think they can control the climate when they are incapable of doing so.

  31. How can error bars be talked about when there is no log of the adjustments that have been made to the temperatures that will be used in calculating the averages? Who made the adjustments? What formula was used to make the adjustment? When was the adjustment made? Maybe a Harry read me file should be made (Oh I just did the change). Every temperature measurement should be open to absolute scrutiny. Every point if seen to be an outlier needs to be calibrated. We are spending Billions of dollars on some estimates made by a few “programmers”. Without the detailed logs none of the data would be acceptable in any professionally run lab.
    I can hear the bleats of these “programmers” as well they did not have the budget or scope to do professional work. If not then publicly say so. Oh I forgot this is a political agenda.

  32. “The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-1°C uncertainty range:”
    It should read as “The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-0.1°C uncertainty range:”

    • reply to ashok patel ==> Boy, you are absolutely right — my big big typo! Thank you! -kh
      Moderator : Please make this correction in the original essay. It should read:
      “The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-0.1°C uncertainty range:”

  33. Error bars are usually an estimate of the spread of data due to imprecise measurements and the random error. You could randomly remove 10-20% of the data and redo the calculations. The spread will give you an idea of the random error.
    Systematic errors are different and aren’t usually represented in error bars. You identify them and fix them. You also include the uncertainty in the amounts needed to adjust the data through to your final result.
    On top of that, the average of surface temperature measurements is just that, not the global temperature. There can be degree differences between means of hourly readings and the mean of max and min measurements if a cold front moves in through the late afternoon or evening. There can be a few degrees difference of the minimum temperature overnight due to the dew point. There can be a few degrees difference between the ground and the station on a cold and dry night. If you want an indicator of how the climate is changing, treat the max and min separately (and the months).
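    The "remove 10-20% of the data and redo the calculations" idea above can be sketched roughly as follows. This is only an illustration: the readings are synthetic, and the function name, the 15% drop fraction and the number of trials are assumptions rather than anything taken from the Met Office method.

```python
import random
import statistics

def subsample_spread(values, drop_fraction=0.15, trials=500, seed=0):
    """Standard deviation of the mean when a random share of the data is dropped."""
    rng = random.Random(seed)
    keep = max(1, round(len(values) * (1.0 - drop_fraction)))
    # Recompute the mean on many random subsets and look at how much it moves.
    means = [statistics.fmean(rng.sample(values, keep)) for _ in range(trials)]
    return statistics.stdev(means)

# Synthetic readings (assumption): true value 15.0 with random scatter of 0.5.
data_rng = random.Random(42)
readings = [15.0 + data_rng.gauss(0.0, 0.5) for _ in range(200)]

print("mean of all readings:", round(statistics.fmean(readings), 3))
print("spread of subsample means:", round(subsample_spread(readings), 4))
```

    The spread is only a rough indicator of the random component. It would need rescaling before being read as a standard error, and it says nothing about systematic error.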

  34. There are multiple things going on here. First of all, modern meteorological thermometers have measurement error of +/- 0.1°C. That is a fact. This means that any averaging of thermometer readings can never have a better accuracy than +/- 0.1 degrees. From time to time I see someone claiming that because there are thousands of readings every day this measurement error can be reduced by dividing the error by the square root of the number of measurements, but that is just wrong. In order to use that method you need multiple measurements at each site at the same time, and that does not happen.
    Now as I said, if the global mean temperature was just an average of all available stations then the error would be +/- 0.1 degree, and it could be higher if some stations had higher measurement error. I believe it is William Briggs who has repeatedly stated that when you have the measured temperatures there is no uncertainty involved – the “95% uncertainty range” does not apply here. You know what you have measured, within the stated accuracy of your equipment.
    BUT, the global mean is not an average of all readings available. It is an average of something else. This something else is the infilling that is going on to create the notion that we “know” the temperature in each grid cell on the entire surface of the globe. This procedure must increase the uncertainty, both physically AND mathematically. My claim is that once they start the infilling there is no meaningful way of performing statistics on the results. Some stations will have a much higher impact on the overall “average” in this way and that will affect the result.
    Others have mentioned the fact that the relationship between the radiative balance of the earth and temperature is non-linear, which of course makes the change in the global average less than meaningful in that respect.
    Basically they should state the average of the station values. There are enough problems with that, what with all the station changes and such. But at least that would give an average of the real measurements. I am not saying that it would give any more meaning, I am just saying that it would at least be something.

    • Valland says:
      “From time to time I see someone claiming that because there are thousands of readings every day this measurement error can be reduced by dividing the error by the square root of the number of measurements, but that is just wrong. In order to use that method you need multiple measurements at each site at the same time, and that does not happen.”.
      So every thermometer is only used for one measurement then.
      I did not know that. Now I do. And I did not know that measurements from different thermometers with the same measurement error will not do anything to measurement error.

      • Your reply is really not very enlightening. You do know that most meteorological stations have one or two thermometers, don’t you? And you do realize that two simultaneous measurements really do not reduce measurement error? It is true that each thermometer of the same type and make has the same accuracy. If you make measurements of the same parameter simultaneously with several thermometers you will be able to reduce the measurement error. It is important to realize that since temperature is changing both spatially and temporally the measurements have to be simultaneous in both space and time. Meteorological measurements are neither. That is why measurement error prevails.

      • Anders:
        Just one quibble:
        “…It is true that each thermometer of the same type and make has the same accuracy…”
        I would agree that each thermometer of the same type and make has the potential for the same accuracy.
        Without proper calibration procedures and tracking data that potential is never achieved. Nor are the climastrologist’s accuracy assumptions valid or relevant whenever they announce their silly calculations.
        As far as rooter and his loathsome buds, they are here to distract, irritate and just plain waste commenter time and space. Just ignore rooter; nothing irritates time wasters more than not getting a response from the commenters they are trying to impede.

      • “If you make measurements of the same parameter simultaneously with several thermometers you will be able to reduce the measurement error. It is important to realize that since temperature is changing both spatially and temporally the measurements have to be simultaneous in both space and time. Meteorological measurements are neither. That is why measurement error prevails.”
        Seems like Anders Valland is saying that measurement error changes with temperature and location. And therefore the only way to reduce measurement error is to have lots of simultaneous measurements at the same location.

      • To AtheOK: I agree with you, I deliberately kept the issue of calibration etc out of this. Doing real life measurements is a messy job 🙂
        To rooter @ February 2, 2015 at 9:05 am: Either you are deliberately misunderstanding (your “style” of writing implies this), or you really are in the dark when it comes to measurements. Or both at the same time. So instead of me trying to explain this to you, could you please tell us just how the measurement error should be handled in these types of measurements? How does it propagate in the calculations, rooter?

      • Valland says:
        “To rooter @ February 2, 2015 at 9:05 am: Either you are deliberately misunderstanding (your “style” of writing implies this), or you really are in the dark when it comes to measurements. Or both at the same time. So instead of me trying to explain this to you, could you please tell us just how the measurement error should be handled in these types of measurements? How does it propagate in the calculations, rooter?”
        This is very basic Valland. Take one measurement. Accuracy of the instrument describes how well that instrument can give the correct reading. This error is +- and random. Not a systematic bias. Take two or more measurements and the randomness of the error will cancel out the errors. Because it is not a systematic bias. That error in measurement has nothing to do with different time or location. That has nothing to do with how the effect of that measurement error is reduced with many measurements.

      • To rooter February 3, 2015 at 4:34 am :
        The measurement error contains a systematic and a random element. You have no way of knowing which is which. That is why the level of accuracy, or measurement error, is stated for the instrument. Since we are talking about measurement of air, which is an ever changing fluid with varying thermodynamic properties (mostly due to the amount of water vapour and liquid), you need multiple simultaneous measurements in time and space to be able to reduce measurement error. You need to read up on error handling and especially the preconditions for the special rule you are applying.
        Tell me, rooter. In my current work we do pressure and temperature measurements inside an engine cylinder during combustion. This amounts to thousands of measurements during a few seconds. Do you believe we would pass peer review if we said that this alone made our measurements close to infinitely accurate as you claim that we could (remember, this is not climate science)?
        And do you seriously believe that my home thermometer is infinitely accurate since I have read it several thousands of times since I bought it? Because that is really what you are saying. The more I look out the window and see a number, the better the accuracy because somehow the error the instrument had last time is cancelled out by the error it has now. That is bollocks.
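        The two positions in this exchange can be put side by side in a small simulation. It is a sketch under loud assumptions: one fixed true value, independent zero-mean Gaussian random error, and a systematic offset shared by every reading. Whether real station data meet those assumptions is exactly what is in dispute here, and the function name and numbers are illustrative only.

```python
import random
import statistics

def mean_error(n_readings, random_sd, shared_bias, true_value=10.0, seed=1):
    """Error of the mean of n readings of one fixed true value."""
    rng = random.Random(seed)
    readings = [
        true_value + shared_bias + rng.gauss(0.0, random_sd)
        for _ in range(n_readings)
    ]
    return statistics.fmean(readings) - true_value

for n in (1, 100, 10000):
    random_only = mean_error(n, random_sd=0.1, shared_bias=0.0)
    with_bias = mean_error(n, random_sd=0.1, shared_bias=0.1)
    print(n, round(random_only, 4), round(with_bias, 4))
```

        Under these assumptions the random part of the error shrinks as readings are added, while the shared offset stays in the mean no matter how many readings are taken.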

      • Anders Valland says:
        “Tell me, rooter. In my current work we do pressure and temperature measurements inside an engine cylinder during combustion. This amounts to thousands of measurements during a few seconds. Do you believe we would pass peer review if we said that this alone made our measurements close to infinitely accurate as you claim that we could (remember, this is not climate science)?”
        That is very interesting. Why do you need thousands of measurements during a few seconds?

      • Anders Valland says:
        “And do you seriously believe that my home thermometer is infinitely accurate since I have read it several thousands of times since I bought it? Because that is really what you are saying. The more I look out the window and see a number, the better the accuracy because somehow the error the instrument had last time is cancelled out by the error it has now. That is bollocks.”
        Of course the accuracy of the instrument will not be better with repeated readings. But repeated readings will improve the accuracy of the mean of the readings. Incredibly basic. You could check that yourself. Do daily readings for one month of your thermometer. As accurate as the thermometer can give. Make another series with rounding of those readings to the nearest degree. That is: decrease the accuracy (+-1). Then compare the mean of the two series. Check if the mean of the least accurate series can deviate one degree from the mean of the more accurate series.
        You will probably be surprised by the answer.
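        The check proposed above is easy to try on made-up numbers. The sketch below uses 30 synthetic daily readings (an assumption, drawn uniformly between -5 and 15 and recorded to one decimal place) and a hypothetical helper, `rounding_shift`. It only addresses rounding to whole degrees, not calibration or siting problems.

```python
import random
import statistics

def rounding_shift(readings):
    """Mean of the readings rounded to whole degrees minus the mean as recorded."""
    # Note: Python's round() sends exact halves to the nearest even integer.
    rounded = [round(t) for t in readings]
    return statistics.fmean(rounded) - statistics.fmean(readings)

# Synthetic month of daily readings (assumption), recorded to 0.1 degree.
rng = random.Random(7)
daily = [round(rng.uniform(-5.0, 15.0), 1) for _ in range(30)]

print("shift of the monthly mean caused by rounding:", round(rounding_shift(daily), 3))
```

        Each rounded value moves by at most half a degree, so the monthly mean cannot move by more than that. How much smaller the shift usually is depends on how the individual rounding errors fall.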

      • Again, it will not decrease the mean of the reading. You are stating that if I read -3.5 °C +/- 0.1°C today and then I read -1.6°C +/- 0.1°C tomorrow that the mean would be -2.6°C +/- 0.07°C. That is silly, and contrary to all current knowledge in this area.
        I think we can now say that we have enough samples of rooter to say that his precision is low, and his accuracy even worse. It does not really matter what you believe, rooter. Your belief does not change facts in this matter. Read up, or go on believing.

      • rooter says: “That is very interesting. Why do you need thousands of measurements during a few seconds?”
        Seriously?

      • Anders Valland says:
        “Again, it will not decrease the mean of the reading. You are stating that if I read -3.5 °C +/- 0.1°C today and then I read -1.6°C +/- 0.1°C tomorrow that the mean would be -2.6°C +/- 0.07°C. That is silly, and contrary to all current knowledge in this area.”
        You’re getting there Valland. Random errors cancel out.

      • Anders Valland asks:
        “rooter says: “That is very interesting. Why do you need thousands of measurements during a few seconds?”
        Seriously?”
        Yes, seriously. Why thousands of measurements?

      • As I said, we are measuring temperature during combustion in an IC engine. That is why we get some thousand points in a few seconds.

      • No, rooter, the mean can not be more accurate than the individual measurements when you are measuring different things. If I had two simultaneous measurements each day it could do that.
        But I only have one each day. You really should read up on this.
        If this is the way you handle knowledge I guess you will be getting into trouble quite often. Nature really does not care about your beliefs as anyone versed in experiment will tell you.

    • “BUT, the global mean is not an average of all readings available. It is an average of something else. This something else is the infilling that is going on to create the notion that we “know” the temperature in each grid cell on the entire surface of the globe. This procedure must increase the uncertainty, both physically AND mathematically. My claim is that once they start the infilling there is no meaningful way of performing statistics on the results. Some stations will have a much higher impact on the overall “average” in this way and that will affect the result.”
      Averaging is infilling.

      • Consider temperature indexes with gridcells and not interpolation between cells. Take a grid cell. Compute the average of the temperature stations. The whole gridcell will get that average. Two stations or 50 stations. Areas inside that gridcell without measurements will be infilled with the average of the stations.
        Do the same with a hemisphere. The average for that hemisphere will consist of the average from the gridcells with temperature measurements. The gridcells without measurements will be infilled with the average of the gridcells with measurements.
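        The difference between averaging by grid cell and taking a plain mean of all stations can be shown with a toy example. The station list, the cell labels and both function names are hypothetical, and the cells are treated as equal in area, which real temperature indexes do not assume.

```python
from collections import defaultdict
import statistics

def simple_mean(stations):
    """Plain average over every station, ignoring where the stations are."""
    return statistics.fmean([temp for _cell, temp in stations])

def gridded_mean(stations):
    """Average the stations within each cell, then average the cell values."""
    cells = defaultdict(list)
    for cell, temp in stations:
        cells[cell].append(temp)
    cell_means = [statistics.fmean(temps) for temps in cells.values()]
    return statistics.fmean(cell_means)

# Four stations crowded into cell "A", a single station in cell "B".
stations = [("A", 10.0), ("A", 11.0), ("A", 12.0), ("A", 11.0), ("B", 20.0)]

print("simple mean: ", simple_mean(stations))
print("gridded mean:", gridded_mean(stations))
```

        In this toy case the plain mean is pulled toward the densely sampled cell, while the gridded mean gives each cell one vote.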

      • Comment to rooter, February 2, 2015 at 8:54 am: Averaging is infilling ONLY if you consider gridcells. The way I described it, it is not infilling.

      • Anders Valland says:
        “Averaging is infilling ONLY if you consider gridcells. The way I described it, it is not infilling.”
        Well. You have not described your kind of averaging. Averaging without some kind of area weighting is infilling in the same way as making the average of a gridcell is infilling the gridcell’s value with the average of the measurements from that gridcell. The only difference is the size of the gridcell. A simple average of all the measurements is using the whole globe as one big gridcell. And some areas with many measurements will be given bigger weight than they should have.
        Perhaps this is the time for Valland to formulate his alternative?

        • rooter commented

          And some areas with many measurements will be given bigger weight than they should have.

          One could look at this differently, first is that more measurements actually means there is less uncertainty in those over sampled areas, maybe they deserve more weight.
          When I create averages of larger areas I don’t adjust for weighting, but I’m not really trying to generate a field value (and I have 1×1 averages if it’s important), I’m trying to describe the response of a large number of sensors whose location I have no control over.

      • My “alternative” was given further up: calculate the mean of the station values. I qualified that by stating something about its meaningfulness, I could probably say that it would make just as much sense as the current methods used for infilling.

      • Anders Valland says:
        “My “alternative” was given further up: calculate the mean of the station values. I qualified that by stating something about its meaningfulness, I could probably say that it would make just as much sense as the current methods used for infilling.”
        Then Valland says it is ok with infilling. That is the same as infilling one grid with the mean of the stations in that grid. Except that he uses a very big grid. In his case the whole world.
        And he adds one big error. The mean of those station values will be strongly affected by the fact that there are many more stations in continental US and Europe. Those areas will be given too much weight. That is the worst infilling method.

      • Since we are looking for a global mean of measured temperatures it does not constitute an error to take the simple average of all available stations. It is just one other method, not an error. People seem to be preoccupied by the changes in the mean value and that can also be accomplished here. We are not looking for an accurate absolute value, thus the area weighting is meaningless. It makes just as much sense to use the simple mean as to construct any fancy infilling and weighting.

    • “Basically they should state the average of the station values. There are enough problems with that, what with all the station changes and such. But at least that would give an average of the real measurements. I am not saying that it would give any more meaning, I am just saying that it would at least be something.”
      That is what the temperature indexes are. With some kind of area weighting. Which is of course necessary because the number of stations differs from region to region. A simple average of stations would therefore give too much weight to the continental US.

      • That is why I explicitly stated that there are problems with it, rooter. And why I stated that it would not necessarily give any meaning.

      • Anders Valland says that temperature indexes don’t give any meaning. What does Anders Valland prefer then? Nothing? A simple average of stations and no area weighting?

    • Reply to Anders and others ==> “… therefore the only way to reduce measurement error is to have lots of simultaneous measurement at the same location.”
      Of the same thing, at the same time, in the same place. Yes, exactly.
      Using 100 standardized thermometers atop six foot poles planted in my backyard with the readings marked down at exactly the same moment by 100 robot readers with 100% accuracy — averaging those readings will produce a closer to true temperature of my backyard with a reduced measurement error.
      Using 100 standardized thermometers atop six foot poles planted in 100 different backyards with the readings marked down at exactly the same moment by 100 robot readers with 100% accuracy to take measurements — one does not reduce original measurement error by averaging.
      I’d like to read counter-comments from statistics-trained professionals.

    • If we agree that we can not increase the precision beyond a single dp, with surface data that is +/- 0.1F, then the collective change in temp for 95 million samples from 1940 to 2013 is effectively 0.0 for change in min temp and 0.0 for the change in max temp (calculated min temp change = -0.097392206 max = 0.001034302).
      This is what the stations actually measured.
      Now, because the surface has an annual temperature cycle, you need as close to a full year as possible for that cycle to cancel. I select each station by year with a minimum of 240 daily samples that year; if a station has fewer than 240 in a year, that year is excluded for that station.
      Over that same period the average daily rising temp is 17.5F, and the following night’s average falling temp is 17.6F.
      Global warming is entirely a product of the processing methods used on the data.
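      The station-year selection rule described above (keep a station's year only if it has at least 240 daily samples) could look something like the sketch below. The record layout `(station, year, day)` and the function name are assumptions made for illustration, not the commenter's actual code.

```python
from collections import Counter

def complete_station_years(records, min_days=240):
    """Return the (station, year) pairs that have at least min_days daily samples."""
    counts = Counter((station, year) for station, year, _day in records)
    return {key for key, n in counts.items() if n >= min_days}

# Synthetic example: station "X" has 300 daily samples in 2000 but only 100 in 2001.
records = [("X", 2000, d) for d in range(300)] + [("X", 2001, d) for d in range(100)]

kept = complete_station_years(records)
print(kept)  # only the 2000 station-year passes the 240-sample threshold
```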

  35. 10 years as a croupier in the Casino Industry. Probability mathematics is a wonderful thing. There was a good reason for Douglas Adams to use the idea to build the 2nd most advanced space ship ever created or to be created in his Hitchhiker series (restaurant mathematics is behind the most advanced one!). In short, the graph means diddly squat. It doesn’t have enough “rolls of the dice”, enough “spins of the wheel”, “hands of cards”, etc to discern anything with confidence. I have personally witnessed 15+ Red, Black, Odd, Even, etc (take your pick) combinations in a row on multiple occasions and though statistically unlikely you have know way of knowing if this graph is showing each point in its statistically most likely, least likely or something in between, position because you don’t have enough data on your graph.

      • Reply to wicked… ==> You aren’t the first to make homonymic typos here….no need to apologize. Even my fingers sometimes type the wrong “there” or “its”.
        It is simply “the graph”, not mine however. Its significance is much in doubt, whether it means anything at all given its number of data points.
        Thank you for the Gamblers-view of it!

  36. The first thing I would do is to ask where the graph came from. Look at the raw data first. Check the method used to process the raw data into the display “data”. Look to see if features were added that weren’t in the raw data. My understanding is that the methods used are extremely unreliable, involving subjective measures such as comparing “adjacent” sites and making adjustments to make sure they show a similar trend. Odd choice of adjacent sites (as seen for Albury), creating climbing slopes out of negative or neutral slopes. Anyone looked at Paraguay, lately?
    Only then is it worth asking questions about the graph. My first observation is that the question of what year is highest is pointless. They are all much the same. 300 degrees K, give or take a small amount of noise. It is hard to go past that.

  37. You can’t calculate the error bars without understanding the errors. UKMO are playing games as all the climate people are. It is possible/likely that the real error bars would show no significant warming for 150 yrs.

  38. Just a point on your yellow band, though. You should really use yellow blocks rather than a flat band to create your graph. The blocks would show the margin of error for each point individually. You could then plot the “worst case scenario” for each side of the debate. You could manufacture a plausible (but unlikely) graph showing a rising trend or a declining trend within those blocks to fit whichever view of the world best suits.

    • reply to wicked…. ==> If I were graphing the data for general use, you would be absolutely correct. The above is NOT the proper way to show an Uncertainty Range.
      However, I wanted to show how many points of the 18 year period would “fit inside of” the Uncertainty Range — thus this illustration. I purposefully called it an “illustration” and not a graph for this very reason.

  39. The Met Office TV weather forecasts have recently started telling us that the night-time temps it shows are in the country and that in towns the temperatures will be several degrees higher. They don’t say that it’s UHI but I can’t think what else it could be. So even the Met Office is now admitting that its recorded/predicted temperatures vary widely depending on whether they’re in town or country. What possibility is there that they could calculate an accurate average temperature for one day for a district, let alone an average temperature for a year for the world?

    • And just looking at the weather map for the UK, ought to have convinced Lord Stern (who compiled a report on the disastrous consequences of global warming) that a change of 2 or 3 degC would be of no concern whatsoever for the UK.
      Temps in Scotland would be more like those in Northern England, those in Northern England more like the Midlands, the Midlands more like the South West, the South West more like the South East, the South East more like the Channel Isles, and the Channel Isles more like Brittany/Normandy. What’s not to like about that?
      Wouldn’t all parts of the UK greatly benefit by such a temperature rise? Global Warming (there being no such thing since climate is regional) would be a godsend for Northern Latitude Countries such as the UK, Holland, Germany, Scandinavia, Canada etc.

  40. ‘What does this illustration mean scientifically?’
    My answer would be that this does not add anything to what we already know in the shape of the time series and simple summaries like slope of linear regression.
    Record setting does not occur in any serious science. It belongs to the world of entertainment, sport, advertisement, and hype. Record setting events are a statistical disaster because they are by definition severely dependent upon each other. They depend on the trivial start of record keeping, and they do not contain much objective information because what’s a record breaking event now ceases to be one when a new record is set. Perhaps their only useful function is to inform us about the range of a variable, like encountering somewhere a person of 120 years old with a valid birth certificate.
    For the statisticians wasting their time on these non-scientific issues: where do the error bars come from and do these also apply at a series of outliers? You can be sure that the bars must be huge and must also contain bias.

  41. It’s worse than they thought! A brief look at any undergraduate text on propagation of experimental errors shows that if independent experimental values of T1 and T2 have errors of dT1 and dT2, then T1-T2 has an error of sqrt (dT1**2 + dT2**2). That is, if dT1 = dT2 = dT, then T1 – T2 has an error of 1.4*dT. If the wildly optimistic assumption of dT = 0.1 is accepted, then the error in their difference is +/-0.14. Also, numbers should never be quoted with more significant figures than their experimental error, so to say T1 – T2 =0.01 is completely invalid.
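    The arithmetic quoted above can be checked in a couple of lines. The sketch assumes, as the comment does, that the two errors are independent; the function name is only illustrative.

```python
import math

def difference_uncertainty(d_t1, d_t2):
    """Uncertainty of T1 - T2 for independent errors: sqrt(dT1**2 + dT2**2)."""
    return math.hypot(d_t1, d_t2)

# With dT1 = dT2 = 0.1, the uncertainty of the difference is about 1.4 * 0.1.
print(round(difference_uncertainty(0.1, 0.1), 2))
```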

  42. well as a non scientist, but truly pondering matters with what i call “logic reasoning” i would answer these questions you state as following:
    “2014 has a slightly higher possibility of being the warmest year than 2010 and 1998 and this possibility goes in diminishing order with all the other years on the graph that near the error margin area you colored.”
    i would thus say statistically according to the values 2014 has the most chance of being just a tiny teenie weenie warmer than 2010 if you would bet for the “hottest year” but not of a significance.
    on question 2 it is a bit harder, but pure logic tells me that: If 18 values are in the domain of uncertainty range, then again you speak of possibilities. again it’s a game of what i call “chances”
    i see the error bars in a bit more unconventional way: i take in account what this error region means by thinking of “best guess with most chance of being correct” if the value would be error free.
    so let’s assume we did go for a betting game for this hadcrut as tomorrow we would have error free temperature results.
    in that way
    2014 would be the bookmaker’s choice as favorite (like the favorite horse in a horse race)
    then second would be 2010
    third 1998
    and so on
    However being able to cover 18 readings in an uncertainty field does not say it all. it also depends where the dots in that field are however it does say one thing: there is a possibility no matter how small that they all may have the same value.
    as result i got this conclusion: compared to 1982 – 1998, 1998 till 2014 did not make a significant rise outside the error bars region, while the episode 1982-1998 clearly did. Therefore as the current trend is within the error bar range, there is logically no significant change.

    • Reply to Frederik Michiels ==> The critical point is that the Uncertainty Range calculated by the Met Office is not your run-of-the-mill CI. Confidence Intervals are statistical animals and have (IMHO) very little to do with measured values (Steven Mosher tells us that his project, Berkeley Earth, doesn’t produce averages but rather predictions — so I don’t know what their 95% uncertainty means at all).
      Read the Met Office FAQ statement in the original essay.
      While interesting, the probabilities about the data points are not, well, the point.

  43. NOAA states that the annual mean global surface temperature for 1907 is 16°C, and 1907 has the lowest annual mean global surface temperature of all the years from 1900 to 1997.
    http://www.ncdc.noaa.gov/sotc/global/1997/13
    NOAA also states that the annual mean global surface temperature for 2014 is 14.59°C, and 2014 has the highest annual mean global surface temperature of all the years from 1880 to 2014.
    http://www.ncdc.noaa.gov/sotc/global/2014/13
    Why would anyone believe anything, that NOAA publishes?
    In 1995, NASA claimed, that the current mean global surface temperature is 281 K (8°C).
    https://pds.jpl.nasa.gov/planets/special/earth.htm
    In 1998, NASA claimed, that the current mean global surface temperature is 15°C.
    http://www.giss.nasa.gov/research/briefs/ma_01/
    If we put these two together, the mean global surface temperature of the earth during the 90s is 11.5(+/-3.5)°C, or 285(+/-3.5) K.
    According to Carl Sagan and George Mullen, the mean global surface temperature of the earth in circa 1972 could have been either 289(+/-3) K (16(+/-3)°C), or 281(+/-3) K (8(+/-3)°C).
    http://courses.washington.edu/bangblue/Sagan-Faint_Young_Sun_Paradox-Sci72.pdf
    It does not seem unreasonable to suppose, that all estimates of mean global surface temperatures come with a caveat of +/-(not less than 3)°C

    • Carl Sagan and George Mullen also mention 286degK to 288degK for the mean surface temperature.
      But the upshot of all of the above, is that no one has any real idea as to the average surface temperature of the globe within about +/- 5 degC, ie., about 12 degC +/- 5 degC.

  44. Perhaps you would be interested in the work of Pat Frank, highlighted by Chiefio.
    http://meteo.lcd.lu/globalwarming/Frank/uncertainty_in%20global_average_temperature_2010.pdf
    ABSTRACT
    Sensor measurement uncertainty has never been fully considered in prior appraisals of global average surface air temperature. The estimated average ±0.2 C station error has been incorrectly assessed as random, and the systematic error from uncontrolled variables has been invariably neglected. The systematic errors in measurements from three ideally sited and maintained temperature sensors are calculated herein. Combined with the ±0.2 C average station error, a representative lower-limit uncertainty of ±0.46 C was found for any global annual surface air temperature anomaly.
    or this one
    http://multi-science.metapress.com/content/t8x847248t411126/fulltext.pdf

  45. As a chemist, I’d look at that graph and say ” do some more measurements and wait and see”
    As far as the process being a prediction not an actual measurement (Mosher), I had a very enlightening experience as an intern trying to extract and recover a protein from a raw material. I ran at least 100 extractions and the statisticians plotted the results for me. The recovery rate ran from as low as 20% up to a rather broad area showing 80% recovery. The real kicker was that one experiment, directly in the center of that broad area, returned 98%. Despite repeated tries most of the experiments resulted in 80+/-% with a couple batches yielding 92-93%.
    That taught me not to trust various curve fitting exercises such as Mosher is talking about. There obviously were important variables in play that we had not discovered yet.
    It is pretty obvious that the same applies to the climate that there are large, important variables involved that are not amenable to study by averaging or curve fitting. As someone else pointed out, the GAT doesn’t mean anything when it is -126degC in central Antarctica, and probably not over -100degC for hundreds of miles. Any kind of average temperature has little to do with the energy balances of the actual climate processes.

  46. Another way to consider this issue is significant figures. Measuring data to three significant figures requires an accuracy of 1%, e.g. xy.z. An instrument with an accuracy of +/- 0.5 % would mean an uncertainty band of 1%. Now suppose you collect thousands of such data points. The result can not have more significant figures than the data. The average of a thousand xy.z data points can not be expressed as xy.zb. If I recall correctly statistical methods require that the result be rounded to two significant figures, e.g. xy. Insignificant trends would then be lost in the cloud of data, in the uncertainty bands.

    • “The result can not have more significant figures than the data.”
      You give no authority, and it simply isn’t true. The error of the mean is less than the error of the parts. Why do you think people go to the expense of assembling large datasets?
      The classic is polling. Each response is just binary – 0 or 1. Yet with 1000 responses, you get 2 sigfig meaningful data.
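      The polling example can be made concrete with the usual standard error of a proportion. This is a sketch that assumes independent responses from a simple random sample; the 50% share is a hypothetical value, not a figure from the thread.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion p estimated from n independent 0/1 responses."""
    return math.sqrt(p * (1.0 - p) / n)

# 1000 binary responses with roughly half answering "yes" (hypothetical share).
se = proportion_standard_error(0.5, 1000)
print(round(se, 4))         # standard error of the estimated share
print(round(1.96 * se, 3))  # approximate 95% margin of error
```

      Under those assumptions the estimated share is good to about three percentage points either way, even though each individual answer carries only one bit.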

      • For those not bothered to look at the link here is a summary
        When adding or subtracting numbers, count the NUMBER OF DECIMAL PLACES to determine the number of significant figures. The answer cannot CONTAIN MORE PLACES AFTER THE DECIMAL POINT THAN THE SMALLEST NUMBER OF DECIMAL PLACES in the numbers being added or subtracted.
        When multiplying or dividing numbers, count the NUMBER OF SIGNIFICANT FIGURES. The answer cannot CONTAIN MORE SIGNIFICANT FIGURES THAN THE NUMBER BEING MULTIPLIED OR DIVIDED with the LEAST NUMBER OF SIGNIFICANT FIGURES.

      • Reply to Nick Stokes ==> Osborn quotes a text below.
        Large databases do not eliminate original measurement error — they don’t, really. 100,000 poorly measured data can not be transmogrified into 1 scientifically precise mean.

      • A C Osborn February 2, 2015 at 9:50 am
        “I don’t think so.”

        OK. Suppose you average 1000 numbers, all close to 1, expressed to 1 dp. You add them – you have, by those rules, a total of 1022.1. You divide by 1000 – that is exact, so by the division rules, the answer is 1.0221, according to those rules.
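
        The disagreement here can be tried numerically. The sketch below assumes hypothetical "true" values scattered around 1.0 and treats rounding to one decimal place as the only source of error; it does not model calibration bias or any other shared error, which averaging would not remove.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" values scattered around 1.0 (an assumption for illustration).
true_values = [random.uniform(0.5, 1.5) for _ in range(1000)]

# Each reading is recorded to one decimal place, as in the comment.
readings = [round(v, 1) for v in true_values]

true_mean = sum(true_values) / len(true_values)
reading_mean = sum(readings) / len(readings)

# Largest error that any single reading carries from rounding alone.
max_single_error = max(abs(r - v) for r, v in zip(readings, true_values))

print(f"mean of true values:         {true_mean:.4f}")
print(f"mean of rounded readings:    {reading_mean:.4f}")
print(f"difference of the means:     {abs(reading_mean - true_mean):.4f}")
print(f"worst single rounding error: {max_single_error:.4f}")
```

        In this setup the mean of the rounded readings tracks the mean of the unrounded values far more closely than any single reading tracks its own value, because the individual rounding errors are independent and partly cancel.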

      • Reply to Nick Stokes 1:43 pm
        The division by 1000 is actually division by 1000.0000000… (as you say, it is exact, as it is a discrete value). Due to all of the other values being expressible to one decimal place, your sum is also correct. But, due to the division rules, your value of 1022.1/1000.0000000000000…. is 1.0. There is no way to resolve this further, due to the limited precision of the earlier measurements. The uncertainty may appear to be resolved below the precision level, but in actuality, it can never be less than precision. Thus, the measurement you show has to end up at 1.0.

      • Headcounts do not have physical units of measurement and your result is simply a nebulous proportion of a total. Big difference. Though the recording may be incorrect, the measurement error is zero. Please do not conflate reality with statistical hair splitting fit only for the rear quarters of a political donkey.

      • Polling is very different from measuring temperature, Nick. For starters, in polls you assume the answers are independent. And as you state, you can only get a given set of responses. If a person answers “yes” to a question, there is no measurement error. A very different animal.
        Besides, in a poll you are more interested in looking at the spread of data. The spread is not measurement error. Some people seem to confuse standard deviation in a sample with measurement error. These are two very different things.
        And finally you need to keep in mind that there is a difference in precision and accuracy when it comes to measurements. Picture a hunter with a rifle firing five shots at a target. The five shots are clustered on the target, but off by 10 cm to the left and slightly up. He thus has an instrument of precision (the clustering) but a problem with accuracy (off target). That is why issues of calibration also come into play here, keeping the real measurement error high.
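
        The rifle picture can be put into numbers. This is a sketch with made-up values: a 10 cm offset standing in for the calibration problem and a 1 cm random scatter standing in for precision.

```python
import random
import statistics

random.seed(1)

# Hypothetical rifle: shots land 10 cm left of the aim point (bias),
# with a random scatter of about 1 cm (precision).
bias_cm = -10.0
scatter_cm = 1.0

shots = [bias_cm + random.gauss(0.0, scatter_cm) for _ in range(5)]

spread = statistics.stdev(shots)   # how tightly the five shots cluster
offset = statistics.mean(shots)    # how far the cluster sits from the bullseye at 0

# Firing many more shots pins down the cluster centre but does not move it.
many_shots = [bias_cm + random.gauss(0.0, scatter_cm) for _ in range(10_000)]
offset_many = statistics.mean(many_shots)

print(f"5 shots:      spread {spread:.2f} cm, offset {offset:.2f} cm")
print(f"10,000 shots: offset {offset_many:.2f} cm")
```

        More shots narrow the estimate of where the cluster is centred, but the centre stays about 10 cm off target. Only recalibrating the sights (or the thermometer) changes that.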

    • “But, due to the division rules, your value of 1022.1/1000.0000000000000…. is 1.0.”
      No. The division rule says:
      “The answer cannot CONTAIN MORE SIGNIFICANT FIGURES THAN THE NUMBER BEING MULTIPLIED OR DIVIDED with the LEAST NUMBER OF SIGNIFICANT FIGURES.”
      (not my caps)
      The numerator has 5, the denom ∞. The minimum of those is 5, not 1.

      • Sorry. My error in explanation. Go back to my point on precision. Expand in a single step, rather than convoluting by using multiple steps. Thus: (1.2+1.5+0.7+….)/1000. Within the expanded set, you follow the traditional division rule of “what number has the lowest number of significant figures?” You don’t change the precision allowed just by adding additional steps. My thermo professor hammered us hard on this one.
        The minimum number of significant figures at any time in the calculation series you post is 2. Therefore, the answer cannot have more than two. Furthermore, it only ever goes out to one significant figure beyond the decimal place. There is no method for going beyond that.
        1.0 has two significant figures.
        I again apologize for my error and thank you for pointing it out.

      • “Therefore, the answer cannot have more than two.”
        No. The add rule is just about decimal points, and says nothing about sigfig. As you add (positive) numbers, the number of sigfigs can increase; the dp stays constant. That is how you build up 5 sigfigs. Division by 1000 doesn’t change anything, sigfigwise. It’s the same as converting mm to m.
        It’s just a rule, not a statistical method. It overestimates the accuracy of the mean.

      • “all close to 1, expressed to 1 dp.”
        Hence, the result is only significant to “1 dp.” which, here, we call one decimal point. End of story, Nick!! Maybe you never had to pass a class, requiring you to demonstrate knowledge of this fundamental point of Data Management, to subsequently become gainfully employed. Lots of us did, you should go back and reread the link, and consider its implications for your life…

  47. Kip, I don’t know what you are planning to do with this errors thing. Let’s accept the numbers as given with their error bars. The real objective in all this gets lost in all this unnecessary detail, adjustments and agonizing over how to calculate error. I understand the objective is to detect whether we are headed for death by fire, death by ice, death by inundation…
    To do this, there is no need to worry about errors in the monthly, yearly, or decadal record. The raw data is perfectly capable of informing us over a period of a few decades what we have in store for us. We should of course do it as cleanly as we can. For this purpose, get rid of everything but well sited rural instruments – indeed, install a hundred pairs or triplets of good instruments in national parks around the world, each in a suitable micro field. If it is really important to know what danger lies ahead, zone a large area around the recording site permitting no building, pavement, etc and keep the shrubs from encroaching or whatever measures are deemed necessary. Maybe let satellites collect the data from them and pay guards to patrol the perimeter – please don’t mention cost in this!!! For sea level, go with the tide gauges or, if we are happy with GPS, that will do fine: for this objective, millimetres per year are a measure of nothing important happening, as are tenths of a degree. Finally, before deployment or selection of existing instruments, have 3-5 randomly selected unpaid volunteers meet and decide on a fixed algorithm for processing the data.
    In summary, if a bolide 500m in diameter is heading toward earth, don’t go down to the sea with a micrometer to try to decide what the catastrophe will be like.

    • Reply to Gary P ==> I don’t plan to do anything at all….I’m just curious about the original point. I read lots of studies (climate, medical, clinical, psychological, etc). Uncertainty Ranges, Error bars, and CIs are often confused for one another, unidentified, undefined, and/or ignored altogether. Often they are statistically determined by Maths Package and have nothing to do with the actual measurements used in the experiment.
      I appreciate your participation here today.

  48. A blast from the past.

    [NOAA] – MONTHLY WEATHER REVIEW January 1907
    IS NOT HONESTY THE WISEST POLICY?
    It is wrong to mutilate or suppress the record of an observation of a phenomenon of nature, but it is also wrong to make a bad use of the record. In fact, it is the misuse of meteorological data, not the observing or publishing, that constitutes a crime against the community. Observation and careful research are to be encouraged as useful. Misrepresentations are to be avoided as harmful. The “Independent Press” as the “Voice of the People” should be not only “Vox Populi” but “Vox Dei”, repressing all cheats and hoaxes, defending the truth and the best interests of the whole nation as against the self-interest of a few. -C. A.
    http://docs.lib.noaa.gov/rescue/mwr/035/mwr-035-01-0007b.pdf

    H/T
    https://stevengoddard.wordpress.com/2015/02/01/1907-it-is-your-patriotic-duty-to-rebel-against-climate-data-tampering/

  49. Subject: False Climate Claims by NOAA
    To: Climate-portal@noaa.gov,
    Climate-ClimateWatchMagazine@noaa.gov,
    Climate-DataAndServices@noaa.gov,
    Climate-Education@noaa.gov,
    Climate-UnderstandingClimate@noaa.gov
    Sirs and Mesdames;
    You have not been truthful:
    http://wattsupwiththat.com/2015/02/01/uncertainty-ranges-error-bars-and-cis/
    Your intentional and conscious deception of the public has severely damaged your credibility. You have done permanent damage to science and scientific endeavor.
    It is appalling, disgusting and disgraceful.
    Very truly yours,

  50. I am a medical scientist mostly dealing with identification of risk factors for clinical events using multivariate models.
    The first thing that comes to my mind would be to assess the distribution of data. It does not seem to me (but I don’t have the data set to check it) that surface temperatures are normally distributed (either spatially or temporally). In such a case, showing confidence intervals based on the SD is likely not appropriate. Median and interquartile range should be used instead. In my limited experience in the field, I never saw the details of how confidence intervals are calculated for temperature data sets.

    • Reply to Dr. Napolitano ==> Thank you for your input. I believe that you are correct that surface temperatures are not normally distributed in space or time. I’d like to read opinions from others on this point.
      Far above in the comments I link to the two papers used by Met Office UK to set their Uncertainty Range for this metric.

      • Kip, I tend to agree with Dr. Napolitano, how can the temperatures be normally distributed when they depend so much on local climatic variations, prevailing wind directions, Humidity, geologic position (close to hills/mountains etc), geologic conditions (Volcanic activity), current directions when coastal.
        I have seen a study somewhere, but can’t remember where, that coastal sites are controlled by the seas and have very different temps & ranges to inland sites.
        All the sites added together may end up as Normally distributed, but that loses so much data. I just don’t believe in a “Global Temperature”, surely all the sites should be individually analysed for trend and the decision of warming/cooling be based on the majority trend.
        As to the business of gridding or Kriging, it is absolute crap and the perfect tool for deception.

      • Reply to Dr. Napolitano and A C Osborn ==> I suppose it would be possible to see if any of the temperature data was normally distributed; there are sources for the gridded means used to arrive at HADCRUT4. Of course, the gridded means are arrived at themselves by formulas that may force a normal distribution by smoothing and adjusting based on the assumption of normality.
        Any deep data people still reading here? Does the HADCRUT4 process force an assumed normal distribution on the gridded data? (Does that question even make sense?)

        • Kip Hansen commented

          Reply to Dr. Napolitano and A C Osborn ==> I suppose it would be possible to see if any of the temperature data was normally distributed; there are sources for the gridded means used to arrive at HADCRUT4. Of course, the gridded means are arrived at themselves by formulas that may force a normal distribution by smoothing and adjusting based on the assumption of normality.
          Any deep data people still reading here? Does the HADCRUT4 process force an assumed normal distribution on the gridded data? (Does that question even make sense?)

          I have 1×1 gridded data based on NCDC’s GSoD data (land only) that’s straight averaged in csv files,
          here
          http://sourceforge.net/projects/gsod-rpts/files/Reports/LatLon%201×1%20Box/

      • I’m an analytical chemist and have dealt with QA and QC data for many years. The chart reminds me of a control chart, with upper and lower boundaries based on the precision of a method (or temp measurements here). Excursions beyond the boundaries for an analytical method suggest that a non-random factor has developed, or that a random, albeit low probability event has occurred. But weather is not a chemical method. Very good points have been raised about normal distributions. In a hypothetical Earth, we could set up a grid over the entire planet, say a million equally spaced sensor points, and we might find a normal distribution of temperatures across the globe. But that’s not the historical reality. It might be possible in the future with satellite monitoring. Given the irregular siting of sensors, skewed distributions could be expected, and non-normal statistics applied. Just today an article by some NPR reporters appeared comparing current temperatures in Minnesota with those observed 150 years ago…not one mention of possible sampling error due to historically sparse measurements vs more widespread current measurements. Not a mention of the heat island of the Twin City metro. NPR could use a little science.

      • Reply to Larry Potts ==> Yes, it is obvious that the data do not wander, during this time period, much out of the range identified as the Uncertainty Range for the metric; data within that range can be said to be “the same”. Excursions tell us something is going on (or that our Uncertainty Range is too narrow).
        I think that modern satellite temperature measurement has shown us that temperatures are not evenly spread, and I have seen no evidence that it has been demonstrated that spatial grids show a normal distribution of temperatures that would be expected if temperatures were simply “cold at the poles and hot at the equator, give or take altitude”. This brings into question all of the infilling methods used to create HADCRUT4 (or BEST, or GISS). This doesn’t mean that we really expect the cold poles, hot equator model, only that the belief that temperature for point B can be determined by its distance from points A and C whose temperatures are known is probably false.

  51. As an engineer looking at that data plot, I would conclude: No change in the average over that time period, but some unusual variations 1996-1998.

  52. Okay, I’ll give this a shot, though I doubt anyone will read it. I occasionally work uncertainty issues for a major wind tunnel organization, so you can decide if that’s relevant.
    Let’s say you have a value A for which you desire the uncertainty. To determine the value of A, you use an equation which draws on a variety of direct, uncorrelated measurements (x1, x2, x3,…). Thus, the equation is A(x1, x2, x3,…).
    Let us further assume that you have lucked out and all of your instruments have 95% confidence uncertainties provided, as well as NIST accuracy traceabilities. The uncertainties are labeled as: Ux1, Ux2, Ux3,…. The accuracies are just there for you to make sure you’re only reading to a reasonable number of decimal places, etc etc.
    Noting that I am going to use a standard letter d to denote a partial derivative, the uncertainty in value A is found by: UA = [{(dA/dx1)^2}(Ux1^2)+{(dA/dx2)^2}(Ux2^2)+{(dA/dx3)^2}(Ux3^2)+…]^(1/2).
    So, you need to know the equations being used, and all of the factors. A simple average of three temperatures where all of the sensors were good to +/- 0.1C, for instance, will give you an uncertainty of +/- 0.06C. This is where taking multiple measurements at the same point in 4D space becomes better than the known uncertainty.
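
    The root-sum-square formula above can be sketched in code. This is an illustration only: it assumes uncorrelated inputs, estimates the partial derivatives numerically with central differences, and uses three made-up readings in kelvin; `propagate_uncertainty` and `average` are hypothetical helper names.

```python
import math
from typing import Callable, Sequence

def propagate_uncertainty(
    func: Callable[[Sequence[float]], float],
    values: Sequence[float],
    uncertainties: Sequence[float],
    step: float = 1e-6,
) -> float:
    """Root-sum-square uncertainty of func(values) for uncorrelated inputs.

    Partial derivatives are estimated with central finite differences.
    """
    total = 0.0
    for i, u in enumerate(uncertainties):
        up = list(values)
        down = list(values)
        up[i] += step
        down[i] -= step
        partial = (func(up) - func(down)) / (2.0 * step)
        total += (partial * u) ** 2
    return math.sqrt(total)

def average(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)

# Three hypothetical readings in kelvin, each good to +/- 0.1 K.
readings_k = [288.1, 288.3, 287.9]
u_avg = propagate_uncertainty(average, readings_k, [0.1, 0.1, 0.1])
print(f"uncertainty of the average: +/- {u_avg:.3f} K")
```

    For a plain average each partial derivative is 1/3, so the result is 0.1/sqrt(3), about 0.058 K, matching the +/- 0.06C quoted above. Any other equation A(x1, x2, x3,…) can be passed in as `func`, provided its inputs really are uncorrelated.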
    Incidentally, units are very important. While it doesn’t matter in this case (because the partial derivatives are simple), you really should always do your uncertainties in absolutes (Rankine, Kelvin, psia, atm, meters, feet, slugs, etc.). If you use relative/differential measures (F, C, psid/psig, in H2O, mm Hg), you will actually misestimate your uncertainty automatically, unless you fix each location appropriately (really hard, easy to mess up, don’t do it).
    If, however, your values are correlated, or are given more complicated manipulations, generally the uncertainties will cause other problems. There is a way of manipulating the terms (normalizing, really) to give you sensitivity coefficients to tell you what you should improve to get the most bang for your buck. It could be that the uncertainty on the size of land plots is the worst thing. Altitude measurements could be (almost certainly are) horribly inaccurate. But improving those measurements might be of limited value if their influence is low.
    Figuring out the uncertainties on climate models would be a painful task. It is impossible if any equations are hidden. The same is true of experimentally derived data, such as global mean.
    As for the question originally posed: I’m not sure those options cover it. The blue lines are useful for showing that 2014 is (with 95% certainty) higher than three temperatures (1996, 1999, 2000).
    The yellow is probably not true, given that it is the same width as the blue lines. Generously assuming it to be true, however, it’s not terribly useful. Once you put the error bars on the three “outliers”, those measurements may have been within that error bound on the average. Therefore, all of the temperatures (to 95% certainty) fall within the 21st century average temperature range. The only real utility I see for that, however, is outlier rejection. The fact is, none of these years are (by themselves) terribly interesting.

    • Not sure I follow… and yes I read it! 😉

      So, you need to know the equations being used, and all of the factors. A simple average of three temperatures where all of the sensors were good to +/- 0.1C, for instance, will give you an uncertainty of +/- 0.06C. This is where taking multiple measurements at the same point in 4D space becomes better than the known uncertainty.

      According to this, when averaging different measurements, the uncertainty is lower… this means that the more measurements you add, the better the precision? So adding thousands of different points would provide infinite precision?
      When you say, taking measurements at the same point…. This is not what we are doing; it’s not multiple temperature measurements at the same probe that we average, it’s different probes at different locations that we average.
      A quick test in Excel with 2 and 3 probes shows that my uncertainty simply adds up, but I may be mistaken in my understanding. Others were using SQRT(U1^2+U2^2+U3^2… Un^2) to find the final one. Even with this method, the uncertainty grows.
      It’s been a long time since I studied maths, maybe a mathematician can help here?
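
      One way to see where the spreadsheet result and the +/- 0.06C figure part ways: the SQRT(U1^2+…+Un^2) expression is the uncertainty of the sum of the readings, and an average divides that sum by n. A small sketch, assuming uncorrelated probe errors of 0.1C each (the helper name `rss` is made up):

```python
import math

def rss(uncertainties):
    """Root-sum-square combination for uncorrelated uncertainties."""
    return math.sqrt(sum(u * u for u in uncertainties))

u_probe = 0.1  # assumed uncertainty of each probe, in degrees C

for n in (2, 3, 1000):
    u_sum = rss([u_probe] * n)   # uncertainty of t1 + t2 + ... + tn
    u_mean = u_sum / n           # dividing the sum by n divides its uncertainty by n
    print(f"n={n:5d}  sum: +/- {u_sum:.3f}   mean: +/- {u_mean:.4f}")
```

      The uncertainty of the sum grows with n, while the uncertainty of the mean shrinks as 0.1/sqrt(n). This covers only the random, uncorrelated part of the error; a bias shared by the probes would pass through the average unchanged.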

      • Here’s a very basic example. So, using the three temperatures (t1, t2, t3) to find an average temp T, and assuming: Ut1 = Ut2 = Ut3 = 0.1C = 0.1K (converting to absolute value, which may matter in other places). T=(t1+t2+t3)/3. dT/dt1 = dT/dt2 = dT/dt3 = 1/3.
        Then UT = ((dT/dt1)^2*(Ut1)^2+(dT/dt2)^2*(Ut2)^2+(dT/dt3)^2*(Ut3)^2)^0.5 = ((1/3)^2*(0.1K)^2+(1/3)^2*(0.1K)^2+(1/3)^2*(0.1K)^2)^0.5 = (3*1/9*0.01K^2)^0.5 = (1/3*0.01K^2)^0.5 = (0.0033K^2)^0.5 = 0.058K.
        Your example didn’t include a partial derivative.
        Above, I assumed a nice NIST traceability, and that assumption carries something with it. Taking an infinite number of points will get you down to being able to find the true mean with certainty. However, you will still not know your accuracy because you won’t have dealt with your systematic errors. The NIST traceability lets you say that you are accurate to an accepted level. So, the NIST calibration helps you reduce the systematic error to something that is consistent with everyone else.
        An infinite number of data samples reduces your instrument random uncertainty. The idea here is that, if you took all of those points and had your error bars on all of them, there’s only one infinitesimal spot where all of the error bars overlap. You declare that to be the actual mean value to absolute certainty (infinite number of points, remember). But, you may still have systematic errors that can bias you.
        When you do the temperatures all over the place, with lots of probes, each sensor must be treated independently. Additionally, the cross-correlations must be accounted for. Furthermore, you must understand how your sensor works. For instance, I can’t assume that a pressure measurement in my home tells me anything about the pressure in my wind tunnel, no matter the uncertainty on the sensor; it is completely irrelevant to what I am measuring, but there isn’t an obvious number showing up on any calibration sheet or in any equation that tells me that fact. So, figuring out how many temperature probes you need, their uncertainty over a large area, and so forth, relies on being smart about the sensor and its ability to measure over large areas. Sensor density studies can be extremely time consuming and rather important (see the development of pressure-sensitive paints, for instance, to solve the problem of insufficiency in pressure taps on wings). Sadly, I don’t have an easy way to tell you how you should be approaching that, other than to make sure you’re really, really smart about how good a single point is for a “large” area.
        This is a decent primer on uncertainty: http://user.physics.unc.edu/~deardorf/uncertainty/UNCguide.html

  53. I have a high interest in this topic of error bars. CI’s I will have to leave for another day as I need to complete my studies in statistics first.
    I would like to approach how one might be able to research the metric of errors in temperature estimates. I am here going to approach the problem from two separate directions, or two parts:
    Part 1 : I mostly can only guess at how current estimates are derived. If only max and min temperatures are used, then that in itself introduces errors. Accuracy could be improved by using a higher-resolution temperature data series and performing an integral over this data. I would like to explore – for myself – if this makes much of a difference.
    I would like to write about further steps towards means of getting greater accuracy, but I am going to have to leave that for another day as I am rather busy.
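
    The Part 1 comparison can be tried on a synthetic day. The daily curve below is invented purely for illustration (a fundamental plus one harmonic, which makes the cycle asymmetric); real station data may behave quite differently.

```python
import math

def synthetic_temperature(hour: float) -> float:
    """Made-up asymmetric daily cycle in degrees C (illustrative only)."""
    phase = 2.0 * math.pi * (hour - 15.0) / 24.0
    return 15.0 + 5.0 * math.cos(phase) + 2.0 * math.cos(2.0 * phase)

# One reading per minute over a single day.
minutes = range(24 * 60)
series = [synthetic_temperature(m / 60.0) for m in minutes]

midrange = (max(series) + min(series)) / 2.0   # the (Tmax + Tmin) / 2 estimate
time_mean = sum(series) / len(series)          # simple numerical integral over 24 h

print(f"(Tmax + Tmin) / 2: {midrange:.2f} C")
print(f"time-averaged:     {time_mean:.2f} C")
print(f"difference:        {midrange - time_mean:.2f} C")
```

    For this particular curve the two estimates differ by well over a degree. How large the gap is for real stations would depend on the actual shape of their daily cycles, which this sketch does not attempt to represent.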

    Part 2 : To estimate how accurate an average surface temperature of the Earth is, one might imagine doing an experiment with a global model of the Earth’s weather in which the computer model has the answer to a high precision (because it is essentially a mathematical model purposely defined with a pre-determined global temperature, for this experiment). What we then do is locate the temperature stations within this model, apply the same techniques used to produce the real estimates, and then compare the result to the computer model’s temperature.
    Eventually I am hoping to be able to use a historic weather recreation computer model to do what would probably be the best scientific method of calculating the average temperature of the earth, complete with error estimates / bars and all the way back to the first thermometer measurements.
    I would like to continue with ideas I have for part 2, but like I say, I am very busy.
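
    A toy version of the Part 2 experiment, heavily simplified: the "model" is just an invented temperature that depends on latitude alone, so its true global mean is known exactly, and the "technique" is a plain average of station readings. None of this reflects how HadCRUT4 or any real analysis works; it only shows the shape of the test.

```python
import math
import random

random.seed(2)

def model_temperature(lat_deg: float) -> float:
    """Toy 'known truth' field: warm equator, cold poles (degrees C)."""
    return 30.0 * math.cos(math.radians(lat_deg)) - 5.0

# Exact area-weighted global mean of the toy field:
# the area-weighted mean of cos(latitude) over a sphere is pi/4.
true_global_mean = 30.0 * math.pi / 4.0 - 5.0

def station_average(latitudes, noise_c=0.1):
    """Plain average of noisy station readings taken from the toy field."""
    readings = [model_temperature(lat) + random.gauss(0.0, noise_c) for lat in latitudes]
    return sum(readings) / len(readings)

n_stations = 1000
# Network A: stations spread evenly by surface area.
even_lats = [math.degrees(math.asin(random.uniform(-1.0, 1.0))) for _ in range(n_stations)]
# Network B: stations crowded into northern mid-latitudes (30N to 60N).
clustered_lats = [random.uniform(30.0, 60.0) for _ in range(n_stations)]

error_even = station_average(even_lats) - true_global_mean
error_clustered = station_average(clustered_lats) - true_global_mean

print(f"true global mean:        {true_global_mean:.2f} C")
print(f"error, even network:     {error_even:+.2f} C")
print(f"error, clustered network:{error_clustered:+.2f} C")
```

    In this toy setup the error of the clustered network is dominated by where the stations sit, not by the 0.1C instrument noise. A realistic test would swap in a proper weather model and the actual station locations and processing steps.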

  54. A portion of the content of this thread can be summarized with the help of the central limit theorem. It follows from the truth of the premises to this theorem that the sample mean is asymptotically normally distributed with standard deviation that varies inversely with the square root of the sample size; as the sample size increases toward infinity the standard deviation decreases toward zero. However, these premises are not necessarily true. Thus, for example, to increase the sample size toward infinity is not necessarily to decrease the standard deviation at all.
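
    A numerical illustration of the last point, using the Cauchy distribution as a textbook case where the premises fail (it has no finite variance). The choice of distribution is an assumption made for the demonstration; nothing here claims that temperature data are Cauchy-distributed.

```python
import math
import random
import statistics

random.seed(3)

def draw_normal() -> float:
    return random.gauss(0.0, 1.0)

def draw_cauchy() -> float:
    # Standard Cauchy variate via the inverse-CDF method; it has no finite variance.
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_means(draw, sample_size: int, trials: int = 400) -> float:
    """Interquartile range of the sample mean across repeated samples."""
    means = [
        sum(draw() for _ in range(sample_size)) / sample_size
        for _ in range(trials)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

for n in (10, 1000):
    print(
        f"n={n:5d}  normal IQR: {iqr_of_sample_means(draw_normal, n):.3f}"
        f"   Cauchy IQR: {iqr_of_sample_means(draw_cauchy, n):.3f}"
    )
```

    The spread of the normal sample means shrinks roughly with the square root of the sample size, while the spread of the Cauchy sample means stays about the same no matter how many values are averaged. The interquartile range is used instead of the standard deviation because the latter is not meaningful for the Cauchy case.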

  55. Reply to Terry O ==> Quite right — your statement “… these premises are not necessarily true. Thus, for example, to increase the sample size toward infinity is not necessarily to decrease the standard deviation at all.”
    It is not necessarily true that the derived monthly station means or the yearly means are normally distributed at all. There is no scientific reason to believe so. The distribution is known to be locally, regionally, nationally and continentally skewed in various ways.
    Further, there is no real sample size….the local means are not themselves strict samples except in that they are individual numbers.
    These individual numbers come with their own accuracy range (original measurement error or uncertainty range) which can not be “divided into nothingness”. The whole Uncertainty Range, if it is a true representation of the original accuracy, remains after all the mathematics and statistics are done.

  56. I don’t see how the field of study affects the statistics but FWIW, I analyse data from MRI images to determine response (or lack thereof) to novel cancer treatments in patients over time.
    To directly answer Kip’s two questions:
    a) It simply means that 11 of the 18 data points are indistinguishable from the 2014 data point to within the precision of the measurement. I would use a statistical test such as the unpaired T-test to determine for each point what the level of (in)significance of difference from 2014 was. This does assume that the uncertainty in each measurement is normally distributed.
    b) The measurement is extremely reproducible and any trend observed would be insignificant.
    Plenty of other folks have described how these data aren’t really means, are probably inaccurate and almost certainly excessively precise which I agree with but that’s not what Kip asked.
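
    A rough sketch of the comparison in (a). Because each year is a single value with a stated 95% range and no sample of repeats, this uses a z-test on the difference in place of the unpaired t-test named above, and it assumes the two errors are independent and normally distributed. The example differences (about 0.01C and about 0.4C) are the ones quoted from the Met Office FAQ in the post; the function name is made up.

```python
import math

def two_sided_p_value(difference: float, half_width_95: float) -> float:
    """p-value for 'no real difference' between two values that each carry
    the same normally distributed 95% uncertainty of +/- half_width_95."""
    sigma = half_width_95 / 1.96           # one standard deviation per value
    sigma_diff = math.sqrt(2.0) * sigma    # independent errors add in quadrature
    z = abs(difference) / sigma_diff
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(z / math.sqrt(2.0))

p_close = two_sided_p_value(0.01, 0.1)   # e.g. 1998 vs 2010, about 0.01 C apart
p_far = two_sided_p_value(0.4, 0.1)      # e.g. 2010 vs 1989, about 0.4 C apart

print(f"0.01 C apart: p = {p_close:.2f}")
print(f"0.4 C apart:  p = {p_far:.1e}")
```

    Under these assumptions a 0.01C gap is nowhere near distinguishable, while a 0.4C gap is, which is the same conclusion the Met Office text draws in words. If the uncertainty is not normally distributed, or the errors in different years are correlated, the numbers change.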

  57. Reply to Robany ==> Thank you, Robany. I like your precise and concise answers.
    For (a), I would assume that the Uncertainty is the “same” for all the measurements, but would not assume, without some more evidence, that the uncertainty is normally distributed.
    And for (b), yes! That’s the message I see. If we were measuring something like Global Average Surface Temperature over Land and Sea, and had all these results from our various attempts (for a single moment in time) we would be happy: all our efforts produce the “same” answer, so our method is reproducible. You are the first to mention this important point ==> We would expect data points this close on repeated measurements of an unchanging study object, confirming the accuracy of our method.
    Good point!

  58. With only a half century, or so, of making measurements, when I come upon a question involving temperature measurement accuracy, I defer to my Sweet Old Boss (the SOB), who has me beat by a couple of decades. The SOB has a definitive statement on this: “It’s really easy to read a thermometer; it’s really hard to measure temperature”. So let’s consider some of the errors in measuring temperature.
    The first, of course, is error in reading the thermometer. This is probably the smallest error in the system, but is probably the one that gives the .1 C error bands in the charts. Most of the historic temperature data was taken with bulb thermometers, read by a human, and recorded on paper. With proper training, and regular retraining, .1C error bands are maybe achievable in the reading. I’m skeptical that it’s even that good in practice. I can’t even guess the errors in writing and transcribing the data.
    Major errors come from assuming the thermometer accurately measures the air temperature. The air surrounding a thermometer bulb has very low thermal mass, while the bulb is often a much higher thermal mass. The bulb is expected to be receiving no radiation from surrounding items of different temperatures, while it is also expected to be radiating no energy. This is while it is seated beneath a universe of near absolute zero, some several hundred degrees lower, or occasionally, a much nearer sun some million or so degrees higher. Of course, there is a little box around it which starts out painted white, which mitigates this radiation, somewhat, but certainly doesn’t eliminate radiation among the box and the bulb.
    Then, there’s the sampling error, like the one given when reporting election polls. This is a measure of how much the random error of taking only a sample of the population differs from what a census of the population would determine. This number is usually based solely on the number of samples taken. The number of samples taken for election polls, is often of the same magnitude as the number of temperature measurement sites in the US, particularly after the recent reductions in sites made by the US government. Note that this election error band is often quite high, and the predictions are often quite poor.
    The final error in the discussion of measurement errors is sampling bias, reasons either inadvertent, or designed, that the sample taken differs significantly from the intended population being measured. The number of these errors is too large to explore in any detail at all, so I’ll just list a few.
    • Urban Heat Island Effect
    • Sparse sampling in very large regions (such as the “missing” Arctic data)
    • Dominant sampling in populated regions
    • Changes in measurement and recording tools
    • Deterioration in these tools with age

    • Add yours here
    Probably the largest error in reported global temperature data is the tampering of the data after it is taken. This tampering is called “homogenization”, “calibration”, “analysis”, or just plain “adjustments”. This is what gives rise to the statement that “1934 temperatures have fallen considerably over the last twenty years”.
    My final conclusion is that the .1 C error bands are just as fictitious as Santa Claus. The SOB was right, as usual.

  59. Reply to Tom ==> I like this

    “It’s really easy to read a thermometer; it’s really hard to measure temperature”.

    Measuring the temperature of the Earth is a difficult and imprecise undertaking. For instance: “HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties.” With that, they feel they can come within an Uncertainty Range of +/- 0.1°C which many here feel is way too narrow.

  60. Hello Kip,
    I worked for years on the subject of measurement and other uncertainty in meteorological data. See here for some details: http://multi-science.metapress.com/content/12871126775524v2/. In addition, I am webmaster as well as VP of the most-read German-language climate blog, EIKE (European Institute for Climate and Energy). I would like to get in contact with you, because the kinds of questions you ask above are exactly those I am interested in. Best regards, Michael Limburg
