Guest Post by Kip Hansen
This post does not attempt to answer questions – instead, it asks them. I hope to draw on the expertise and training of the readers here, many of whom are climate scientists (both professional and amateur), statisticians, researchers in various scientific and medical fields, engineers, and members of many other highly trained and educated professions.
The NY Times, and thousands of other news outlets, covered both the loud proclamations that 2014 was “the warmest year ever” and the denouncements of those proclamations. Some, like the NY Times Opinion blog, Dot Earth, unashamedly covered both.
Dr. David Whitehouse, via The GWPF, counters in his post at WUWT – UK Met Office says 2014 was NOT the hottest year ever due to ‘uncertainty ranges’ of the data — with the information from the UK Met Office:
“The HadCRUT4 dataset (compiled by the Met Office and the University of East Anglia’s Climatic Research Unit) shows last year was 0.56C (±0.1C*) above the long-term (1961-1990) average. Nominally this ranks 2014 as the joint warmest year in the record, tied with 2010, but the uncertainty ranges mean it’s not possible to definitively say which of several recent years was the warmest.” And at the bottom of the page: “*0.1° C is the 95% uncertainty range.”
The David Whitehouse essay included this image – HADCRUT4 Annual Averages with bars representing the +/-0.1°C uncertainty range:
The journal Nature has long had a policy of insisting that papers containing figures with error bars describe what the error bars represent, so I thought it would be good in this case to see exactly what the Met Office means by “uncertainty range”.
In its FAQ, the Met Office says:
“It is not possible to calculate the global average temperature anomaly with perfect accuracy because the underlying data contain measurement errors and because the measurements do not cover the whole globe. However, it is possible to quantify the accuracy with which we can measure the global temperature and that forms an important part of the creation of the HadCRUT4 data set. The accuracy with which we can measure the global average temperature of 2010 is around one tenth of a degree Celsius. The difference between the median estimates for 1998 and 2010 is around one hundredth of a degree, which is much less than the accuracy with which either value can be calculated. This means that we can’t know for certain – based on this information alone – which was warmer. However, the difference between 2010 and 1989 is around four tenths of a degree, so we can say with a good deal of confidence that 2010 was warmer than 1989, or indeed any year prior to 1996.” (emphasis mine)
I applaud the Met Office for its openness and frankness in this simple statement.
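A minimal sketch of the comparison logic in that statement, using hypothetical anomaly values and assuming the quoted ±0.1°C is a symmetric 95% range with independent errors in each year (an assumption of this sketch, not something the Met Office states in the FAQ):

```python
# Two years are treated as "statistically distinguishable" here only if their
# central estimates differ by more than the combined 95% uncertainty of the difference.
def distinguishable(anomaly_a, anomaly_b, half_width_95=0.1):
    combined_95 = (half_width_95**2 + half_width_95**2) ** 0.5   # ~0.14 C for a difference
    return abs(anomaly_a - anomaly_b) > combined_95

print(distinguishable(0.56, 0.30))   # a hypothetical mid-1990s-sized anomaly: True
print(distinguishable(0.56, 0.55))   # two nearly tied recent years: False
```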
Now, to the question, which derives from this illustration:
(Right-click on the image and select “View Image” if you need to see more clearly.)
This graph is created from data directly from the UK Met Office, “untouched by human hands” (no numbers were hand-copied, re-typed, rounded off, kriged, or otherwise modified). I have greyed out the CRUTEM4 land-only values, leaving them barely visible for reference. Links to the publicly available datasets are given on the graph. I have added some text and two graphic elements:
a. In light blue, Uncertainty Range bars for the 2014 value, extending back over the whole time period.
b. A ribbon of light peachy yellow, the width of the Uncertainty Range for this metric, overlaid in such a way as to cover the maximum number of values on the graph.
Here is the question:
What does this illustration mean scientifically?
More precisely — If the numbers were in your specialty – engineering, medicine, geology, chemistry, statistics, mathematics, physics – and were results of a series of measurements over time, what would it mean to you that:
a. Eleven of the 18 mean values lie within the Uncertainty Range bars of the most current mean value, 2014?
b. All but three values (1996, 1999, 2000) can be overlaid by a ribbon the width of the Uncertainty Range for the metric being measured?
Let’s have answers and observations from as many different fields of endeavor as possible.
# # # # #
Author’s Comment Policy: I have no vested opinion on this matter – and no particular expertise myself. (Oh, I do have an opinion, but it is not very well informed.) I’d like to hear yours, particularly those with research experience in other fields.
This is not a discussion of “Was 2014 the warmest year?” or any of its derivatives. Simple repetitions of the various Articles of Faith from either of the two opposing Churches of Global Warming (for and against) will not add much to this discussion and are best left for elsewhere.
As Judith Curry would say: This is a technical thread — it is meant to be a discussion about scientific methods of recognizing what uncertainty ranges, error bars, and CIs can and do tell us about the results of research. Please try to restrict your comments to this issue, thank you.
# # # # #


As a chemist, I’d look at that graph and say, “do some more measurements and wait and see.”
As far as the process being a prediction rather than an actual measurement (Mosher), I had a very enlightening experience as an intern trying to extract and recover a protein from a raw material. I ran at least 100 extractions, and the statisticians plotted the results for me. The recovery rate ran from as low as 20% up to a rather broad area showing 80% recovery. The real kicker was that one experiment, directly in the center of that broad area, returned 98%. Despite repeated tries, most of the experiments resulted in roughly 80%, with a couple of batches yielding 92-93%.
That taught me not to trust various curve fitting exercises such as Mosher is talking about. There obviously were important variables in play that we had not discovered yet.
It is pretty obvious that the same applies to the climate: there are large, important variables involved that are not amenable to study by averaging or curve fitting. As someone else pointed out, the GAT doesn’t mean anything when it is -126°C in central Antarctica, and probably not over -100°C for hundreds of miles. Any kind of average temperature has little to do with the energy balances of the actual climate processes.
Reply to Phil Cartier ==> Thanks for the chemist viewpoint — and the enlightening story. –kh
Another way to consider this issue is significant figures. Measuring data to three significant figures requires an accuracy of 1%, e.g. xy.z. An instrument with an accuracy of +/- 0.5% would mean an uncertainty band of 1%. Now suppose you collect thousands of such data points. The result cannot have more significant figures than the data. The average of a thousand xy.z data points cannot be expressed as xy.zb. If I recall correctly, statistical methods require that the result be rounded to two significant figures, e.g. xy. Insignificant trends would then be lost in the cloud of data, in the uncertainty bands.
“The result can not have more significant figures than the data.”
You give no authority, and it simply isn’t true. The error of the mean is less than the error of the parts. Why do you think people go to the expense of assembling large datasets?
The classic is polling. Each response is just binary – 0 or 1. Yet with 1000 responses, you get 2 sigfig meaningful data.
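A quick back-of-the-envelope check of the polling arithmetic in the comment above, assuming independent binary responses (a sketch only, not a claim about how any particular poll is analysed):

```python
import math

# Standard error of an estimated proportion from n independent 0/1 responses.
n, p = 1000, 0.5                                   # p = 0.5 is the worst case
se = math.sqrt(p * (1 - p) / n)
print(f"SE ~ {se:.3f}, 95% margin ~ +/-{1.96 * se:.3f}")   # roughly +/-0.03, i.e. about 3 points
```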
I don’t think so.
http://www.usca.edu/chemistry/genchem/sigfig2.htm
For those who can’t be bothered to look at the link, here is a summary:
When adding or subtracting numbers, count the NUMBER OF DECIMAL PLACES to determine the number of significant figures. The answer cannot CONTAIN MORE PLACES AFTER THE DECIMAL POINT THAN THE SMALLEST NUMBER OF DECIMAL PLACES in the numbers being added or subtracted.
When multiplying or dividing numbers, count the NUMBER OF SIGNIFICANT FIGURES. The answer cannot CONTAIN MORE SIGNIFICANT FIGURES THAN THE NUMBER BEING MULTIPLIED OR DIVIDED with the LEAST NUMBER OF SIGNIFICANT FIGURES.
Reply to Nick Stokes ==> Osborn quotes a text below.
Large databases do not eliminate original measurement error — they don’t, really. 100,000 poorly measured data points cannot be transmogrified into one scientifically precise mean.
A C Osborn February 2, 2015 at 9:50 am
“I don’t think so.”
OK. Suppose you average 1000 numbers, all close to 1, expressed to 1 dp. You add them – you have, by those rules, a total of 1022.1. You divide by 1000 – that is exact, so by the division rules, the answer is 1.0221, according to those rules.
Reply to Nick Stokes 1:43 pm
The division by 1000 is actually division by 1000.0000000… (as you say, it is exact, as it is a discrete value). Due to all of the other values being expressible to one decimal place, your sum is also correct. But, due to the division rules, your value of 1022.1/1000.0000000000000…. is 1.0. There is no way to resolve this further, due to the limited precision of the earlier measurements. The uncertainty may appear to be resolved below the precision level, but in actuality, it can never be less than precision. Thus, the measurement you show has to end up at 1.0.
Headcounts do not have physical units of measurement and your result is simply a nebulous proportion of a total. Big difference. Though the recording may be incorrect, the measurement error is zero. Please do not conflate reality with statistical hair splitting fit only for the rear quarters of a political donkey.
Polling is very different from measuring temperature, Nick. For starters, in polls you assume the answers are independent. And as you state, you can only get a given set of responses. If a person answers “yes” to a question, there is no measurement error. A very different animal.
Besides, in a poll you are more interested in looking at the spread of the data. The spread is not measurement error. Some people seem to confuse the standard deviation of a sample with measurement error. These are two very different things.
And finally you need to keep in mind that there is a difference in precision and accuracy when it comes to measurements. Picture a hunter with a rifle firing five shots at a target. The five shots are clustered on the target, but off by 10 cm to the left and slightly up. He thus has an instrument of precision (the clustering) but a problem with accuracy (off target). That is why issues of calibration also come into play here, keeping the real measurement error high.
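A toy simulation of the precision-versus-accuracy point above, with made-up numbers: tightly clustered readings (good precision) can still share a calibration bias (poor accuracy) that no amount of averaging removes.

```python
import random

random.seed(1)
true_value = 20.0
bias = 0.4                                           # hypothetical uncorrected calibration offset
readings = [true_value + bias + random.gauss(0, 0.05) for _ in range(1000)]

mean = sum(readings) / len(readings)
spread = (sum((r - mean) ** 2 for r in readings) / len(readings)) ** 0.5
print(f"spread of readings (precision): ~{spread:.3f}")               # small, ~0.05
print(f"mean minus true value (accuracy): ~{mean - true_value:.3f}")  # stays near the 0.4 bias
```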
“But, due to the division rules, your value of 1022.1/1000.0000000000000…. is 1.0.”
No. The division rule says:
“The answer cannot CONTAIN MORE SIGNIFICANT FIGURES THAN THE NUMBER BEING MULTIPLIED OR DIVIDED with the LEAST NUMBER OF SIGNIFICANT FIGURES.”
(not my caps)
The numerator has 5, the denom ∞. The minimum of those is 5, not 1.
Sorry. My error in explanation. Go back to my point on precision. Expand in a single step, rather than convoluting by using multiple steps. Thus: (1.2+1.5+0.7+….)/1000. Within the expanded set, you follow the traditional division rule of “what number has the lowest number of significant figures?” You don’t change the precision allowed just by adding additional steps. My thermo professor hammered us hard on this one.
The minimum number of significant figures at any time in the calculation series you post is 2. Therefore, the answer cannot have more than two. Furthermore, it only ever goes out to one significant figure beyond the decimal place. There is no method for going beyond that.
1.0 has two significant figures.
I again apologize for my error and thank you for pointing it out.
“Therefore, then answer cannot have more than two.”
No. The add rule is just about decimal points, and says nothing about sigfig. As you add (positive) numbers, the number of sigfigs can increase; the dp stays constant. That is how you build up 5 sigfigs. Division by 1000 doesn’t change anything, sigfigwise. It’s the same as converting mm to m.
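A small simulation of the 1000-numbers-to-1-dp example, under the assumption that the underlying values are spread over more than one 0.1 rounding step (if every value were identical, the cancellation shown here would not occur):

```python
import random

random.seed(0)
true_values = [random.gauss(1.02, 0.3) for _ in range(1000)]   # assumed spread of ~0.3
rounded = [round(v, 1) for v in true_values]                   # recorded to 1 decimal place

true_mean = sum(true_values) / len(true_values)
rounded_mean = sum(rounded) / len(rounded)
print(f"true mean    = {true_mean:.4f}")
print(f"rounded mean = {rounded_mean:.4f}")
print(f"difference   = {abs(true_mean - rounded_mean):.4f}")   # typically well under 0.05
```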
It’s just a rule, not a statistical method. It overestimates the accuracy of the mean.
I see what I was doing wrong. Quite right, Nick. Thanks for the patience.
“all close to 1, expressed to 1 dp.”
Hence, the result is only significant to “1 dp”, which, here, we call one decimal place. End of story, Nick!! Maybe you never had to pass a class requiring you to demonstrate knowledge of this fundamental point of data management in order to become gainfully employed. Lots of us did; you should go back and reread the link, and consider its implications for your life…
Kip, I don’t know what you are planning to do with this errors thing. Let’s accept the numbers as given, with their error bars. The real objective gets lost in all this unnecessary detail, adjustment and agonizing over how to calculate error. I understand the objective is to detect whether we are headed for death by fire, death by ice, death by inundation…
To do this, there is no need to worry about errors in the monthly, yearly, decadal record. The raw data is perfectly capable of informing us over a period of a few decades what we have in store for us. We should of course do it as cleanly as we can. For this purpose, get rid of everything but well-sited rural instruments; indeed, install a hundred pairs or triplets of good instruments in national parks around the world, each in a suitable micro field. If it is really important to know what danger lies ahead, zone a large area around the recording site, permitting no building, pavement, etc., and keep the shrubs from encroaching, or whatever measures are deemed necessary. Maybe let satellites collect the data from them and pay guards to patrol the perimeter (please don’t mention cost in this!!!). For sea level, go with the tide gauges or, if we are happy with GPS, this will do fine; millimetres per year, for this objective, are a measure of nothing important happening, as are tenths of a degree. Finally, before deployment or selection of existing sites, have 3-5 randomly selected unpaid volunteers meet and decide on a fixed algorithm for processing the data.
In summary, If a bolide 500m in diameter is heading toward earth, don’t go down to the sea with a micrometer to try to decide what the catastrophe will be like.
Reply to Gary P ==> I don’t plan to do anything at all…. I’m just curious about the original point. I read lots of studies (climate, medical, clinical, psychological, etc.). Uncertainty Ranges, Error bars, and CIs are often confused for one another, unidentified, undefined, and/or ignored altogether. Often they are statistically determined by a maths package and have nothing to do with the actual measurements used in the experiment.
I appreciate your participation here today.
A blast from the past.
H/T
https://stevengoddard.wordpress.com/2015/02/01/1907-it-is-your-patriotic-duty-to-rebel-against-climate-data-tampering/
Subject: False Climate Claims by NOAA
To: Climate-portal@noaa.gov,
Climate-ClimateWatchMagazine@noaa.gov,
Climate-DataAndServices@noaa.gov,
Climate-Education@noaa.gov,
Climate-UnderstandingClimate@noaa.gov
Sirs and Mesdames;
You have not been truthful:
http://wattsupwiththat.com/2015/02/01/uncertainty-ranges-error-bars-and-cis/
Your intentional and conscious deception of the public has severely damaged your credibility. You have done permanent damage to science and scientific endeavor.
It is appalling, disgusting and disgraceful.
Very truly yours,
I am a medical scientist mostly dealing with identification of risk factors for clinical events using multivariate models.
The first thing that comes to my mind would be to assess the distribution of the data. It does not seem to me (but I don’t have the data set to check it) that surface temperatures are normally distributed (either spatially or temporally). In such a case, showing confidence intervals based on the SD is likely not appropriate. Median and interquartile range should be used instead. In my limited experience in the field, I never saw the details of how confidence intervals are calculated for temperature data sets.
From what I have seen they make it up as they go along.
Reply to Dr. Napolitano ==> Thank you for your input — I believe that you are correct that surface temperatures are not normally distributed in space or time. I’d like to read opinions from others on this point.
Far above in the comments I link to the two papers used by the UK Met Office to set their Uncertainty Range for this metric.
Kip, I tend to agree with Dr. Napolitano. How can the temperatures be normally distributed when they depend so much on local climatic variations, prevailing wind directions, humidity, geographic position (close to hills/mountains etc.), geologic conditions (volcanic activity), and current directions when coastal?
I have seen a study somewhere, but can’t remember where, that coastal sites are controlled by the seas and have very different temps & ranges to inland sites.
All the sites added together may end up as normally distributed, but that loses so much data. I just don’t believe in a “Global Temperature”; surely all the sites should be individually analysed for trend, and the decision of warming/cooling based on the majority trend.
As to the business of gridding or kriging, it is absolute crap and the perfect tool for deception.
Reply to Dr. Napolitano and A C Osborn ==> I suppose it would be possible to see if any of the temperature data were normally distributed — there are sources for the gridded means used to arrive at HADCRUT4. Of course, the gridded means are themselves arrived at by formulas that may force a normal distribution by smoothing and adjusting based on the assumption of normality.
Any deep data people still reading here? Does the HADCRUT4 process force an assumed normal distribution on the gridded data? (Does that question even make sense?)
Kip Hansen commented
I have 1×1 gridded data based on NCDC’s GSoD data (land only) that’s straight averaged, in csv files.
here
http://sourceforge.net/projects/gsod-rpts/files/Reports/LatLon%201×1%20Box/
I’m an analytical chemist and have dealt with QA and QC data for many years. The chart reminds me of a control chart, with upper and lower boundaries based on the precision of a method (or temp measurements here). Excursions beyond the boundaries for an analytical method suggest that a non-random factor has developed, or that a random, albeit low-probability, event has occurred. But weather is not a chemical method. Very good points have been raised about normal distributions. On a hypothetical Earth, we could set up a grid over the entire planet, say a million equally spaced sensor points, and we might find a normal distribution of temperatures across the globe. But that’s not the historical reality. It might be possible in the future with satellite monitoring. Given the irregular siting of sensors, skewed distributions could be expected, and non-normal statistics applied. Just today an article by some NPR reporters appeared comparing current temperatures in Minnesota with those observed 150 years ago… not one mention of possible sampling error due to historically sparse measurements vs more widespread current measurements. Not a mention of the heat island of the Twin Cities metro. NPR could use a little science.
Reply to Larry Potts ==> Yes, it is obvious that the data do not wander, during this time period, much outside the range identified as the Uncertainty Range for the metric — data within that range can be said to be “the same”. Excursions tell us something is going on (or that our Uncertainty Range is too narrow).
I think that modern satellite temperature measurement has shown us that temperatures are not evenly spread, and I have seen no evidence that spatial grids show the normal distribution of temperatures that would be expected if temperatures were simply “cold at the poles and hot at the equator, give or take altitude”. This brings into question all of the infilling methods used to create HADCRUT4 (or BEST, or GISS). This doesn’t mean that we really expect the cold-poles, hot-equator model, only that the belief that the temperature at point B can be determined by its distance from points A and C, whose temperatures are known, is probably false.
As an engineer looking at that data plot, I would conclude: no change in the average over that time period, but some unusual variations 1996-1998.
Reply to Robber ==> Thanks for the engineering viewpoint — I tend to agree.
Okay, I’ll give this a shot, though I doubt anyone will read it. I occasionally work uncertainty issues for a major wind tunnel organization, so you can decide if that’s relevant.
Let’s say you have a value A for which you desire the uncertainty. To determine the value of A, you use an equation which draws on a variety of direct, uncorrelated measurements (x1, x2, x3,…). Thus, the equation is A(x1, x2, x3,…).
Let us further assume that you have lucked out and all of your instruments have 95% confidence uncertainties provided, as well as NIST accuracy traceabilities. The uncertainties are labeled as: Ux1, Ux2, Ux3,…. The accuracies are just there for you to make sure you’re only reading to a reasonable number of decimal places, etc etc.
Noting that I am going to use a standard letter d to denote a partial derivative, the uncertainty in value A is found by: UA = [{(dA/dx1)^2}(Ux1^2)+{(dA/dx2)^2}(Ux2^2)+{(dA/dx3)^2}(Ux3^2)+…]^(1/2).
So, you need to know the equations being used, and all of the factors. A simple average of three temperatures where all of the sensors were good to +/- 0.1C, for instance, will give you an uncertainty of +/- 0.06C. This is where taking multiple measurements at the same point in 4D space becomes better than the known uncertainty.
Incidentally, units are very important. While it doesn’t matter in this case (because the partial derivatives are simple), you really should always do your uncertainties in absolutes (Rankine, Kelvin, psia, atm, meters, feet, slugs, etc.). If you use relative/differential measures (F, C, psid/psig, in H2O, mm Hg), you will actually misestimate your uncertainty automatically, unless you fix each location appropriately (really hard, easy to mess up, don’t do it).
If, however, your values are correlated, or are given more complicated manipulations, generally the uncertainties will cause other problems. There is a way of manipulating the terms (normalizing, really) to give you sensitivity coefficients to tell you what you should improve to get the most bang for your buck. It could be that the uncertainty on the size of land plots is the worst thing. Altitude measurements could be (almost certainly are) horribly inaccurate. But improving those measurements might be of limited value if their influence is low.
Figuring out the uncertainties on climate models would be a painful task. It is impossible if any equations are hidden. The same is true of experimentally derived data, such as a global mean.
As for the question originally posed: I’m not sure those options cover it. The blue lines are useful for showing that 2014 is (with 95% certainty) higher than three temperatures (1996, 1999, 2000).
The yellow is probably not true, given that it is the same width as the blue lines. Generously assuming it to be true, however, it’s not terribly useful. Once you put the error bars on the three “outliers”, those measurements may have been within that error bound on the average. Therefore, all of the temperatures (to 95% certainty) fall within the 21st century average temperature range. The only real utility I see for that, however, is outlier rejection. The fact is, none of these years are (by themselves) terribly interesting.
Not sure I follow… and yes, I read it! 😉
“So, you need to know the equations being used, and all of the factors. A simple average of three temperatures where all of the sensors were good to +/- 0.1C, for instance, will give you an uncertainty of +/- 0.06C. This is where taking multiple measurements at the same point in 4D space becomes better than the known uncertainty.”
According to this, when averaging different measurements the uncertainty is lower… this means that the more measurements you add, the better the precision? So adding thousands of different points would provide infinite precision?
When you say taking measurements at the same point…. this is not what we are doing; it’s not multiple temperature measurements from the same probe that we average, it’s different probes at different locations that we average.
A quick test in Excel with 2 and 3 probes shows that my uncertainty simply adds up, but I may be mistaken in my understanding. Others were using SQRT(U1^2+U2^2+U3^2… Un^2) to find the final one. Even with this method, the uncertainty grows.
It’s been a long time since I studied maths, maybe a mathematician can help here?
Here’s a very basic example. So, using the three temperatures (t1, t2, t3) to find an average temp T, and assuming: Ut1 = Ut2 = Ut3 = 0.1C = 0.1K (converting to absolute value, which may matter in other places). T=(t1+t2+t3)/3. dT/dt1 = dT/dt2 = dT/dt3 = 1/3.
Then UT = ((dT/dt1)^2*(Ut1)^2 + (dT/dt2)^2*(Ut2)^2 + (dT/dt3)^2*(Ut3)^2)^0.5 = ((1/3)^2*(0.1K)^2 + (1/3)^2*(0.1K)^2 + (1/3)^2*(0.1K)^2)^0.5 = (3*(1/9)*0.01K^2)^0.5 = ((1/3)*0.01K^2)^0.5 = (0.0033K^2)^0.5 = 0.058K.
Your example didn’t include a partial derivative.
In my example above, I made the assumption that I had a nice NIST traceability, which builds something into the analysis. Taking an infinite number of points will get you down to being able to find the true mean with certainty. However, you will still not know your accuracy, because you won’t have dealt with your systematic errors. The NIST traceability lets you say that you are accurate to an accepted level. So, the NIST calibration helps you reduce the systematic error to something that is consistent with everyone else.
An infinite number of data samples reduces your instrument’s random uncertainty. The idea here is that, if you took all of those points and had your error bars on all of them, there’s only one infinitesimal spot where all of the error bars overlap. You declare that to be the actual mean value with absolute certainty (infinite number of points, remember). But you may still have systematic errors that can bias you.
When you do the temperatures all over the place, with lots of probes, each sensor must be treated independently. Additionally, the cross-correlations must be accounted for. Furthermore, you must understand how your sensor works. For instance, I can’t assume that a pressure measurement in my home tells me anything about the pressure in my wind tunnel, no matter the uncertainty on the sensor; it is completely irrelevant to what I am measuring, but there isn’t an obvious number showing up on any calibration sheet or in any equation that tells me that fact. So, figuring out how many temperature probes you need, their uncertainty over a large area, and so forth, relies on being smart about the sensor and its ability to measure over large areas. Sensor density studies can be extremely time consuming and rather important (see the development of pressure-sensitive paints, for instance, to solve the problem of insufficiency in pressure taps on wings). Sadly, I don’t have an easy way to tell you how you should be approaching that, other than to make sure you’re really, really smart about how good a single point is for a “large” area.
This is a decent primer on uncertainty: http://user.physics.unc.edu/~deardorf/uncertainty/UNCguide.html
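A compact sketch of the root-sum-square propagation described in this exchange, for the special case of a simple average of independent, uncorrelated sensors (the same assumptions as the worked example above):

```python
import math

def uncertainty_of_mean(sensor_uncertainties):
    """Root-sum-square propagation for a simple average: the partial
    derivative of the mean with respect to each reading is 1/n."""
    n = len(sensor_uncertainties)
    return math.sqrt(sum((u / n) ** 2 for u in sensor_uncertainties))

print(round(uncertainty_of_mean([0.1, 0.1, 0.1]), 3))   # 0.058 K, matching the worked example
```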
I have a high interest in this topic of error bars. CIs I will have to leave for another day, as I need to complete my studies in statistics first.
I would like to approach how one might be able to research the metric of errors in temperature estimates. I am here going to approach the problem from two separate directions, or two parts:
Part 1: I mostly can only guess at how current estimates are derived. If only max and min temperatures are used, then that in itself introduces errors. Accuracy could be improved by using a higher-resolution temperature series and performing an integral over that data (see the sketch below). I would like to explore – for myself – whether this makes much of a difference.
I would like to write about further steps towards means of getting greater accuracy, but I am going to have to leave that for another day as I am rather busy.
…
Part 2: To estimate how accurate an average surface temperature of the Earth is, one might imagine doing an experiment with a global model of the Earth’s weather in which the computer model knows the answer to high precision (because it is essentially a mathematical model purposely defined, for this experiment, with a pre-determined global temperature). What we then do is locate the temperature stations within this model, apply the same techniques as are used on the real data, and compare the result to the computer model’s temperature.
Eventually I am hoping to be able to use a historic weather recreation computer model to do what would probably be the best scientific method of calculating the average temperature of the earth, complete with error estimates / bars and all the way back to the first thermometer measurements.
I would like to continue with ideas I have for part 2, but like I say, I am very busy.
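A rough sketch of the Part 1 comparison mentioned above, using synthetic hourly data with an asymmetric diurnal cycle (the shape is an assumption made purely for illustration):

```python
import math

hours = range(24)
# synthetic, skewed diurnal cycle: long cool night, short warm afternoon peak near 15:00
temps = [10 + 8 * math.exp(-((h - 15) ** 2) / 18.0) for h in hours]

midrange = (max(temps) + min(temps)) / 2        # the traditional (Tmax + Tmin) / 2
integrated_mean = sum(temps) / len(temps)       # crude integral of the hourly readings
print(f"(Tmax+Tmin)/2 = {midrange:.2f} C, 24-hour mean = {integrated_mean:.2f} C, "
      f"difference = {midrange - integrated_mean:.2f} C")
```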
A portion of the content of this thread can be summarized with the help of the central limit theorem. It follows from the truth of the premises to this theorem that the sample mean is asymptotically normally distributed with standard deviation that varies inversely with the square root of the sample size; as the sample size increases toward infinity the standard deviation decreases toward zero. However, these premises are not necessarily true. Thus, for example, to increase the sample size toward infinity is not necessarily to decrease the standard deviation at all.
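For reference, the result that comment appeals to, which holds only under its premises (independent, identically distributed observations with finite variance):

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,\qquad
\operatorname{SD}\!\left(\bar{X}_n\right) = \frac{\sigma}{\sqrt{n}} ,\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}\!\left(0,\,\sigma^{2}\right)
\quad \text{as } n \to \infty .
```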
Here is a new statistical study proving that the models are doing just fine in predicting global temperature trends:
http://phys.org/news/2015-02-global-slowdown-systematic-errors-climate.html
It appears that the last leg of the stool for those that dispute AGW has been removed.
Let’s look at the projected global trend from the first IPCC report, based on 1970 to today, and the best estimate for climate sensitivity of 0.81°C/W/m², or 3°C for a doubling of atmospheric CO₂ from 280 to 560ppm:
0.16°C per decade.
And the observed global warming trend from NASA’s GISTEMP temperature series:
0.17°C per decade.
It would be hard to get much closer agreement than one hundredth of a degree over a decade. So yes, models are accurate, as shown by the observations.
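For reference, the arithmetic linking the two sensitivity figures quoted in that comment, assuming the commonly used logarithmic forcing approximation (an assumption of this note, not stated in the comment):

```latex
\Delta F_{2\times\mathrm{CO_2}} \approx 5.35\,\ln 2 \approx 3.7\ \mathrm{W\,m^{-2}},
\qquad
\frac{3\ ^{\circ}\mathrm{C}}{3.7\ \mathrm{W\,m^{-2}}} \approx 0.81\ ^{\circ}\mathrm{C\,/\,(W\,m^{-2})} .
```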
Agreed, if you believe that the corrections, homogenization and other artificial artifacts applied to the raw data did not create this warming artificially!
When I see graphs like this, I doubt it:
http://globalwarmingsolved.com/wp-content/uploads/2013/12/us_urban_trends.jpg
http://globalwarmingsolved.com/wp-content/uploads/2013/11/linear_trends.png
source: http://globalwarmingsolved.com/
Reply to Simon F ==> Very nice graphs!
There are a lot of questions about adjustments to the historical record.
This question, and a half dozen others, are why I narrowed this essay question to one single point.
Thanks for your input.
Reply to Terry O ==> Quite right — as you say, “… these premises are not necessarily true. Thus, for example, to increase the sample size toward infinity is not necessarily to decrease the standard deviation at all.”
It is not necessarily true that the derived monthly station means or the yearly means are normally distributed at all. There is no scientific reason to believe so. The distribution is known to be locally, regionally, nationally and continentally skewed in various ways.
Further, there is no real sample size…. the local means are not themselves strict samples, except in that they are individual numbers.
These individual numbers come with their own accuracy range (original measurement error or uncertainty range) which cannot be “divided into nothingness”. The whole Uncertainty Range, if it is a true representation of the original accuracy, remains after all the mathematics and statistics are done.
I don’t see how the field of study affects the statistics but FWIW, I analyse data from MRI images to determine response (or lack thereof) to novel cancer treatments in patients over time.
To directly answer Kip’s two questions:
a) It simply means that 11 of the 18 data points are indistinguishable from the 2014 data point to within the precision of the measurement. I would use a statistical test such as the unpaired t-test to determine, for each point, the level of (in)significance of its difference from 2014 (one possible implementation is sketched just after this comment). This does assume that the uncertainty in each measurement is normally distributed.
b) The measurement is extremely reproducible and any trend observed would be insignificant.
Plenty of other folks have described how these data aren’t really means, are probably inaccurate and almost certainly excessively precise, which I agree with, but that’s not what Kip asked.
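One way the comparison suggested in (a) might be set up, not necessarily the commenter’s intended procedure: treat each year’s ±0.1°C 95% range as 1.96 standard errors of an independent Gaussian uncertainty and compute a z statistic for the difference.

```python
import math

def compare_years(anom_a, anom_b, half_width_95=0.1):
    """z-style comparison of two annual anomalies whose 95% half-widths are
    treated as 1.96 standard errors of independent Gaussian uncertainties."""
    se = half_width_95 / 1.96
    z = (anom_a - anom_b) / math.sqrt(2 * se ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided p-value
    return z, p

print(compare_years(0.56, 0.40))   # hypothetical pair: z ~ 2.2, p ~ 0.03
print(compare_years(0.56, 0.55))   # near-tied pair: nowhere near significant
```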
Reply to Robany ==> Thank you, Robany. I like your precise and concise answers.
for (a), I would assume that the Uncertainty is the “same” for all the measurements, but would not assume, without some more evidence, that the uncertainty is normally distributed.
and for (b), yes! That’s the message I see. If we were measuring something like Global Average Surface Temperature over Land and Sea, and had all these results from our various attempts (for a single moment in time), we would be happy — all our efforts produce the “same” answer, so our method is reproducible. You are the first to mention this important point ==> We would expect data points this close on repeated measurements of an unchanging study object, confirming the accuracy of our method.
Good point!
With only a half century, or so, of making measurements, when I come upon a question involving temperature measurement accuracy, I defer to my Sweet Old Boss (the SOB), who has me beat by a couple of decades. The SOB has a definitive statement on this: “It’s really easy to read a thermometer; it’s really hard to measure temperature”. So let’s consider some of the errors in measuring temperature.
The first, of course, is error in reading the thermometer. This is probably the smallest error in the system, but is probably the one that gives the .1 C error bands in the charts. Most of the historic temperature data was taken with bulb thermometers, read by a human, and recorded on paper. With proper training, and regular retraining, .1C error bands are maybe achievable in the reading. I’m skeptical that it’s even that good in practice. I can’t even guess the errors in writing and transcribing the data.
Major errors come from assuming the thermometer accurately measures the air temperature. The air surrounding a thermometer bulb has very low thermal mass, while the bulb often has a much higher thermal mass. The bulb is expected to be receiving no radiation from surrounding items of different temperatures, while it is also expected to be radiating no energy. This is while it is seated beneath a universe of near absolute zero, some several hundred degrees lower, or occasionally, a much nearer sun some million or so degrees higher. Of course, there is a little box around it which starts out painted white, which mitigates this radiation somewhat, but certainly doesn’t eliminate radiation between the box and the bulb.
Then, there’s the sampling error, like the one given when reporting election polls. This is a measure of how much the random error of taking only a sample of the population differs from what a census of the population would determine. This number is usually based solely on the number of samples taken. The number of samples taken for election polls, is often of the same magnitude as the number of temperature measurement sites in the US, particularly after the recent reductions in sites made by the US government. Note that this election error band is often quite high, and the predictions are often quite poor.
The final error in the discussion of measurement errors is sampling bias, reasons either inadvertent, or designed, that the sample taken differs significantly from the intended population being measured. The number of these errors is too large to explore in any detail at all, so I’ll just list a few.
• Urban Heat Island Effect
• Sparse sampling in very large regions (such as the “missing” Arctic data)
• Dominant sampling in populated regions
• Changes in measurement and recording tools
• Deterioration in these tools with age
• Add yours here
Probably the largest error in reported global temperature data is the tampering of the data after it is taken. This tampering is called “homogenization”, “calibration”, “analysis”, or just plain “adjustments”. This is what gives rise to the statement that “1934 temperatures have fallen considerably over the last twenty years”.
My final conclusion is that the .1 C error bands are just as fictitious as Santa Claus. The SOB was right, as usual.
Reply to Tom ==> I like this.
Measuring the temperature of the Earth is a difficult and imprecise undertaking. For instance: “HadCRUT4 is presented as an ensemble data set in which the 100 constituent ensemble members sample the distribution of likely surface temperature anomalies given our current understanding of these uncertainties.” With that, they feel they can come within an Uncertainty Range of +/- 0.1°C which many here feel is way too narrow.
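A hedged sketch of how a percentile-based 95% range could be read off such an ensemble; the numbers below are made up, and this is not the actual HadCRUT4 ensemble or its published uncertainty method:

```python
import random
import statistics

random.seed(42)
# stand-in for 100 ensemble members of a single year's global anomaly
ensemble_2014 = [0.56 + random.gauss(0, 0.05) for _ in range(100)]

cuts = statistics.quantiles(ensemble_2014, n=40)     # cut points at 2.5%, 5%, ..., 97.5%
median = statistics.median(ensemble_2014)
print(f"median {median:.2f} C, ~95% range ({cuts[0]:.2f} C, {cuts[-1]:.2f} C)")
```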
Statistics: the uncertain quantification of uncertainty!
Hello Kip,
I have worked for years on the subject of measurement and other uncertainties in meteorological data. See here for some details: http://multi-science.metapress.com/content/12871126775524v2/. In addition, I am webmaster as well as VP of the most-read German-language climate blog, EIKE (European Institute for Climate and Energy). I would like to get in contact with you, because the kind of questions you ask above are exactly those I am interested in. Best regards, Michael Limburg
Michael Limburg commented
Michael, that was a very interesting abstract; is there anyplace I can get access to the complete paper (without having to pay for access)?
Hi Mi Cro, yes you can. Here is the link: http://www.eike-klima-energie.eu/uploads/media/E___E_algorithm_error_07-Limburg.pdf Discussion welcome. My email: m.limburg@eike-klima-energie.eu
@Michael Limburg
The intro says: “It adds therefore a minimum additional systematic uncertainty of + 0.3 °C and – 0.23 °C respectively to any global mean anomaly calculation.”
Are we to understand that we need to add this to the +/- 0.2 uncertainty… so we would get +0.5 and -0.43 for the global mean data sets?
Thanks,
Yes, that is the result of applying it to ±0.2 K. But the minimum uncertainty is in reality much larger than ±0.2 K. To support this I cite Frank 2010 and 2011, and I also define a number of additional systematic errors which show up everywhere but which nobody is likely able to quantify backwards in time. Here you can download the full paper: http://www.eike-klima-energie.eu/uploads/media/E___E_algorithm_error_07-Limburg.pdf