
Guest essay by Clyde Spencer
Introduction
The point of this article is that, once the limitations of the available data set(s) are examined, one should not ascribe more accuracy and precision to global temperature data than is warranted. One regularly sees news stories claiming that the most recent year or month was the (first, or second, etc.) warmest in recorded history. This claim is reinforced with a stated temperature difference, or anomaly, that is some hundredths of a degree warmer than some reference, such as the previous year(s). I’d like to draw the reader’s attention to the following quote from Taylor (1982):
“The most important point about our two experts’ measurements is this: like most scientific measurements, they would both have been useless, if they had not included reliable statements of their uncertainties.”
Before going any further, it is important that the reader understand the difference between accuracy and precision. Accuracy is how close a measurement (or series of repeated measurements) is to the actual value, and precision is the resolution with which the measurement can be stated. Another way of looking at it is provided by the following graphic:

The illustration implies that repeatability, or decreased variance, is a part of precision. It is, but more importantly, it is the ability to record, with greater certainty, where a measurement is located on the continuum of a measurement scale. Low accuracy is commonly the result of systematic errors; however, very low precision, which can result from random errors or inappropriate instrumentation, can contribute to individual measurements having low accuracy.
Accuracy
For the sake of the following discussion, I’ll ignore issues with weather station siting problems potentially corrupting representative temperatures and introducing bias. However, see this link for a review of problems. Similarly, I’ll ignore the issue of sampling protocol, which has been a major criticism of historical ocean pH measurements, but is no less of a problem for temperature measurements. Fundamentally, temperatures are spatially-biased to over-represent industrialized, urban areas in the mid-latitudes, yet claims are made for the entire globe.
There are two major issues with regard to the trustworthiness of current and historical temperature data. One is the accuracy of recorded temperatures over the useable temperature range, as described in Table 4.1 at the following link:
http://www.nws.noaa.gov/directives/sym/pd01013002curr.pdf
Section 4.1.3 at the above link states:
“4.1.3 General Instruments. The WMO suggests ordinary thermometers be able to measure with high certainty in the range of -20°F to 115°F, with maximum error less than 0.4°F…”
In general, modern temperature-measuring devices are required to provide a temperature accurate to about ±1.0° F (0.56° C) at their reference temperature, and to be in error by no more than ±2.0° F (1.1° C) over their operational range. Table 4.2 requires that the resolution (precision) be 0.1° F (0.06° C), with an accuracy of ±0.4° F (±0.2° C).
The US has one of the best weather monitoring programs in the world. However, the accuracy and precision should be viewed in the context of how global averages and historical temperatures are calculated from records, particularly those with less accuracy and precision. It is extremely difficult to assess the accuracy of historical temperature records; the original instruments are rarely available to check for calibration.
Precision
The second issue is the precision with which temperatures are recorded, and the resulting number of significant figures retained when calculations are performed, such as when deriving averages and anomalies. This is the most important part of this critique.
If a temperature is recorded to the nearest tenth (0.1) of a degree, the convention is that the last digit has been rounded or estimated. That is, a temperature reported as 98.6° F could have been as low as 98.55° F or as high as just under 98.65° F.
The general rule of thumb for addition and subtraction is that the sum or difference should retain no more digits to the right of the decimal point than the least precise measurement has. When multiplying or dividing, the conservative rule of thumb is that, at most, one more significant figure may be retained in the product than is contained in the multiplicand with the fewest significant figures, although the rule usually followed is to retain only as many significant figures as the least precise multiplicand has. [For an expanded explanation of the rules for significant figures and mathematical operations with them, go to this Purdue site.]
Unlike a case with exact integers, a reduction in the number of significant figures in even one of the measurements in a series increases uncertainty in an average. Intuitively, one should anticipate that degrading the precision of one or more measurements in a set should degrade the precision of the result of mathematical operations. As an example, assume that one wants the arithmetic mean of the numbers 50., 40.0, and 30.0, where the trailing zeros are the last significant figure. The sum of the three numbers is 120., with three significant figures. Dividing by the integer 3 (exact) yields 40.0, with an uncertainty in the next position of ±0.05 implied.
Now, what if we take into account the implicit uncertainty of all the measurements? In the set examined above, every measurement carries an implied uncertainty; the sum of 50. ±0.5, 40.0 ±0.05, and 30.0 ±0.05 becomes 120. ±0.6. While not highly probable, it is possible that all of the errors could have the same sign. That means the average could be as small as 39.80 (119.4/3) or as large as 40.20 (120.6/3); that is, 40.00 ±0.20, which should be rounded to 40.0 ±0.2. Comparing this with the result obtained previously, it can be seen that there is an increase in the uncertainty. The potential spread between the bounds of the mean value may increase as more data are averaged.
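To make the bookkeeping above concrete, here is a minimal sketch in Python of the worst-case (same-sign) error propagation just described. The values are those from the example, and the propagation rule is the simple interval arithmetic used in the text, not a formal statistical treatment.

```python
# Minimal sketch (assumption: worst-case, same-sign errors, as in the text above).
# Each measurement carries an implied half-interval in its last significant digit;
# the bounds of the mean are computed from the summed bounds.

measurements = [(50.0, 0.5),   # "50."  -> implied +/- 0.5
                (40.0, 0.05),  # "40.0" -> implied +/- 0.05
                (30.0, 0.05)]  # "30.0" -> implied +/- 0.05

total     = sum(value for value, _ in measurements)           # 120.0
total_err = sum(err for _, err in measurements)               # 0.6
n         = len(measurements)                                 # exact integer

mean      = total / n                                         # 40.0
low, high = (total - total_err) / n, (total + total_err) / n  # 39.8 .. 40.2

print(f"mean = {mean:.1f} +/- {total_err / n:.1f}  (bounds {low:.2f} to {high:.2f})")
```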
It is generally well known, especially amongst surveyors, that the precision of multiple, averaged measurements varies inversely with the square root of the number of readings that are taken. Averaging tends to remove the random error in rounding when measuring a fixed value. However, the caveats here are that the measurements have to be taken with the same instrument, on the same fixed parameter, such as an angle turned with a transit. Furthermore, Smirnoff (1961) cautions, “… at a low order of precision no increase in accuracy will result from repeated measurements.” He expands on this with the remark, “… the prerequisite condition for improving the accuracy is that measurements must be of such an order of precision that there will be some variations in recorded values.” The implication here is that there is a limit to how much the precision can be increased. Thus, while the definition of the Standard Error of the Mean is the Standard Deviation of the samples divided by the square root of the number of samples, the process cannot be repeated indefinitely to obtain any precision desired!¹
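Smirnoff’s caveat can be illustrated with a rough simulation (a sketch only; the noise level and resolutions below are hypothetical): averaging repeated readings of a fixed value improves the estimate only when the instrument’s resolution is fine enough that the recorded values actually vary.

```python
# Rough simulation of Smirnoff's caveat (illustrative only; noise and resolutions
# are made up). A fixed value is measured N times with random noise, then rounded
# to the instrument's resolution. Averaging helps only when the resolution is fine
# enough, relative to the noise, that the recorded values vary.

import random

def averaged_reading(true_value, noise_sd, resolution, n, seed=0):
    rng = random.Random(seed)
    readings = [round((true_value + rng.gauss(0.0, noise_sd)) / resolution) * resolution
                for _ in range(n)]
    return sum(readings) / n

TRUE = 20.37            # hypothetical fixed value being measured
N    = 10_000

# Resolution fine relative to the noise: the mean converges toward 20.37.
print(averaged_reading(TRUE, noise_sd=0.2, resolution=0.1, n=N))

# Resolution far coarser than the noise: every reading rounds to 20.0,
# and no amount of averaging recovers the missing 0.37.
print(averaged_reading(TRUE, noise_sd=0.02, resolution=1.0, n=N))
```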
While multiple observers may eliminate systematic error resulting from observer bias, the other requirements are less forgiving. Different instruments will have different accuracies and may introduce greater imprecision in averaged values.
Similarly, measuring different angles tells one nothing about the accuracy or precision of a particular angle of interest. Thus, measuring multiple temperatures, over a series of hours or days, tells one nothing about the uncertainty in temperature, at a given location, at a particular time, and can do nothing to eliminate rounding errors. A physical object has intrinsic properties such as density or specific heat. However, temperatures are ephemeral and one cannot return and measure the temperature again at some later time. Fundamentally, one only has one chance to determine the precise temperature at a site, at a particular time.
The NOAA Automated Surface Observing System (ASOS) has an unconventional way of handling ambient temperature data. The User’s Guide says the following in section 3.1.2:
“Once each minute the ACU calculates the 5-minute average ambient temperature and dew point temperature from the 1-minute average observations… These 5-minute averages are rounded to the nearest degree Fahrenheit, converted to the nearest 0.1 degree Celsius, and reported once each minute as the 5-minute average ambient and dew point temperatures…”
This automated procedure is performed with temperature sensors specified to have an RMS error of 0.9° F (0.5° C), a maximum error of ±1.8° F (±1.0° C), and a resolution of 0.1° F (0.06° C) in the most likely temperature ranges encountered in the continental USA. [See Table 1 in the User’s Guide.] One (1. ±0.5) degree Fahrenheit is equivalent to 0.6 ±0.3 degrees Celsius. Reporting the rounded Celsius temperature, as specified above in the quote, implies a precision of 0.1° C when only 0.6 ±0.3° C is justified, thus implying a precision 3 to 9-times greater than what it is. In any event, even using modern temperature data that are commonly available, reporting temperature anomalies with two or more significant figures to the right of the decimal point is not warranted!
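A minimal sketch of the reporting chain quoted above (it follows the quoted steps, not the actual ASOS code) shows how the trailing 0.1° C digit implies more precision than the whole-degree rounding supports:

```python
# Rough sketch of the reporting chain described in the User's Guide quote:
# a 5-minute average in deg F is rounded to the nearest whole degree, then
# converted and reported to the nearest 0.1 deg C.

def asos_style_report(avg_deg_f: float) -> float:
    rounded_f = round(avg_deg_f)              # nearest whole degree F (+/- 0.5 F)
    deg_c = (rounded_f - 32.0) * 5.0 / 9.0    # exact conversion of the rounded value
    return round(deg_c, 1)                    # reported to 0.1 C

# Two 5-minute averages that differ by almost a full degree F...
print(asos_style_report(71.51))   # 22.2
print(asos_style_report(72.49))   # 22.2
# ...produce the same report, yet the trailing ".2" implies roughly +/- 0.05 C
# precision, while the whole-degree rounding alone contributes about +/- 0.28 C.
```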
Consequences
Where these issues become particularly important is when temperature data from different sources, which use different instrumentation with varying accuracy and precision, are consolidated or aggregated into global temperature summaries. They also become an issue when comparing historical data with modern data, and particularly when computing anomalies. A significant problem with historical data is that, typically, temperatures were only measured to the nearest degree (as with modern ASOS temperatures!). Hence, the historical data have low precision (and unknown accuracy), and the rule given above for subtraction comes into play when calculating what are called temperature anomalies. That is, data are averaged to determine a so-called temperature baseline, typically for a 30-year period, and that baseline is subtracted from modern data to define an anomaly. A way around the subtraction issue is to calculate the best historical average available, and then define it as having as many significant figures as modern data. Then there is no requirement to truncate or round modern data, and one can legitimately state the modern anomalies with respect to the defined baseline, although it will not be obvious whether the differences are statistically significant. Unfortunately, one is deluding oneself to think one can say anything about how modern temperature readings compare to historical temperatures when the variations are to the right of the decimal point!
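As a purely illustrative sketch of the subtraction problem (the numbers below are hypothetical, and the propagation is the same worst-case interval arithmetic used earlier; any reduction in baseline uncertainty claimed from averaging is deliberately ignored here):

```python
# Hypothetical numbers purely for illustration of the subtraction rule above.
# Historical readings recorded to the nearest whole degree carry an implied
# +/- 0.5 deg; that uncertainty survives into any anomaly computed against them.

baseline       = 57.0     # 30-year mean built from whole-degree records
baseline_unc   = 0.5      # implied by the original whole-degree precision
modern_reading = 57.8     # modern value reported to the nearest 0.1 deg
modern_unc     = 0.05     # implied half-interval of the 0.1-deg reporting

anomaly     = modern_reading - baseline
anomaly_unc = baseline_unc + modern_unc   # worst-case (same-sign) propagation

print(f"anomaly = {anomaly:+.2f} +/- {anomaly_unc:.2f}")
# => anomaly = +0.80 +/- 0.55, i.e. even the tenths place is barely meaningful
```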
Indicative of the problem is that data published by NASA show the same implied precision (±0.005° C) for the late-1800s as for modern anomaly data. The character of the data table, with entries of 1 to 3 digits with no decimal points, suggests that attention to significant figures received little consideration. Even more egregious is the representation of precision of ±0.0005° C for anomalies in a Wikipedia article wherein NASA is attributed as the source.
Ideally, one should have a continuous record of temperatures throughout a 24-hour period and integrate the area under the temperature/time graph to obtain a true, average daily temperature. However, one rarely has that kind of temperature record, especially for older data. Thus, we have to do the best we can with the data that we have, which is often a diurnal range. Taking a daily high and low temperature, and averaging them separately, gives one insight into how station temperatures change over time. Evidence indicates that the high and low temperatures have not been changing in parallel over the last 100 years; until recently, the low temperatures were increasing faster than the highs. That means, even for long-term, well-maintained weather stations, we don’t have a true average of temperatures over time. At best, we have an average of the daily high and low temperatures. Averaging them creates an artifact that loses information.
When one computes an average for purposes of scientific analysis, it is conventionally presented with a standard deviation, a measure of the variability of the individual samples about the average. I have not seen any published standard deviations associated with annual global-temperature averages. However, utilizing Tchebysheff’s Theorem and the Empirical Rule (Mendenhall, 1975), we can come up with a conservative estimate of the standard deviation for global averages. That is, the mean plus or minus about four standard deviations should span the range in global temperatures (Range ≈ ±4s). With Summer desert temperatures reaching about 130° F and Winter Antarctic temperatures reaching -120° F, Earth has an annual range in temperature of at least 250° F; dividing that range by 8 gives an estimated standard deviation of about 31° F! Because deserts and the polar regions are so poorly monitored, it is likely that the range (and thus the standard deviation) is larger than my assumptions. One should intuitively suspect that, since few of the global measurements are close to the average, the standard deviation for the average is high! Yet, global annual anomalies are commonly reported with significant figures to the right of the decimal point. Averaging the annual high temperatures separately from the annual lows would considerably reduce the estimated standard deviation, but it still would not justify the precision that is commonly reported. This estimated standard deviation is probably telling us more about the frequency distribution of temperatures than about the precision with which the mean is known. It says that probably a little more than 2/3rds of the recorded surface temperatures are between -26° and +36° F. Because the midpoint of this range is 5.0° F, and the generally accepted mean global temperature is about 59° F, it suggests that there is a long tail on the distribution, biasing the estimate of the median to a lower temperature.
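The back-of-the-envelope estimate above can be written out explicitly (this simply follows the assumption, stated in the paragraph, that the mean ±4 standard deviations spans the full range of recorded temperatures):

```python
# The range-based standard-deviation estimate from the paragraph above, assuming
# (as the text does) that the mean +/- 4 SD spans the full range of recorded
# surface temperatures.

t_max_f = 130.0    # approximate extreme summer desert temperature (deg F)
t_min_f = -120.0   # approximate extreme Antarctic winter temperature (deg F)

temp_range = t_max_f - t_min_f          # 250 deg F
est_sd     = temp_range / 8.0           # range spans +/- 4 SD -> 8 SD total
midpoint   = (t_max_f + t_min_f) / 2.0  # 5 deg F

print(f"range = {temp_range:.0f} F, estimated SD ~ {est_sd:.0f} F")
print(f"midpoint {midpoint:.1f} F  vs. commonly quoted mean ~59 F")
print(f"1-SD band about the midpoint: {midpoint - est_sd:.0f} to {midpoint + est_sd:.0f} F")
```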
Summary
In summary, there are numerous data handling practices, which climatologists generally ignore, that seriously compromise the veracity of the claims of record average-temperatures, and are reflective of poor science. The statistical significance of temperature differences with 3 or even 2 significant figures to the right of the decimal point is highly questionable. One is not justified in using the approach of calculating the Standard Error of the Mean to improve precision, by removing random errors, because there is no fixed, single value that random errors cluster about. The global average is a hypothetical construct that doesn’t exist in Nature. Instead, temperatures are changing, creating variable, systematic-like errors. Real scientists are concerned about the magnitude and origin of the inevitable errors in their measurements.
References
Mendenhall, William, (1975), Introduction to probability and statistics, 4th ed.; Duxbury Press, North Scituate, MA, p. 41.
Smirnoff, Michael V., (1961), Measurements for engineering and other surveys; Prentice Hall, Englewood Cliffs, NJ, p. 181.
Taylor, John R., (1982), An introduction to error analysis – the study of uncertainties in physical measurements; University Science Books, Mill Valley, CA, p. 6.
¹Note: One cannot take a single measurement, add it to itself a hundred times, and then divide by 100 to claim an order of magnitude increase in precision. Similarly, if one has redundant measurements that don’t provide additional information regarding accuracy or dispersion, because of poor precision, then one isn’t justified in averaging them and claiming more precision. Imagine that one is tasked with measuring an object whose true length is 1.0001 meters, and all that one has is a meter stick. No amount of measuring and re-measuring with the meter stick is going to resolve that 1/10th of a millimeter.
Classic misdirection. Alarmists claim “hottest year on record” and go crazy over the fact. Skeptics claim that the alarmist claim is stupid (which it is) because of the lack of precision of the determination in the first place. So skeptics “prove” the point. Who wins?
It’s the alarmists who win. Because while the skeptics are bending over backwards to prove that the alarmists are “wrong”, the majority of the public can clearly see that 2016 is darn near the hottest on record (even if not precisely), and certainly since 2000 it has been much hotter than in the 1950s, regardless of the lack of precision. In the meantime – nothing is being mentioned about the significant issue that there is no reliable attribution that the general increase in temperatures (precise or not) is related to human-caused CO2 emissions.
Good post. You describe the proper treatment of accuracy and precision error where there is data.
The bigger uncertainty problem is that there are large swaths of land and ocean where there is no data at all prior to satellites commencing in Dec 1978. So the global surface anomaly is largely an invented construct, not fit for purpose. And as shown by simple comparison of previous GAST ‘official’ estimates, both NOAA and NASA have significantly cooled the past and sometimes warmed the present.
Thank you Rud,
I could have gone into the issues about sampling protocol, but at over 2500 words I was already concerned about people complaining about falling asleep while reading it. I was just taking umbrage at NASA and NOAA reporting anomalies with two, three, and even four significant figures beyond what the instrumentation reports. More egregious is that even if precision were to increase with sampling size, then it implies that there is a minimum number of samples that have to be taken before the subtraction step won’t truncate anomalies. That is, I couldn’t report today’s temperatures as an anomaly because there hadn’t been enough additional precision accumulated. The bottom line is that I don’t think that those analyzing the data have carefully examined their assumptions and justified their methodology.
My question with all this has been “How do you come up with the significant digits to use?”
If you average a million samples measured to the tenths of a degree and get a “calculator figure” of 10.643, your mean is still given in tenths: 10.6 +/- 0.05. If you use those million samples to improve on the accuracy of the mean, and get an uncertainty of +/- 0.0005, does the mean become 10.600 +/- 0.0005, or do you get to use the “calculator figure” of 10.643 +/- 0.0005?
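This doesn’t settle which convention is correct, but the two candidate reports can at least be put side by side with made-up numbers (a sketch only; it does not reproduce the figures in the comment):

```python
# Illustration of the question above with made-up numbers: many readings,
# each recorded only to the nearest 0.1 degree. Which report is justified?

import random, statistics

rng = random.Random(42)
true_values = [10.6 + rng.gauss(0.0, 1.0) for _ in range(100_000)]
recorded    = [round(v, 1) for v in true_values]          # tenths only

mean = statistics.fmean(recorded)                          # the "calculator figure"
sem  = statistics.stdev(recorded) / len(recorded) ** 0.5   # standard error of the mean

print(f"calculator mean    : {mean:.4f}")
print(f"standard error     : {sem:.4f}")
print(f"report A (sig figs): {round(mean, 1)} +/- 0.05")
print(f"report B (SEM)     : {mean:.3f} +/- {sem:.3f}")
```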
I understand the frustration that the public debate often seems misdirected, misinformed, and pointless. But it is not. Challenging and staying in the fight has brought us to a healthy inflection point. Far from being settled, the science will now be debated without the heavy hand of government on the scales.
It ain’t over. It has just started in earnest.
Forrest,
I suspect that you are right that they don’t know what they don’t know. That was one of the motivations for me to write the article. Most of the practicing scientists are young enough to have been my former students, or their children.
I haven’t heard anyone cover it “properly”. Probably my biggest concern about the way the temperature data are handled is the way they try to stitch the records into continuous series. The problem is that this gives an illusion of precision where there is none. It is all done with good intentions, or at least I think that’s why they started it. The break-point alignment hides the very real station-move uncertainty without ever actually dealing with it. Creation of virtual stations based on these methods creates additional “certainty,” ironically, by introducing something that doesn’t even exist.
And stitching together the record the way they do does something rather interesting. Note, these steps might not all be in the right order but their impact is the same in the end.
STEP 1: Normal maintenance (and most of this already happened long ago), the stations are moved, usually because of encroachment of urban influences, which of course leads to pronounced Urban Heat Island impacts. Sometimes this happens more than once in a region.
STEP 2: Processing. In the attempt to make the now-broken record continuous, they perform break-point alignment. The assumption is that the data are accurate, but skewed. Because the adjustment is normally for urbanization, the break-point alignment results in cooling of the past, bringing the hot UHI end of the record into line with the cooler, UHI-free temperature. Often urbanization then begins anew, tainting the record further.
STEP 3: They now officially adjust for UHI all at once. They pat themselves on the back because they have good correlation with the raw data. But in reality the only thing the UHI adjustment has done is remove most (but not all) of the accumulating error from break point alignment. The UHI is still there, hidden by overly complicated processes.
The reality is that the urbanization history is too difficult to factor in. The closest thing we could do to reality is calculate daily temperatures from whatever thermometers are available and perform the same spatial processing to account for all the holes. When we were done we’d have a much less precise “product” with a known warming bias. And we likely could not say with ANY certainty whether it was warmer than during the last warming period ending in the mid 1940s.
They also assume that urban stations are better quality because there are fewer gaps in the urban record. So they adjust rural stations to better match the “good” stations.
Then, because unadjusted SSTs don’t match the phony land “data”, they boost the sea surface “data”.
The whole book-cooking, criminal enterprise is corrupt and corrupted.
Don’t get me started on the idiocy of trying to combine air temperature readings with the so called Sea Surface Temperature measurements.
First off, that’s a real apples and oranges comparison.
Beyond that, the very definition of “sea surface” has changed over time as first it was measured with canvas then metal buckets. Then it was measured from sea water intakes at a completely unrecorded and constantly changing depth.
Well, Forrest already mentioned the data diddling and fiddling and, well you get the idea.
Then there is this:
“What’s in that MMTS Beehive Anyway?”
And they want us to trust false pretense of reliable numbers!?
No calibration after install.
No validation after install.
Zero regular checks for temperature error.
No certification for accuracy.
Professional measurement equipment is calibrated and certified regularly.
Gas pumps are calibrated and certified regularly.
Weight scales in grocery stores right up to truck weigh stations are calibrated and certified annually.
Elevators are inspected and certified annually.
Escalators are inspected and certified annually.
Ad infinitum
Yet none of these devices are elevated to global attention and advocacy!
Why is anyone listening to the freaks in the alarmist government cells!?
Let’s put them in some other cells and see how long before they turn state’s witness.
Reblogged this on Climate Collections and commented:
Summary
Hifast,
So now I will be getting arrows in my back that I don’t even know where they are coming from! 🙂
Perhaps someone is familiar enough with the datasets to answer a couple of questions I have had about the average of global surface temperatures. One of the first things I do with some new data is look at the raw data, before any manipulation occurs and get a sense for how it is distributed.
1. If we wanted the average of global temperatures at a specific time, presumably half of the globe would be in darkness, the other half in daytime. It seems that if one wants to get at something thermodynamic (which we know an average temperature is not) we should at least try to get simultaneous measurements of surface temperature, which would at least be representative of the globe at a particular time. It seems taking readings at a particular local daytime over the globe, and even extrapolating them to some standard local time, is designed to maximize the average of the numbers. Subtle changes in this process over time could further insert trends in anomalies simply due to the averaging changes. Perhaps the local differences max-min should be tracked instead and the midpoint or some time average used to be consistent.
2. I have often wondered what the distribution of global temperature readings looks like. By this I mean simply rank-ordering them and plotting the cumulative distribution, or something fancier like a q-q plot. Using anomalies would not work, since they are (I think) constructed from averages themselves and are not raw temperatures. Again, the issue of the time of measurement, and the method of choosing that time or extrapolating, would arise; but if the data were available, the possible effects of such differences on the distribution could be examined. It would be interesting just to look at the distribution of measurements at the same universal time, at the same local time, and as something constructed from the min-maxes. Looking at that distribution, it would be interesting to examine the difference between a simple average and a median, which tends to be more robust to changes in extreme values, or the time behavior of the quantiles, which can say something a little deeper about the underlying distribution. Also, such quantities could be evaluated consistently over time without having to change any averaging techniques. If the distribution turns out to be bimodal or multimodal, that would be of interest as well and would suggest some things about possible biases in computed averages.
Does anyone know if this has been done somewhere?
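A minimal sketch of the kind of distributional look described above (the readings are made up; the point is the method, not the numbers):

```python
# Sketch: rank-order a set of readings, print an empirical CDF, and compare the
# mean with the median. The readings below are hypothetical.

import statistics

readings = [23.4, 25.1, 24.8, -31.0, 12.7, 30.2, 28.9, 5.3, -2.1, 27.5]  # deg C

ordered = sorted(readings)
n = len(ordered)

# Empirical cumulative distribution: fraction of readings at or below each value.
for i, t in enumerate(ordered, start=1):
    print(f"{t:7.1f}  ECDF = {i / n:.2f}")

mean   = statistics.fmean(readings)
median = statistics.median(readings)
print(f"mean = {mean:.1f}, median = {median:.1f}")
# A mean well below the median (here, pulled down by the -31 reading) is the kind
# of skew signature the comment suggests looking for in the raw data.
```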
“After more than 200 years of tradition, Armagh Observatory is looking at moving to automation. ”
http://www.bbc.co.uk/news/uk-northern-ireland-39564125
Thought it might be of interest.
Clyde, an excellent essay on the problem with use of the temperature record to draw conclusions on the ranking of “warmest years”. I’ve remarked on other threads, that if our purpose is to detect warming or cooling over time, we would be better off with a dozen or two high quality thermometer sites with 2 or three closely placed thermometers in pristine clear locations away from volcanoes etc. Collect data and be patient. Had we set out with this project in the 1970s when concerns were broadly expressed that we were headed for an ice age, we’d be over 40yrs into the plan.
To improve precision, we could have located half the thermometers north of 70° latitude, where we’ve learned that an approximately threefold amplification of temperature change occurs in a warming world.
I think that our best bet now is to go with satellites designed for purpose. It doesn’t matter what the global avg temperature really is if we are looking for an early warning set up. Moreover, given your issues re precision, the unadjusted records for decently located sites with more than 75yrs records would serve to do the job. An analogy concerning this issue is that if sea level is going to rise a couple or more meters in a century as worriers believe, it makes no sense to be measuring it with a micrometer – a foot rule will do.
Gary,
I think that rather than fewer, we need more thermometers. From what I can tell, the Earth isn’t warming uniformly. Any year there are high and low anomalies. The only way we can be sure that we are capturing the trend is to have a proper sampling protocol. Trying to use airport thermometers, which were designed to let a pilot know if he was going to be able to become airborne or if he was going to encounter icing, doesn’t cut it for climatology.
Absolutely. A million standard thermometers at the same height above AGL, or one for every ~195 square miles of earth’s surface. Those moored in the ocean might be expensive hazards to navigation, but invaluable in recording the regions of the planet in which the gatekeeping book-cookers make up their most imaginative flights of fancy.
I’d prefer to have one for every square mile. Even that is probably too few to get decent spatial accuracy.
Mark,
Of course more are better, with continuous readings, but one per sq mi IMO might pose a hazard to navigation at sea, or at least risk the destruction of valuable scientific apparatus by merchant vessels.
One per sq mi would mean almost 200 million stations.
To get a true reading of the atmosphere, the sensor network needs to extend vertically as well as horizontally.
What evidence do you have Clyde, when you say, “The global average is a hypothetical construct that doesn’t exist in Nature.” Is this an axiom you take on faith?
To come up with an accurate “average temperature” you would have to measure the energy contained in every molecule of the atmosphere at the same instant in time.
This can’t be done. The best you can do is take samples distributed in space and as close to the same time as you can manage.
In reality, you aren’t measuring the “temperature” of the atmosphere; instead you are measuring the temperature of discrete points and making the assumption that the temperature of the points not being measured is close enough to the points that are being measured that the difference won’t matter.
This is another reason why the claims of 0.01 or even 0.001 C accuracy are absurd. The differences between your sensor and any spot within 100 feet, much less 100 miles is going to be orders of magnitude greater than that. There is no way to plot what those differences are. So you have to include a reasonable variance to account for what you can’t know.
MarkW, you are confusing the measurement of a single temperature with the estimation of a value of a population (i.e. sampling)
MarkW on April 12, 2017 at 12:16 pm
In reality, you aren’t measuring the “temperature” the atmosphere, instead you are measuring the temperature of discrete points and making the assumption that the temperature of the points not being measured is close enough to the points that are being measured that the difference won’t matter.
Sounds correct!
But my (very little) experience in processing temperature time series has taught me that the results of averaging processes based on far fewer points than I thought were needed are much nearer to the average of all available points than I had ever imagined.
Let us please consider UAH’s satellite temperature measurement record, which is available as a 2.5° grid over the planet (three southernmost and northernmost latitude zones excluded).
If instead of averaging all 66 x 144 = 9,504 cells, you select only 512 evenly distributed ones, you obtain a temperature series which here and there does differ from the full average, sometimes quite heavily.
But if you now build, over the two time series, 60-month running means (which in fact are of far greater interest to us than single monthly anomalies), you see that they fit each other remarkably well:
http://fs5.directupload.net/images/170412/pj5cowvx.jpg
Conversely, there is little hope that you will obtain, for the global average, running means more accurate than those obtained with the 9,504 cells by moving, e.g., to a 1° grid.
The differences will rather show up in small latitude bands or regional zones.
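A rough sketch of that comparison, using a synthetic anomaly field standing in for the 66 x 144 UAH grid (real data would be read from the published UAH grid files; the signal and noise levels here are invented), shows the same behavior: sub-sampled and full averages differ month to month but agree far more closely after 60-month smoothing.

```python
# Synthetic stand-in for the UAH 2.5-degree grid: each month is a shared global
# signal plus independent cell-level noise. Compare the full-grid average with a
# ~500-cell sub-sample, before and after a 60-month running mean.

import random

random.seed(1)
N_LAT, N_LON, N_MONTHS = 66, 144, 240

global_signal = [0.002 * m + random.gauss(0.0, 0.15) for m in range(N_MONTHS)]
series_full, series_sub = [], []

for m in range(N_MONTHS):
    cells = [[global_signal[m] + random.gauss(0.0, 1.0) for _ in range(N_LON)]
             for _ in range(N_LAT)]
    every_lat, every_lon = 4, 5                      # ~ (66/4) * (144/5) ~ 490 cells
    sub = [cells[i][j] for i in range(0, N_LAT, every_lat)
                       for j in range(0, N_LON, every_lon)]
    series_full.append(sum(sum(row) for row in cells) / (N_LAT * N_LON))
    series_sub.append(sum(sub) / len(sub))

def running_mean(x, w=60):
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]

diff_monthly = max(abs(a - b) for a, b in zip(series_full, series_sub))
diff_smooth  = max(abs(a - b) for a, b in zip(running_mean(series_full),
                                              running_mean(series_sub)))
print(f"max monthly difference : {diff_monthly:.3f}")
print(f"max 60-month-mean diff : {diff_smooth:.3f}")   # typically much smaller
```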
Michael,
If I were to ask you to determine some intrinsic property of a steel ball, such as its conductivity or coefficient of elasticity, would it make any difference where or when you measured the property? On the other hand, temperature varies with time, location, and elevation. It is, at best, something that is used to estimate the amount of heat energy, although that is rarely acknowledged.
Clyde, you correctly identified the global average as a “hypothetical construct.” You have failed to prove it does not exist. The number “7” is a hypothetical construct also, but you can neither prove it exists, nor can you prove it does not.
Michael,
Our system of counting and mathematical manipulation requires that the number that we have chosen to call “7” exist. It is fundamental to mathematics, as are all numbers. On the other hand, if I speculate that there is some approximation to the “speed of dark,” that number doesn’t necessarily exist, except in my imagination. Implicit in accepting the idea that there is an average global temperature for a specified period of time is the assumption that there is a standard elevation for that temperature and that it can be measured to infinite precision if we make an infinite number of measurements. There is no accepted standard elevation for temperature measurements (except with respect to the ground), and we can’t make an infinite number of measurements. What we are left with is an approximation to a hypothetical value. The question becomes, “To what purpose do we calculate said average, and what precision can we justify?” I maintain that we can know the defined average global temperature only very imprecisely, and with unknown accuracy.
Clyde, you blew it… “7” does not exist. Here’s a simple test: point it out to me. You can’t. You can’t point to “7.” The symbol on a piece of paper or on your screen is not “7”; it’s a representation of the construct. You can line up a bunch of things and then attempt to associate a verbal sound (like “one” or “too” or “twee”) with collections of the things, but again, the verbal sound is a representation of a construct.
The point I was making is that “hypothetical constructs” do not exist; they are figments of our minds. The procedure for measuring the “average global temperature” is just that, a set of tasks one completes to arrive at some result that BY DEFINITION OF THE PROCEDURE is the average global temperature.
Now, as to your question of “purpose”, it’s pretty simple. Once you have the procedure defined, you repeat said procedure, and lo and behold, you discover that as time goes on the measurement you get is slowly rising.
Michael,
I’m not going to go down that semantics rabbit hole.
Good for you Clyde!
Here come the sophists…”Hey, look at that squirrel!”
Sad how the warmistas have been reduced to arguing over the meaning of words.
Both imprecision and inaccuracy are sources of error in measurements, but it is nigh impossible to determine from error analysis of a set of measurements (‘deconvolution’) which problem contributes how much to the total error. The situation is similar when one tries to consider the relative contributions of various factors to allegedly observed changes in global temperature.
The temperature problem is far more complicated because it requires ‘meta-analysis’ – the aggregation of data from various sources which are not homogeneous. The use of a non-comprehensive set of data aggregated from different types of measurements performed by different methods and protocols under different and often non-compatible circumstances totally fails the requirement for data homogeneity.
For example, the variances from each individual instrumental record, properly handled, should be added to obtain an overall variance for the aggregate.
Any attempt to synthesize a common result from a combination of disparate data sets such as urban, rural, marine, aerial, and satellite data sets becomes simply an exercise in arithmetic, and any attempt to assign significance to the composite is a self-deluding fantasy.
tadchem,
Yes, what you said! 🙂
Clyde. Thanks for posting this. It seems well done and nothing is obviously wrong. I have decided that I’m lousy at statistics and that an awful lot of folks are even worse. Moreover, I’m not sure I care how many standard errors can dance on the head of a pin, even if I truly understood how to do the math properly. So I’ll forego commenting on the comments.
But I would point out that global surface temperature as we currently define it looks to be a truly unfortunate metric no matter what the precision/accuracy. It is very sensitive to ENSO excursions of warm water into the Eastern Pacific and also to poorly known past sea surface temperatures. IMHO, “Climate Science” really should consider replacing the current metric with something/anything that meaningfully tracks long-term warming/cooling of the planet.
Don K,
I don’t claim to be good at statistics either. I had to go back to my text books and review material I had studied decades ago. Basically, I’m claiming that the Standard Deviation, and not the Standard Error of the Mean, is the appropriate metric for the uncertainty in global temperature measurements, and the anomalies derived from them.
“Basically, I’m claiming that the Standard Deviation, and not the Standard Error of the Mean, is the appropriate metric for the uncertainty in global temperature measurements”
Yeah … maybe. As I understand it (and I’m probably wrong), Standard Deviation is a measure of the dispersion of the data whereas Standard Error is the dispersion in the estimate of the arithmetic mean of the observations — How likely is it that a given observation is valid vs how likely would it be for an independent set of observations of the same system over the same timespan to yield the same result?
All of these issues are why my interest is in the explicit, as calculable as pi, physical audit trail between all the parameters we measure. As I most recently put it at http://cosy.com/#PlanetaryPhysics , I’ve only gotten thru the implementation of a handful of APL expressions computing the mean temperature of a gray sphere surrounded by a sphere with an arbitrary radiant temperature map, which, given the parameters of the Sun and our orbit, gives a temperature of about 278.6 +-2.3 from peri- to ap-helion.
Even that non-optional computation is extremely poorly understood.
I have yet to have anyone either say “yes of course”, or offer an alternative algorithm, or an experimental test of the extension of the computation to an arbitrary object absorption=emission spectrum, as presented at http://cosy.com/Science/warm.htm#EqTempEq . This field desperately needs to return to the classical, experimentally quantitative abstractions of its basis in applied physics.
We need YouTubes reasserting these quantitative realities with the simple brilliance of Ritchie’s experiment a hundred & eighty-some years ago.
http://cosy.com/Science/AGWpptRitchie_Kirchhoff.jpg
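For reference, the flat-spectrum (“gray ball”) equilibrium-temperature figure quoted above can be reproduced with a few lines (a sketch, not the commenter’s APL; the solar temperature and orbital distances are rounded textbook values):

```python
# For a gray body, absorptivity and emissivity cancel, leaving the equilibrium
# temperature T = T_sun * sqrt(R_sun / (2 d)), evaluated at perihelion and aphelion.

from math import sqrt

T_SUN  = 5778.0        # effective solar temperature, K (assumed)
R_SUN  = 6.957e8       # solar radius, m
D_PERI = 1.4710e11     # Earth-Sun distance at perihelion, m
D_APH  = 1.5210e11     # Earth-Sun distance at aphelion, m

def gray_ball_temp(d):
    return T_SUN * sqrt(R_SUN / (2.0 * d))

t_peri, t_aph = gray_ball_temp(D_PERI), gray_ball_temp(D_APH)
print(f"{(t_peri + t_aph) / 2:.1f} K  +/- {(t_peri - t_aph) / 2:.1f} K")
# ~278.6 K +/- ~2.3 K, consistent with the figure quoted in the comment
```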
A good and interesting article. However, I am not sure where your definition of precision comes from.
From http://www.itl.nist.gov/div898/handbook/glossary.htm#precision
We have:-
precision:
in metrology, the variability of a measurement process around its average value. Precision is usually distinguished from accuracy, the variability of a measurement process around the true value. Precision, in turn, can be decomposed further into short term variation or repeatability, and long term variation, or reproducibility.
It has nothing to do with Resolution and significant figures.
http://www.itl.nist.gov/div898/handbook//mpc/section4/mpc451.htm
NIST says:
Resolution:
is the ability of the measurement system to detect and faithfully indicate small changes in the characteristic of the measurement result.
In my language, precision is defined as how closely you achieve the same measured value if you keep repeating the measurement, e.g. take thermometer from fridge to cup of tea, measure, record, repeat.
If your recorded tea temps are very close, you have a precise thermometer.
In my language, resolution is defined as – what is the smallest change in value my measurement system can respond to eg if I add drops of boiling water to my cup of tea, it will slowly increase in temperature. A higher resolution thermometer will respond and indicate to, say, 0.1 degree change, whereas a lower resolution device will respond and indicate, to say, 0.5 degree change. Nothing to do with the number of digits.
Your second diagram, the 4 cross hairs, is spot on, you can have a very precise sensor that is very inaccurate. Many folks don’t get that at first.
Steve1984,
The formal definition of precision has been changed in recent years. Unfortunately, in my opinion, it is a defective definition because, unlike with the use of significant figures, it is difficult to know what the actual or implied precision is. Note, however, in the definition that you have provided, that as the precision increases, it will be necessary to increase the number of significant figures to convey the increase in precision. That is why my diagram showed a finer scale on the top row than on the bottom row.
Your definition of precision sounds to me like repeatability. I equate resolution with precision.
Indeed?
How very odd.
Why did you not start at the beginning with what is metrology?
That doesn’t sound like meteorology, now does it?
From Merriam-Webster
“Definition of metrology
1: the science of weights and measures or of measurement
2: a system of weights and measures”
Next at the link you provided:
It looks like NIST understands precision, measurement and fuzzy sloppy concepts.
Amazingly, this also from the same link, just a different chapter.
And from your link:
Are you sure you read those links?
“One cannot take a single measurement, add it to itself a hundred times, and then divide by 100 to claim an order of magnitude increase in precision.”
But one can take 100 different independent measurements of the same thing and divide the S.D. by the square root of 100 to get the std error. If you make 1000 measurements, you get to divide by about 30. And the result can be smaller than the graduations on the measuring device. It’s just elementary statistics, in the early chapters actually.
And the S.E. is what is actually being used in Student’s T test to come up with probabilities, not the S.D.
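For illustration, the arithmetic described in that comment looks like this (made-up readings; whether the underlying assumptions hold for field temperature data is exactly what the article disputes):

```python
# Elementary illustration: the standard error of the mean is the sample standard
# deviation divided by sqrt(N). The readings are hypothetical.

import random, statistics

rng = random.Random(7)
n = 100
sample = [20.0 + rng.gauss(0.0, 0.5) for _ in range(n)]

sd  = statistics.stdev(sample)
sem = sd / n ** 0.5

print(f"SD  = {sd:.3f}")
print(f"SEM = {sem:.3f}   (SD / sqrt({n}))")
```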
1000 sensors, at different point of the globe are not measuring the same thing. So you can’t average them to get a more accurate reading.
They are measuring the globe. That is one thing.
You might consider changing your handle to the more apropos ReallyGullible.
RS, no they aren’t. They are taking measurements while on the globe. Not the same thing at all.
If I took one measurement on Earth, and another measurement on Venus, could I average them? After all, it’s just one solar system.
MarkW on April 12, 2017 at 2:59 pm
10 sensors, at different points of the area around Berlin, Germany, are not measuring the same thing either. Different places with different character (oooooh, UHI here ‘n there, horrible), different elevations, etc.
But the weather prediction packages on the Internet average and interpolate them very well and get accurate results.
That is the reason why temperature, rainfall and wind are so well predicted for the place where I live, though there is no weather station available.
MarkW, you are perfect in sophism.
Note the quote by Smirnoff (1961) about the limiting conditions for increasing precision.
You must mean Smirnov? I thought he only worked with non-parametric statistics, which does indeed have limitations.
ReallySkeptical,
Smirnov and Smirnoff come out of different bottles. Did you look at the references?
I posted a note about the non-optional quantitative physics to get from the output of the Sun to our surface temperature, and the desperate need for YouTube-worthy experimental re-confirmation of the chain of relationships, here about an hour ago. But it’s not shown up, so I posted it at http://cosy.com/Science/warm.htm#comment-3253159973 .
We need experiments as brilliant as this:
http://cosy.com/Science/AGWpptRitchie_Kirchhoff.jpg
Yes, indeed.
OT I imagine but,
A system in which water is piped into iron radiators works very well for distributing heat from a furnace to each room of a house.
If those pipes were just bars of solid iron, not so much.
If the radiators were just pools of water with no iron skin, again, not so much.
Together…wondrously good system.
I grew up in a big old house in which the system was originally gravity driven, with huge pipes near the furnace that got smaller as they branched out the various zones and individual rooms.
So logical, and so efficient…and a hundred and fifty years later still works like a charm, although a small pump now pushes the water along since the 6″ iron pipes that converged into the original boiler have been replaced with smaller copper ones that fit the new furnace.
I spent part of my life in an old 1700s house that had a coal boiler badly converted to oil.
All my radiator did was make noise. Never a change in temperature. I considered buying that house from my Father, but that old boiler cost too much to run.
Since the temperature distribution is not a normal bell curve, the Gaussian method should be used to calculate the standard deviation.
Gavin claims that only a very small sample of stations are needed for a valid GASTA result.
https://realclimatescience.com/2017/01/gavin-schmidt-explains-why-noaa-data-tampering-is-illegitimate/
Since he makes up most of the “data” anyway, sure, why not? If one station can represent a radius of 1200 km, why not have just one station for every 4.5 million square kilometers, ie 113 stations? IIRC, which I might not, he once suggested that 50 stations would suffice.
Small sample is needed. For any grammar gurus out there ready to pounce.
Gloteus,
You spoil all my fun! 🙂
Another factor, which probably doesn’t belong in this discussion, is possible confirmation bias in the CAGW-activist compilers and curators of the data which make up the supposed historical instrumental record.
I think that discussion of that bias belongs in every discussion of global temperature data.
Tainted data makes any conclusion based on it worthless.
There are serious issues with virtually all GAST estimates that have little to do with the accuracy of thermometers or the precision of station averages. The most salient question is how representative are the available stations, overwhelmingly urban world-wide, of the areas in which they are located? And, since temporal variability of climate is the key issue, how is that variability affected by constantly changing the set of stations? In other words, what is the ultimate reliability of UHI-corrupted data when used in piece-meal fashion to manufacture “global” time-series? Nobody, least of all the index-makers, pays serious attention to these pivotal issues.
You forgot to mention that until recent years RECORDING accuracy in USA was +/-0.5 deg.
CRU’s Dr Jones et al’s calculation of +/-0.001 deg accuracies for HADCRUT data only works for HOMOGENEOUS data – which global temperature is not.
dradb,
I did mention that formerly temperatures were only reported to the nearest degree, which implies an uncertainty of +/-0.5 deg.!
Back in a previous millennium when I took chemistry, we measured temperatures with large mercury-filled thermometers. We could calibrate them in baths of ice water. We could use magnifiers and verniers to read them. Were our readings as good as 0.1°?
Now imagine that you are an observer at a remote base in the 19th century. Were your instruments calibrated? Did you have a magnifier or a vernier with which to read them? How did you see them at night? Did you hold a candle next to the thermometer? If you decided to stay indoors and make up the numbers on a really bitter night, would anybody have known?
If you weren’t feeling good and asked your wife or one of the kids (none of whom have had training in how to take a reading) to go take the reading for you, would anyone have known?
How often did thermometers get taken somewhere for another purpose and laid down flat afterwards?
Then when picked up, the mercury was separated. What to do!?
Why bang the thermometer bottom with something to break the bubble and drain all the mercury down.
It doesn’t take many hits before thermometers start shifting in their metal band mountings.
In building up the global average temperature anomaly curve, the accuracy and precision of measurements are not the main issues; the critical issue is the extrapolation and interpolation of data over space and time under the climate system and general circulation patterns. In other words, the data network.
Dr. S. Jeevananda Reddy
Not to mention blatant “adjustments” designed to cook the books. Now NOAA’s minions and imps put their thumbs on the scales of the raw data. The whole process from top to bottom is corrupted. The ringleaders and senior perps need to go to jail and the willing accomplices be fired.
“It says that probably a little more than 2/3rds of the recorded surface temperatures are between -26. and +36.° ”
Perhaps I am missing something here, or maybe these numbers should be in degrees C, but this seems unlikely, given that 40% of the globe is between the latitudes of the tropics of Capricorn and Cancer, where, unless one is high up on a mountain, it never gets as cold as 36 F.
Is this quoted range and percentage verified? Over 67% of global surface temperature readings are 36 F or lower?
Again, huge parts of the globe never get that cold…ever. Over half, conservatively. And another large part rarely gets that cold for more than a brief and occasional interval.
Menicholas,
What you missed is that I said that the median value, which would equal the mean if the distribution were symmetrical, was far enough below the commonly accepted mean to strongly suggest a long tail on the cold side. That is, you are correct that high temperatures are going to be more common in the real world.
Clyde,
Thank you for your reply.
I understand the long tail, and agree.
My only question is regarding that one single sentence…it just does not seem possible to me, if it refers to all measurements taken at all places and at all times of year.
Outside of polar regions, and in the places where most all of us live, temps are higher than that for at least half of the year…even at night.
Are they not?
I do not wish to be disagreeable, as I very much enjoyed your article.
I think the apparent discrepancy lies in the word “recorded.” The majority of recorded temperatures are from locations in the temperate zone of the northern hemisphere.
Right.
Temperate zones are known for being temperate, for much of each year. Temps near freezing are rare for many months of the year outside of the polar regions.
Menicholas,
Then let me rephrase my statement. IF the mean were 5.0 deg F, then one would expect that 68% of the readings would lie between -26 and +36 deg F, with a SD of 31. Knowing that the actual mean is closer to 59 deg F, it is evidence that the distribution is strongly skewed.