The Metrology of Thermometers

For those who didn’t notice: this is about metrology, not meteorology, though meteorology uses the final product. Metrology is the science of measurement.

Since we recently had a paper from Pat Frank dealing with the inherent uncertainty of temperature measurement, which establishes a new minimum uncertainty of ±0.46°C for the instrumental surface temperature record, I thought it valuable to review the uncertainty associated with the act of temperature measurement itself.

As many of you know, the Stevenson Screen, aka Cotton Region Shelter (CRS), such as the one below, houses Tmax and Tmin recording thermometers (mercury for the maximum, alcohol for the minimum).

Hanksville, UT USHCN climate monitoring station with Stevenson Screen - sited over a gravestone. Photo by surfacestations.org volunteer Juan Slayton

They look like this inside the screen:

NOAA standard issue max-min recording thermometers, USHCN station in Orland, CA - Photo: A. Watts

Reading these thermometers would seem to be a simple task. However, that’s not quite the case. Adding to the statistical uncertainty derived by Pat Frank, as we see below in this guest re-post, measurement uncertainty in both the long and short term is also an issue. The following appeared on the blog “Mark’s View”, and I am reprinting it here in full with permission from the author. There are some enlightening things to learn about the simple act of reading a liquid-in-glass (LIG) thermometer that I didn’t know, as well as some long-term issues (like the hardening of the glass) with values about as large as the climate change signal for the last 100 years, ~0.7°C – Anthony

==========================================================

Metrology – A guest re-post by Mark of Mark’s View

This post is actually about the poor quality and processing of historical climatic temperature records rather than metrology.

My main points are that in climatology, many important factors that are accounted for in other areas of science and engineering are completely ignored by many scientists:

  1. Human errors in the accuracy and resolution of historical data are ignored
  2. Mechanical thermometer resolution is ignored
  3. Electronic gauge calibration is ignored
  4. Mechanical and electronic temperature gauge accuracy is ignored
  5. Hysteresis in modern data acquisition is ignored
  6. Conversion from degrees F to degrees C introduces false resolution into the data

Metrology is the science of measurement, embracing both experimental and theoretical determinations at any level of uncertainty in any field of science and technology. Believe it or not, the metrology of temperature measurement is complex.

It is actually quite difficult to measure things accurately, yet most people just assume that the information they are given is “spot on”. A significant number of scientists and mathematicians also do not seem to realise that the data they are working with is often not very accurate. Over the years, as part of my job, I have read dozens of papers based on pressure and temperature records where no reference is made to the instruments used to acquire the data, or to their calibration history. The result is that many scientists frequently reach incorrect conclusions about their experiments and data because they do not take into account the accuracy and resolution of their data. (It seems this is especially true in the area of climatology.)

Do you have a thermometer stuck to your kitchen window so you can see how warm it is outside?

Let’s say you glance at this thermometer and it indicates about 31 degrees centigrade. If it is a mercury or alcohol thermometer you may have to squint to read the scale. If the scale is marked in 1c steps (which is very common), then you probably cannot interpolate between the scale markings.

This means that this particular thermometer’s resolution is 1c, which is normally stated as plus or minus 0.5c (±0.5c).

This resolution figure assumes the temperature is observed under perfect conditions, by someone properly trained to read a thermometer. In reality you might glance at the thermometer, or you might have to use a flashlight to look at it, or it may be covered in a dusting of snow, rain, etc. Mercury forms a pronounced meniscus in a thermometer that can exceed 1c, and many observers incorrectly read the temperature at the base of the meniscus rather than its peak. (The picture shows an alcohol meniscus; a mercury meniscus bulges upward rather than down.)
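Even under those perfect conditions, a reading rounded to the nearest 1c mark carries a quantization error that is uniformly distributed over ±0.5c. A minimal simulation sketch (pure Python; the function name and sample count are illustrative assumptions, not anything from the original post):

```python
import random

def read_thermometer(true_temp_c):
    """Simulate reading a 1c-graduated scale: the observer
    records the nearest whole-degree mark."""
    return round(true_temp_c)

# Draw many true temperatures and look at the reading error.
random.seed(1)
errors = []
for _ in range(100_000):
    t = random.uniform(-20.0, 50.0)  # true temperature in the weather range
    errors.append(read_thermometer(t) - t)

max_abs_error = max(abs(e) for e in errors)
mean_error = sum(errors) / len(errors)
# The error never exceeds the half-graduation of 0.5c, and it only
# averages out near zero here because this toy error is symmetric;
# meniscus and parallax mistakes described above are one-sided.
```

This is the "best case" uncertainty; the observational problems described above add to it rather than replacing it.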

Another common major error in reading a thermometer is parallax error.

Image courtesy of Surface Meteorological Instruments and Measurement Practices by G.P. Srivastava (note the mercury meniscus). This is where refraction of light through the glass of the thermometer exaggerates any error caused by the eye not being level with the surface of the fluid.


If you are using data from hundreds of thermometers scattered over a wide area, with data being recorded by hand by dozens of different people, the assumed observational resolution should be degraded accordingly. In the oil industry, for example, it is common to accept an error margin of 2-4% when using manually acquired data.

As far as I am aware, historical raw temperature data from multiple weather stations has never been adjusted to account for observer error.

We should also consider the accuracy of the typical mercury and alcohol thermometers that have been in use for the last 120 years. Glass thermometers are calibrated by immersing them in ice/water at 0c and a steam bath at 100c. The scale is then divided equally into 100 divisions between zero and 100. However, a glass thermometer at 100c is longer than one at 0c. This means that the scale gives a falsely high reading at low temperatures (between 0 and 25c) and a falsely low reading at high temperatures (between 70 and 100c). The same process is followed for weather thermometers with a range of -20 to +50c.

Twenty-five years ago, very accurate mercury thermometers used in labs (0.01c resolution) came with a calibration chart/graph to convert the temperature observed on the thermometer scale to the actual temperature.

Temperature cycles harden the glass of a thermometer’s bulb and shrink it over time; a 10-year-old -20 to +50c thermometer will give a falsely high reading of around 0.7c.

Over time, repeated high-temperature cycles cause the alcohol in alcohol thermometers to evaporate into the vacuum at the top of the tube, creating falsely low temperature readings of up to 5c. (That is 5.0c, not 0.5c – it’s not a typo.)

Electronic temperature sensors have been used more and more over the last 20 years for measuring environmental temperature. These have their own resolution and accuracy problems. Electronic sensors suffer from drift and hysteresis and must be calibrated annually to remain accurate, yet most weather station temperature sensors are NEVER calibrated after they have been installed.

Drift is where the recording error gradually grows over time, with the sensor reading steadily higher or steadily lower even when the real temperature is static; it is a fundamental characteristic of electronic devices, caused by changes in the metal parts of the sensor that cannot be compensated for. Typical drift of a -100c to +100c electronic thermometer is about 1c per year, and the sensor must be recalibrated annually to remove this error.

Hysteresis is a common problem as well. This is where increasing temperature has a different mechanical effect on the thermometer than decreasing temperature, so, for example, if the ambient temperature increases by 1.05c, the thermometer reads an increase of 1c, but when the ambient temperature drops by 1.05c, the same thermometer records a drop of 1.1c. (This is a VERY common problem in metrology.)
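The consequence of that asymmetry is easy to see in a toy model. This sketch simply hard-codes the example numbers from the paragraph above (the function name and the linear-response assumption are mine, not a model of any real sensor):

```python
def sensor_reading_change(ambient_change_c):
    """Toy hysteresis model using the figures from the text:
    a 1.05c rise is read as 1.0c, a 1.05c drop is read as 1.1c."""
    if ambient_change_c > 0:
        return ambient_change_c * (1.0 / 1.05)   # under-reads rises
    else:
        return ambient_change_c * (1.1 / 1.05)   # over-reads drops

# A full up-down cycle (+1.05c then -1.05c) nets to zero in reality,
# but the hysteretic sensor accumulates a spurious -0.1c per cycle.
net = sensor_reading_change(1.05) + sensor_reading_change(-1.05)
```

Because the error accumulates per cycle rather than cancelling, daily heating/cooling cycles turn a small asymmetry into a systematic bias.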

Here is the behaviour of a typical food temperature sensor compared to a calibrated thermometer, before even considering sensor drift (see the “Thermometer Calibration” chart): depending on the measured temperature, the offset in this high-accuracy gauge ranges from -0.8 to +1c.

On top of these issues, the people who make these thermometers and weather stations state clearly the accuracy of their instruments, yet scientists ignore them! The packaging of a -20c to +50c mercury thermometer will state, for example, that the accuracy of the instrument is ±0.75c, yet frequently this information is not incorporated into the statistical calculations used in climatology.

Finally we get to the infamous conversion of degrees Fahrenheit to degrees Centigrade. Until the 1960s almost all global temperatures were measured in Fahrenheit; nowadays all the proper scientists use Centigrade, so all old data is routinely converted. Take the original temperature, subtract 32, multiply by 5, and divide by 9:

C= ((F-32) x 5)/9

Example: the original reading from a 1950 data file is 60F. This figure was eyeballed by the local weatherman and written into his tallybook. Fifty years later a scientist takes this figure and converts it to Centigrade:

60-32 =28

28×5=140

140/9= 15.55555556

This is usually (and incorrectly) rounded to two decimal places, quoted as 15.56c, without any explanation as to why this level of resolution has been selected.

The correct mathematical way to handle this issue is to look at the original resolution of the recorded data. Typically, old Fahrenheit data was recorded in increments of 2 degrees F, e.g. 60, 62, 64, 66, 68, 70. Very rarely on old data sheets do you see 61, 63, etc. (although 65 is slightly more common).

If the original resolution was 2 degrees F, the resolution used for the same data converted to  Centigrade should be 1.1c.

Therefore mathematically :

60F=16C

61F=16C

62F=17C

etc
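The table above can be sketched as a resolution-aware conversion. This is a minimal illustration in pure Python (the function name is mine, and rounding to whole degrees c is one reasonable rule for data with ~1.1c of real resolution, not the only possible choice):

```python
def f_to_c(temp_f):
    """Exact Fahrenheit-to-Centigrade conversion."""
    return (temp_f - 32.0) * 5.0 / 9.0

# A 2F recording resolution corresponds to 2 * 5/9 ≈ 1.1c, so the
# converted value should not be quoted finer than whole degrees c.
exact = f_to_c(60.0)            # 15.5555... – false precision
reported = round(f_to_c(60.0))  # 16 – honest, resolution-limited value
conversions = {f: round(f_to_c(f)) for f in (60, 61, 62)}
```

Quoting `exact` to two decimal places implies the 1950 observer could resolve hundredths of a degree, which he plainly could not.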

In conclusion, when interpreting historical environmental temperature records, one must account for the errors of accuracy and resolution built into the instrument, as well as errors in observing and recording the temperature.

In a high-quality glass environmental thermometer manufactured in 1960, the accuracy would be ±1.4F (2% of range).

The resolution of an astute and dedicated observer would be around +/-1F.

Therefore the total error margin of all observed weather station temperatures would be a minimum of ±2.5F, or about ±1.4c…
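For comparison, here is how those two error terms combine under the two conventions metrologists commonly use: worst-case (linear) addition, which is roughly what the text does, and root-sum-square, which applies when the sources are independent. A small sketch (variable names are mine; note an error interval converts from F to c with the 5/9 scale factor only, since no -32 offset applies to a difference):

```python
import math

accuracy_f = 1.4     # instrument accuracy, ±F (2% of a 70F range)
resolution_f = 1.0   # observer reading resolution, ±F

# Worst-case (linear) combination: the two errors simply add.
worst_case_f = accuracy_f + resolution_f                # 2.4F

# Root-sum-square combination, appropriate when the error
# sources are independent and roughly symmetric about zero.
rss_f = math.sqrt(accuracy_f ** 2 + resolution_f ** 2)  # ~1.72F

# Converting the error interval: scale factor only, no offset.
worst_case_c = worst_case_f * 5.0 / 9.0                 # ~1.33c
```

Either convention leaves the per-reading uncertainty larger than a tenth of a degree, which is the point of the conclusion above.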

===============================================================

UPDATE: This comment below from Willis Eschenbach, spurred by Steven Mosher, is insightful, so I’ve decided to add it to the main body – Anthony

===============================================================

Willis Eschenbach says:

As Steve Mosher has pointed out, if the errors are random normal, or if they are “offset” errors (e.g. the whole record is warm by 1°), increasing the number of observations helps reduce the size of the error. What matters are the things that cause a “bias”, a trend in the measurements. There are some caveats, however.

First, instrument replacement can certainly introduce a trend, as can site relocation.

Second, some changes have hidden bias. The short maximum length of the wiring connecting the electronic sensors introduced in the late 20th century moved a host of Stevenson Screens much closer to inhabited structures. As Anthony’s study showed, this has had an effect on trends that I think is still not properly accounted for, and certainly wasn’t expected at the time.

Third, in lovely recursiveness, there is a limit on the law of large numbers as it applies to measurements. A hundred thousand people measuring the width of a hair by eye, armed only with a ruler measured in mm, won’t do much better than a few dozen people doing the same thing. So you need to be a little careful about saying problems will be fixed by large amounts of data.

Fourth, if the errors are not random normal, your assumption that everything averages out may (I emphasize may) be in trouble. And unfortunately, in the real world, things are rarely that nice. If you send 50 guys out to do a job, there will be errors. But these errors will NOT tend to cluster around zero. They will tend to cluster around the easiest or most probable mistakes, and thus the errors will not be symmetrical.

Fifth, the law of large numbers (as I understand it) refers to either a large number of measurements made of an unchanging variable (say hair width or the throw of dice) at any time, or it refers to a large number of measurements of a changing variable (say vehicle speed) at the same time. However, when you start applying it to a large number of measurements of different variables (local temperatures), at different times, at different locations, you are stretching the limits …

Sixth, the method usually used for ascribing uncertainty to a linear trend does not include any adjustment for known uncertainties in the data points themselves. I see this as a very large problem affecting all calculation of trends. All that is ever given is the statistical error in the trend, not the real error, which perforce must be larger.

Seventh, there are hidden biases. I have read (but haven’t been able to verify) that under Soviet rule, cities in Siberia received government funds and fuel based on how cold it was. Makes sense, when it’s cold you have to heat more, takes money and fuel. But of course, everyone knew that, so subtracting a few degrees from the winter temperatures became standard practice …

My own bozo cowboy rule of thumb? I hold that in the real world, you can gain maybe an order of magnitude by repeat measurements, but not much beyond that, absent special circumstances. This is because despite global efforts to kill him, Murphy still lives, and so no matter how much we’d like it to work out perfectly,  errors won’t be normal, and biases won’t cancel, and crucial data will be missing, and a thermometer will be broken and the new one reads higher, and …

Finally, I would back Steven Mosher to the hilt when he tells people to generate some pseudo-data, add some random numbers, and see what comes out. I find that actually giving things a try is often far better than profound and erudite discussion, no matter how learned.

w.
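Willis's closing suggestion (generate pseudo-data, add random numbers, see what comes out) takes only a few lines to try. This sketch recovers a known trend from noisy pseudo-data by ordinary least squares; all the numbers (a 0.7c/century trend, ±0.5c uniform noise, 100 annual values) are illustrative assumptions, and the clean result depends entirely on the noise being independent and unbiased, which is exactly what the caveats above dispute:

```python
import random

random.seed(42)
n_years = 100
true_trend = 0.007  # degrees per year, i.e. ~0.7 per century

# Pseudo "annual mean" data: a known trend plus independent,
# zero-mean observation noise of ±0.5 (uniform).
years = list(range(n_years))
obs = [true_trend * y + random.uniform(-0.5, 0.5) for y in years]

# Ordinary least-squares slope, computed by hand.
mean_y = sum(years) / n_years
mean_t = sum(obs) / n_years
slope = (sum((y - mean_y) * (t - mean_t) for y, t in zip(years, obs))
         / sum((y - mean_y) ** 2 for y in years))
# With independent zero-mean noise the recovered slope lands near
# the true trend; replace the noise with a slow drift or a one-sided
# bias and the recovered slope absorbs it instead.
```

Re-running with, say, `0.001 * y` of drift added to `obs` shows the trend estimate shifting by the full drift rate, which is the pass-band-noise problem in miniature.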

January 23, 2011 7:20 am

I hate to bust some long held beliefs, but more observations will NOT increase accuracy – not in this case. What Willis and others are discussing are an increased number of measurements of the same condition – close in time and space and general conditions. This is not the case for daily temp readings over the last century.
If I am measuring a mountain, yes, the more measurements we take the more the error is reduced. Each temp measurement has the same error bars each time unless you combine measurements from the same locale and time. There is no removal of error on a sparsely measured phenomenon with high dynamics.
A quick and easy example is satellite orbit measurement. I can take 20 measurements in an hour from one source and remove a lot of the error because the forces that disturb the orbit do not act on this time scale. Or I can take 20 measurements over an hour from a couple of different sources and really drive out error.
But if I take 20 measurements over 20 days I do nothing to reduce the error in the computed orbit. At this cadence the system is changing due to inherent forces and I can no longer distinguish system changes from measurement error (or uncertainty or precision – whatever you favorite version is).
It is the spatial and temporal density of the measurements which drives out error, not the number.
Sorry folks, but that is the reality. Anthony, as you know I have been preaching from the error bar altar for years. This post cracks open the entire mathematical foundation of AGW and shows it to be shoddy and wrong. There is no reduction of error from daily measurements of a highly changing combination of forces (daily local temperature, long-term global climate). Therefore the local temp error discussed in this post is the BEST one would ever get from data taken in this manner.
On a global scale it will be even worse – 3-5°C, as I have predicted for a long, long time. Therefore the .7°C/century ‘rise’ in global temp is a statistical ghost, not a reality. Could be lower, could be higher – we don’t have the data to know.
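The distinction this comment draws can be illustrated numerically: averaging many readings of the *same* quantity beats down the error, while a single reading of each of many *different* quantities keeps its full error bar. A sketch (all magnitudes are illustrative assumptions, and the Gaussian, independent noise is the most favourable case possible):

```python
import random

random.seed(0)
sigma = 0.5  # per-reading error, degrees

# Case 1: 1000 readings of the SAME fixed temperature.
# The error of the average shrinks roughly as sigma/sqrt(N).
true_t = 20.0
readings = [true_t + random.gauss(0.0, sigma) for _ in range(1000)]
err_same = abs(sum(readings) / len(readings) - true_t)

# Case 2: one reading each of 1000 DIFFERENT temperatures.
# Each individual reading still carries the full per-reading error.
single_errors = [abs(random.gauss(0.0, sigma)) for _ in range(1000)]
typical_single = sum(single_errors) / len(single_errors)  # stays ~0.4
```

Case 1 is the satellite-orbit situation described above (dense sampling of a slowly changing system); case 2 is closer to one thermometer reading per station per day, and no amount of station-counting turns it into case 1 unless the errors are independent and unbiased across stations.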

Jose Suro
January 23, 2011 7:53 am

Great post Mark. Everything you’ve written is spot on. Measurements are all about accuracy and precision and, when computing means, about bias and calibration.
With the (lack of) precision and accuracy of current and past weather instruments being well known, you cannot confidently express measurement results in a notation that goes beyond the precision of the instrument used to compile the results – Period. Those “.0#” temperature results are “falsely precise” and further, they falsely imply more accuracy in the results than is possible with the instrumentation.
Some say they can “Sigma” the data to death, but in the absence of a precise reference value the bias is unknown, calibration is not possible and therefore, those results are also not beyond suspicion. Those that do not agree with you are deluding themselves.
Best,
Jose

Peter
January 23, 2011 9:04 am

Elaborating on the arguments Mark T has already put forward, and putting things in a slightly different way to make them more readily understandable by us ‘engineering types’:
In any measurement system you have:
1) High-frequency noise – noise which is well above the frequency of the signal you’re trying to measure, and is the only type of noise which is relatively easily filtered out by averaging loads of measurements
2) Low-frequency noise – noise which is well below the frequency of the signal you’re trying to measure (periods of thousands of years or more). This type of noise doesn’t really apply to temperature measurements, so it can be ignored.
3) Pass-band noise – noise which is of similar frequency to the signal you’re trying to measure. This is the most problematic of all, as there are no good ways of removing it without also degrading the signal. Most of the long-term errors, such as drift, ageing, urban creep, etc., unfortunately fall into this category.
4) Discontinuities – such as the unavoidable boundaries of the measurement period. Attempts to remove these introduce noise of a frequency well within the pass-band.
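The difference between categories 1 and 3 above can be demonstrated with a moving average, the simplest low-pass filter. In this sketch (frequencies, amplitudes, and the 21-sample window are all illustrative choices of mine), averaging nearly eliminates high-frequency noise but passes in-band noise almost untouched:

```python
import math

n = 1000
signal_freq = 0.01  # cycles/sample: the band the "signal" lives in
hf_freq = 0.25      # high-frequency noise, far above the signal band

# High-frequency noise vs. pass-band noise, both amplitude 0.5.
hf_noise = [0.5 * math.sin(2 * math.pi * hf_freq * i) for i in range(n)]
pb_noise = [0.5 * math.sin(2 * math.pi * signal_freq * i + 1.0)
            for i in range(n)]

def moving_average(x, w=21):
    """Centered moving average; windows shrink at the edges."""
    half = w // 2
    return [sum(x[max(0, i - half):i + half + 1]) /
            len(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

# Averaging nearly kills the high-frequency noise...
hf_residual = max(abs(v) for v in moving_average(hf_noise))
# ...but the in-band noise comes through almost at full amplitude.
pb_residual = max(abs(v) for v in moving_average(pb_noise))
```

This is why "it all averages out" only disposes of category 1: drift, ageing, and urban creep sit in the same band as the climate signal, so any filter strong enough to remove them removes the signal too.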

January 23, 2011 9:04 am

There is an additional complicating factor: measurement error introduced by framing.
That’s a pun – intentional since expectation plays a big role in what the individual reporter sees and writes down, but the particular frame I have in mind right now is the Stevenson Screen.
These are made of wood, and as the paint chips off become increasingly effective as humidity moderators – in the same way that a stone fireplace holds heat long after the fire goes out, the wood retains humidity (positive and negative relative to the surrounding air) for some time after ambient conditions change.
As a result, measurements taken inside a Stevenson Screen right after a rain squall tend to overestimate relative to “real” ambient, while those taken a bit later underestimate ambient.
This does not average out over time or multiple locations – the actual “average error” (a dubious term) for a particular location and period depends on how often it rains, during what time of day, for how long, at what start/end temperatures, the state of the paint, how the measuring devices are mounted, and wind direction relative to the mounting points inside the box, among other factors.
Bottom line? One measurement is a guess, one million measurements amount to one million guesses – and either way there’s absolutely nothing to support change hypotheses couched in partial degrees.

Alfred Burdett
January 23, 2011 9:14 am

Lazy Teenager said:
Re: Observer Preconception Bias (OPB)
“… the people who make the measurements likely do not have an opinion about climate change.
Afterall most of the measurements are historical and pre AGW.”
The first presumption is bizarre. Do you know anyone without an opinion on climate change?
As for most of the measurements being historical and therefore being free of OPB, that only confirms the potential for OPB to create a difference between recent and more remote past temperature records that is unrelated to any actual difference in temperature.

Peter Plail
January 23, 2011 1:49 pm

Mark T
I commend your patience at trying to extract any coherent signal from the noise being generated by certain “contributors” to this thread.
I also thank you for the information imparted. I have learnt a lot, even if others refuse to.

R.S.Brown
January 23, 2011 3:21 pm

Yep, there’s the infamous iron rake head and antlers on top of the box to give us that warm feeling.

Brian H
January 23, 2011 5:37 pm

SimonJ says:
January 23, 2011 at 2:08 am
“the (min-max)/2 figure (I really can’t bring myself to say average, and I can’t find a mathematical function name for it)”

The term is used in two senses, but how about “median”?

me·di·an

3.
Arithmetic, Statistics . the middle number in a given sequence of numbers,

5.
Also called midpoint. a vertical line that divides a histogram into two equal parts.

The second version, “midpoint”, seems to match what is being done.

EFS_Junior
January 23, 2011 6:09 pm

To continue on with all this IID and LLN foolishness;
Mark T says:
January 22, 2011 at 9:50 pm
EFS_Junior says:
January 22, 2011 at 7:30 pm
http://en.wikipedia.org/wiki/Iid
AFAIK all distributions of errors in temperature measurements are Gaussian with zero mean. Therefore, all Gaussian distributions have the same probability distribution (e. g. Gaussian).
Wow, and it continues.
First of all, not all Gaussian distributions have the same probability distribution. If they don’t have the same variance, or the same mean, then they are not identical.
_____________________________________________________________
IID does not state that the PDFs have the EXACT same identity (a 1:1 relationship);
“In signal processing and image processing the notion of transformation to IID implies two specifications, the “ID” (ID = identically distributed) part and the “I” (I = independent) part:
(ID) the signal level must be balanced on the time axis;
(I) the signal spectrum must be flattened, i.e. transformed by filtering (such as deconvolution) to a white signal (one where all frequencies are equally present).
So it would seem to me that this implies symmetry of the PDF about the mean (ID);
To test this I ran another numerical experiment, with variances of 100, 1, 0.01 and 0.0001 for four uniform distributions (RAND() in Excel 2010).
Note that the distributions are symmetric about their mean, but the variances are NOT EQUAL!
These four time series were then averaged, and guess what? The resulting distribution was uniform, with sigma equal to the RMS sum of squares/N (e.g. SQRT(sigma1^2+sigma2^2+sigma3^2+sigma4^2)/N).
Therefore variance1.NE.variance2.NE.variance3.NE.variance4
Thus the IID requirement (variances must be equal, but actually you would need to claim that all statistical moments would need to be the same as per your spurious identity argument) as you stated is incorrect.
Therefore say bye-bye to IID!
Now on to LLN.
Continuing on with this 4th numerical experiment I also varied N as follows, 16 < N < 65536, so if 65536 (2^16) is a large number, is 16 a large number? Actually YES, statistically speaking.
The results were the same for all N, for my example the final sigma is 1/N (1/4 = 0.25) for all N, for a handful of trials. Even a single trial was almost always spot on.
Therefore, say bye-bye to LLN!
_____________________________________________________________
I should note, btw, that the error associated with the minimum gradation is actually uniformly distributed, not Gaussian. Duh.
If you can prove that all of the errors are a) Gaussian, b) with the same mean, and c) with the same variance, then I will believe you, btw.
_____________________________________________________________
See comment above, (C) is NOT a requirement (You also left out all statistical moments above N = 2, why?).
_____________________________________________________________
As it stands, based on some of your other comments, I’m guessing you would not even know where to begin. Hey, consult with Dave Springer and maybe the two of you can publish something.
Each station’s temperature measurements are independent from all other stations; if they were not, then the two stations would, in fact, be the same station.
Um, no, that is not what independence means. I’m not even sure how to tackle this one because you clearly do not have sufficient background to understand. Independence simply means that their distribution functions are independent of each other, i.e., F(x,y) = F(x)F(y) and likewise for densities f(x,y) = f(x)f(y), which is analogous to probabilities in which two events are independent if P(AB) = P(A)P(B).
_____________________________________________________________
Actually I do know what independence is, it's a statistical ASSUMPTION!
Kind of like assuming a Gaussian distribution, it's a statistical ASSUMPTION!
So for a temperature record, if I record; 0, 0, 0, 0, ad infinitum, perhaps they would not be considered statistically independent?
And so for a temperature record, if I record; RAND(), RAND(), RAND(), RAND(), ad infinitum, perhaps they would be considered statistically independent?
Methinks you are NOT the statistical “scholar” you claim to be. 🙂

jaymam
January 23, 2011 6:55 pm

Solomon Green:
“As a start it might be intresting to plot the trends in Tmax and Tmin separately.”
NIWA in NZ have downloadable figures over about 40 years for 9am temperatures and Tmin, Tmax, and something undefined that they call Mean, which is probably (Tmin + Tmax)/2.
Tmin in NZ has increased slightly, while 9am and Tmax and Mean has stayed about the same.
Here are the 9am and (Tmin + T max)/2 temperatures over nearly 40 years at Ruakura in NZ:
http://i49.tinypic.com/5n3zex.jpg
Tmin has increased over that time, almost certainly because of the UHI effect of the nearby city at night.
I suggest that 9am temperature is a better measure than (Tmin + Tmax)/2 considering that NIWA and others are unable to work out a true mean for a day since they don’t make dozens of readings per day.
And obviously we have to work with whatever readings have been made over many years.

January 23, 2011 7:29 pm

For some time I have been troubled by the effort to glean future predictions of temperature from extrapolations of temperature data from multiple geographical regions and locations within a region. The tacit assumptions made by the “data manipulators” have been that the time-wise trends do not have any systematic errors and that therefore everything averages out. The above discussion shows that the climate community did not attempt to correct the temperature data and assumed that each measurement was absolute, or sufficiently precise that it could be considered an absolute value. Had they done so, they would have come to realize that the data are not of sufficient quality to construct a statistical correlation, because the errors in the actual data do not reflect the total uncertainty. It is like trying to observe the growth of a human hair’s diameter over time by measuring its width with a ruler of one mm precision. If the AGW scientists believe that the temperature trends manifest the effect of increased CO2 in the atmosphere from man-made sources, and that all other sources of warming can be ignored, they still have to convince the scientific community that they have properly accounted for all the systematic errors in the temperature data identified by the many contributors above. They haven’t yet!

January 23, 2011 7:47 pm

jaymam:
“Tmin in NZ has increased slightly, while 9am and Tmax and Mean has stayed about the same. ”
as predicted by AGW theory.
just sayin.

January 23, 2011 8:08 pm

Here is a test that you all might consider. Again, note that I stress the importance of actually doing some computational work. ( WUWT needs more guys like Willis )
Take a look at CRN.
http://www.ncdc.noaa.gov/crn/
That’s a pretty good reference point for well done accurate measurement.
You probably have 100+ stations with up to 8 years of data.
Ok. That’s your baseline for good measurement. ( triple redundant)
Now, I want you to create a model of bad data collection, with all the kinds of errors you are worried about. That is, take the CRN as “truth” and then simulate the addition of all the errors you imagine.
For each of those 100 stations you will then have an ensemble of stations, an envelope of what “might be” if the measurements are as screwed up as you fear.
Then realize that every CRN is PAIRED with an old station, so you can actually go look and see how close those ‘bad’ stations are to a superb station.
You’ll find that the old stations track the superb stations quite well and that your error estimations are too wide. This is NOT to say that the error estimations of Jones are correct (they are too narrow), but by looking at 700 station-years of data (from CRN) along with the old stations they are paired with, you can actually put numbers on your doubts.
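The test Mosher proposes can be prototyped before touching real CRN files. This sketch uses synthetic data standing in for a CRN "truth" record and one degraded paired station; the bias, drift, and noise magnitudes are invented for illustration, not taken from any actual station comparison:

```python
import random

random.seed(7)
n_days = 365 * 8  # roughly the 8 years of CRN data mentioned above

# Stand-in for a reference-grade (CRN-like) daily record.
truth = [15.0 + 10.0 * random.random() for _ in range(n_days)]

# Simulate a degraded paired station: reading noise, a fixed
# bias, and a slow calibration drift (magnitudes hypothetical).
bias = 0.3
drift_per_day = 0.0001
bad = [t + bias + drift_per_day * i + random.gauss(0.0, 0.5)
       for i, t in enumerate(truth)]

# Comparing long-run means: the random noise averages away,
# but the bias and drift survive into the station difference.
diff = sum(b - t for b, t in zip(bad, truth)) / n_days
expected = bias + drift_per_day * (n_days - 1) / 2.0
```

Swapping the synthetic `truth` for downloaded CRN series and the simulated `bad` record for the real paired legacy station turns this toy into the comparison Mosher describes.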

January 23, 2011 8:40 pm

Dallas Tisdale (No relation to Bob, er that Bob) says:
January 22, 2011 at 5:42 am (Edit)
Mosher has it summed up pretty well. The overall error for a single instrument would impact local records as has been seen quite a few times. As far as the global temperature record, not so much. Bias is the major concern when instrumentation is changed at a site or the site relocated.
Adjustments to the temperature record are more a problem for me. UHI adjustment is pretty much a joke. I still have a problem with the magnitude of the TOBS. It makes sense in trying to compare absolute temperatures (where the various errors do count) but not so much where anomalies are used for the global average temperature record. Perhaps Mosh would revive the TOBS adjustment debate.
#########
TOBS. Let me review the issues with TOBS.
1. The adjustment IS required. If you don’t adjust for TOBS you have a corrupt ‘raw’ record. When the time of observation changes you will infect the record with a bias. That bias depends upon local factors; the adjustment is made by doing long-range empirical studies.
2. All TOBS adjustments are EMPIRICAL models. The model needs to be developed from empirical data and properly validated.
3. Every empirical model comes with an error of prediction. For example, if you have a 9AM observation you will be estimating the temp at midnight from this 9AM observation. That estimation comes with an SE of prediction; in the US the SE is on the order of, say, .25C. Every site needs and gets its own model.
So since ive been around people have been complaining about TOBS. The complaint goes like this:
“TOBS raises the temperatures, therefore it is suspect.” This is wrong, and when you make this complaint no one in science will listen. However, here are the real problems that people MIGHT listen to.
1. We only have the TOBS calculations for the USA, and the USA is 2% of the land. How did other countries do TOBS adjustments? Karl’s paper on TOBS concerns itself only with USA validation. The model is empirical; you cannot generalize it for use in Asia without doing a validation for Asia (it has input parameters for lat/lon). So we need to ask the question about TOBS outside the USA.
2. The SE of prediction is larger than the instrument error. Consequently, error budgets need to be calculated differently for a station that has been TOBS-corrected versus one that has not been corrected.
So basically, you have two issues with TOBS. Both of them are valid concerns. First, where is the documentation for how TOBS is employed outside the US? and second, how is the error of prediction propagated.
Personally, I think that they claim “TOBS is wrong” is weak and climate scientists can rightly ignore unsupported claims like this. But, it’s harder to ignore questions.
1. How are records outside the US TOBS corrected?
2. Where are the published validation studies? (Karl covers the US only.)
3. How is the SE of prediction handled?
And realize that with #3, Jones, Hansen, etc., NONE of them account for an uncertainty due to adjustment. Sadly, most people focus their criticism on the wrong point.
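The propagation question in #3 can be sketched in a few lines. Assuming the instrument error and the SE of prediction are independent (my assumption, not anything stated above), the standard move is to combine them in quadrature; the 0.2 C instrument sigma below is purely illustrative, while the ~0.25 C SE is the figure quoted in the comment:

```python
import math

def combined_uncertainty(instrument_sigma, tobs_se):
    """Combine independent error sources in quadrature (root-sum-square)."""
    return math.sqrt(instrument_sigma ** 2 + tobs_se ** 2)

# Illustrative numbers: a hypothetical 0.2 C instrument sigma, and the
# ~0.25 C SE of prediction quoted above for a TOBS-adjusted US station.
raw_station = combined_uncertainty(0.2, 0.0)    # no TOBS adjustment applied
tobs_station = combined_uncertainty(0.2, 0.25)  # adjustment adds its own SE
print(raw_station, tobs_station)                # the adjusted budget is larger
```

The sketch just restates point #2 above: once the SE of prediction exceeds the instrument error, it dominates the budget, so adjusted and unadjusted stations cannot share a single error figure.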

Michael Moon
January 23, 2011 9:11 pm

Dave Springer has made another error even larger than his average error. He is 54. He reports that winters now are milder than when he was young. Dave, Dave, Dave, HOW OLD IS THE PLANET? Oh, is it four billion years old or so? Gosh, those anecdotal records of the “Weather,” wherever he was, for a few decades, are the telling blow that reveals all.
Steven Mosher, “a phenomena”? Some problems with plurals. Not germane, sorry. “Significant digits” seems to be the issue here. Simply put, an instrument designed and manufactured to be accurate to a certain level, when recorded and analyzed to a finer level, gives MEANINGLESS results. The money was not spent to acquire data to that accuracy. Climate science claims to be a science based on proving that the global average temperature has risen a few tenths of a degree C in the last 130 years or so, or something. Much of the data is extrapolated, as we have no thermometers in the Arctic, not enough covering the oceans, and not very many in the Southern Hemisphere. And yet, with so much Adjusting of Archives, and so much Extrapolating of Records, and a really Bizarre group of “computer models,” we are asked to accept poverty of energy so that a nebulous potential “Climate Disaster” won't happen in a hundred or more years.
When Mr. Obama loses in a couple of years, the new President will be tasked with ensuring that such clowns get no grants…………

January 23, 2011 9:39 pm

Solomon Green.
You simply cannot do the test with one station.
Try 190 stations for 10 years. I pointed you at that data.
Try CRN 100+ stations for 5+ years. They do part of the calculation for you:
http://www.ncdc.noaa.gov/crn/newmonthsummary?station_id=1008&yyyymm=201101&format=web
(Tmax+Tmin)/2 is an estimator of Tave.
Since 1845 we have known that this estimator has errors.
The essential thing to know is that as long as you don't change your estimator, the average trend bias will be zero.
Start here:
Kaemtz LF. 1845. A Complete Course of Meteorology.
Then GIYF
You’ll end up reading a whole bunch of stuff from agriculture. You’ll even find studies that compare all the methods of computing Tave and comparisons of the bias.
Suppose I measure your height in the morning in shoes. Call this a heel bias.
Suppose I measure you at night in shoes. Same heel bias.
What’s the trend in your height?
Suppose I measure your height in the morning in bare feet.
Suppose I measure you at night in shoes.
What’s the trend in your height?
As the GHCN documentation points out, there are about 101 ways to estimate the mean. What matters MOST is that you use the same estimator WITHIN a station's history. Some stations give readings every 3 hours, some every hour, some once a day.
If you're estimating trend, you keep the method the same and you don't introduce a trend bias.
When you have a trend bias to show, you’ll have a publishable work worthy of attention. Until then you have a reading assignment that starts in 1845.
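The heel-bias argument can be checked with a toy simulation (synthetic data only, not any real station record): give every day the same asymmetric diurnal shape plus a slow warming trend, and the (Tmax+Tmin)/2 estimator sits at a different level than the true daily mean, yet both recover the same trend.

```python
import math

def daily_profile(day, trend_per_day=0.001):
    """Synthetic hourly temps: a fixed asymmetric diurnal shape plus a slow trend."""
    base = 10.0 + trend_per_day * day
    return [base + 5.0 * math.sin(2 * math.pi * h / 24)
                 + 2.0 * math.cos(4 * math.pi * h / 24) for h in range(24)]

def slope(series):
    """Ordinary least-squares slope per step."""
    n = len(series)
    xm, ym = (n - 1) / 2, sum(series) / n
    num = sum((i - xm) * (y - ym) for i, y in enumerate(series))
    return num / sum((i - xm) ** 2 for i in range(n))

days = range(3650)  # ten synthetic years
true_mean = [sum(daily_profile(d)) / 24 for d in days]
minmax = [(max(daily_profile(d)) + min(daily_profile(d))) / 2 for d in days]
# The levels differ (the estimator is biased), but the slopes agree,
# because the bias is constant as long as the estimator never changes.
print(minmax[0] - true_mean[0], slope(true_mean), slope(minmax))
```

Change the estimator partway through the series (the barefoot-to-shoes case) and the level shift leaks straight into the fitted trend.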

AusieDan
January 23, 2011 11:30 pm

Thanks Mark,
My first attempt at a career was in metrology, but I was (justly) sacked after eighteen months because of my clumsiness.
However, I understood and still understand the principles well.
(After picking myself up, which took a while, I went on to success in another field).
Your post rings a bell with me.
You have highlighted yet another reason to doubt claims about “the hottest year evah”.
Good work.

January 23, 2011 11:31 pm

EFS_Junior, “as the specious uncertainty paper claims, all sigmas have zero means as all instances are shown as +/- meaning symmetric about a zero mean.”
That’s pretty funny, confusing the results of an empirical Gaussian fit with a claim of pure stationarity.
“AFAIK all distributions of errors in temperature measurements are Gaussian with zero mean. Therefore, all Gaussian distributions have the same probability distribution (e.g. Gaussian).”
That one’s pretty scary, coming from a climate scientist. Field calibration of surface station thermometers against precision standards show the distribution of errors can be very far from Gaussian and very far from symmetrical about the empirical mean. You really need to read the methodological literature.
+++++++++++++++++++++++++++++++++++++
Mark T, congratulations, you’re doing a great job.

January 24, 2011 1:23 am

Steven Mosher says:
January 23, 2011 at 7:47 pm
“jayman:
“Tmin in NZ has increased slightly, while 9am and Tmax and Mean has stayed about the same. ”
as predicted by AGW theory.
just sayin.”
As I have already pointed out, the oldest temperature record in the world, The Central England Temperature Record, shows the last 15 years as a cooling trend.
http://c3headlines.typepad.com/.a/6a010536b58035970c0147e1680aac970b-pi
Does AGW theory predict that?
Just sayin.

Jessie
January 24, 2011 2:58 am

Hot diggity!
What a great post Anthony. I wanted to be in the room watching it all, catching the mud slings. And what a great paper; thanks to Pat Frank and then Mark for your well researched, written and informative article. Fantastic.
And these posters:
Ian W says: January 22, 2011 at 8:58 am
richard verney says: January 22, 2011 at 12:46 pm
Philip Shehan says: January 22, 2011 at 2:56 pm
David A. Evans says: January 22, 2011 at 4:05 pm
Alfred Burdett ………………. and
E.M.Smith says: January 23, 2011 at 2:02 am ………….. This article is profoundly important.
Michael Moon says: January 23, 2011 at 9:11 pm
It’s about sensitivity and specificity, and as I understand it, if the instrument chosen (that’s specificity) is the wrong one, then no amount of measuring will answer your hypothesis. If one states their hypothesis! And doesn’t need to take multiple ‘data’, if in fact it existed, for the aim of overwhelming. Type I and/or Type II errors?
So I reckon, like Monckton partially wrote in The Australian (national newspaper) this weekend, that it might be sufficient, it might be necessary but it ain’t temporal.
Climate Crisis ain’t necessarily so
http://www.theaustralian.com.au/national-affairs/climate/earths-climate-crisis-aint-necessarily-so/story-e6frg6xf-1225992476627
But the northern mob don’t have penguins I think. These little flippered scuba-birds have neat camouflage. Some call it mimesis. Some evolution.
Speaking of evolution, getting back to an earlier WUWT quibble on Jane Goodall and REDD
http://findarticles.com/p/articles/mi_qa3724/is_200206/ai_n9130751/
(c/- cartoonist Gary Larson, wiki)
I worked in the desert, near Giles meteorological station, Surveyor Generals Corner, Western Australia. Long way from nowhere; planes were pretty well the way in and out, just like the mines (and the PNG and Yukon posters) and for our great soldiers overseas. It was work, not academic tourism.
Some of the Giles gals and fellas also worked in Antarctica.
In the desert shade some days it was 50C. We got really adept at shaking down Hg oral thermometers, cos that is all we had for diagnostic instruments for the likes of infections like kids' pneumonia. So we had a reasonable baseline to work from (human body temp) and then measured +/- from there. Anyway, it was pointless for many months of the year as the thermometer would shoot up to 42 and even with 3 minutes under the tongue there was no accurate reading to be had. And in winter (mid-year) the mornings were freezing!
So after a while, in the cooler months when our primitive instruments actually did work (and we had them calibrated yearly or when they were dropped), we started to look at anomalies and combinations of our other observations thereof. We got pretty accurate. In fact we also tested against the pathology companies, who seemed to regularly change their parameters and instruments. Our predictions and subsequent prescriptive regimes worked pretty well in the time lag of pathology pecimens gettin gout an dresult sgettin gin by plane. Digital came after I left. But that doesn’t work so well in the heat either. And none of the measurements actually really improved these peoples’ health in the long term. Short term, plenty of lives were saved.
That’s lives of people.

Jessie
January 24, 2011 3:05 am

Oops.. spelling and grammer!
‘Our predictions and subsequent prescriptive regimes worked pretty well in the time lag of pathology specimens getting out and results getting in by plane.’
That was a two week time lag! We had to make [informed] decisions on the spot.

January 24, 2011 6:47 am

Good article, thanks! (To Eschenbach as well.)
Regarding the rounding example of “rounding” 15.55555556 to 15.55, that is actually truncating – rounding would give 15.56. (The rule is half or more is rounded up, less down. Rounding is supposed to average out, truncating will bias low.)
I think people are keeping more significant places to avoid introducing more error by the conversion process, but that should average out anyway.
A fundamental is that precision becomes more critical as the two values being compared get closer to each other, because inherent inaccuracies become a very large proportion of the result. (Or even swamp it – are larger than the result, so the result is meaningless. Comparison is usually subtraction of one value from the other.)
Since we are talking about small changes in temperature over time, where does that leave the claimed climate temperature trend?
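The truncation-versus-rounding distinction above is easy to demonstrate numerically. This sketch (my own toy example, not from the post) applies both operations to uniform random values and compares the mean errors:

```python
import random

def truncate2(x):
    """Drop everything past two decimals, with no rounding (assumes x >= 0)."""
    return int(x * 100) / 100

random.seed(0)
values = [random.uniform(0.0, 30.0) for _ in range(100_000)]
trunc_bias = sum(truncate2(v) - v for v in values) / len(values)
round_bias = sum(round(v, 2) - v for v in values) / len(values)
# Truncation error is uniform on (-0.01, 0], so its mean is about -0.005:
# a systematic low bias of half the last retained digit. Rounding error is
# roughly symmetric about zero, so its mean is near 0.
print(trunc_bias, round_bias)
```

This is exactly the point made about 15.55555556: truncating to 15.55 throws away the half-digit in one direction only, while rounding to 15.56 lets the discarded parts average out.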
Regarding funding bias (Eschenbach’s Soviet Example), Doctor Randy Knipping from ON told of a tribal group somewhere in the hills (perhaps South Asia) that he helped, who were renowned for living to quite old ages. Knipping noted the elders had a good diet and stayed active by taking care of children while parents worked the fields, so might be expected to live longer. But he also found that they inflated their age – because age was prestige. (Source: his presentation to pilots at a COPA annual event, Lethbridge AB, early 2000s.)
Otherwise, the newer study may be interesting for dendrology, which has been discussed extensively in this forum and Climate Audit. My understanding and memory is that access to water by roots may be a substantial factor in growth rate, even at the northern locations chosen to sample. (Hmm – if vegetation shifts due to average precipitation rate, might that affect water retention that trees depend on? I’m thinking of low vegetation, including moss, that tends to retain moisture locally. Precipitation also affects erosion, which may remove useful soil, though that is probably most affected by peak precipitation rate, which may depend on severity of storms. An example of the impact of moisture retention and soil may be the areas of Lillooet and Dawson Creek B.C., the latter having subsoil structure that tends to retain moisture – the difference in average rainfall is not great between the two places, yet the amount of vegetation is.)
My modest understanding of precipitation patterns is that moisture may increase with altitude initially, such as on the wet coast (like the Cascade mountains in WA state and coastal mountains north of Vancouver BC where rainfall rate is much higher at least partway up due to moisture-laden winds hitting the mountains – some ferocious rainfall rates occur north of Vancouver, such as near Lions Bay). I guess that may not hold true at much higher elevations.
(Rate probably depends a great deal on local topography – and the downwind side of the mountains is usually drier, as in the Cascades – Ellensburg WA for example.)
AusieDan makes a good point on January 20 at 5:48am. Did I hear there were plants showing up in this very wet period that hadn’t been seen for decades?
I do like to laugh occasionally. Dianne, George Turner, and “jphn S.”: thank you!

J. Bob
January 24, 2011 7:16 am

Dave Springer says:
“My suggestion was buy 50 different kinds of thermometers from WalMart, put them all in the same place, read them as best you can, average the readings, and you’ll get a result where the accuracy and resolution is better than any individual instrument in the whole lot.”
You might want to read my previous post. If all the thermometers were made in the same “batch”, they would most likely have a bias error. Normally this is due to the manufacturing process, materials, and human testing. So no matter how many readings you take, you will still have the bias error, assuming you even have a normal distribution.
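J. Bob's batch-bias point can be shown in a few lines. The 0.3 C offset and 0.5 C random scatter below are hypothetical numbers, chosen only to illustrate that averaging fifty readings shrinks the random part by roughly a factor of sqrt(50) while leaving the shared bias fully intact:

```python
import random

random.seed(1)
true_temp = 20.0
batch_bias = 0.3  # hypothetical common manufacturing offset, same in every unit
readings = [true_temp + batch_bias + random.gauss(0.0, 0.5) for _ in range(50)]
mean_reading = sum(readings) / len(readings)
# The random scatter averages down to about 0.5 / sqrt(50) ≈ 0.07 C,
# but the shared 0.3 C bias is untouched by averaging.
print(mean_reading)
```

This is why averaging 50 thermometers from the same production run improves precision but not accuracy: the mean converges on the biased value, not the true one.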

Solomon Green
January 24, 2011 7:58 am

My thanks to Steve Mosher. I have visited one of the sites that he recommended, http://www.ncdc.noaa.gov/crn/, and if one period of 23 days suffices (I think that it does) I am satisfied that as far as the US climate is concerned (Tmax + Tmin)/2 is a good approximation for Taverage (to within 1 degree C, which with rounding is as close as you are going to get). I am happy to accept his assurance that it applies throughout the globe.
The period is far too short, but it looks, at first glance, as though for some stations the min and max trends diverge.
As a matter of curiosity I wonder why there are 23 stations sited in Colorado and 21 in New Mexico but only 7 in each of California and Texas.
PS. Mr. Mosher, as someone who graduated in statistics and has often needed to source data and apply statistical tests throughout his working life, I have come across a number of instances where (Xmax + Xmin)/2 does not give a good approximation to Xmean, no matter how long the duration. As some of the correspondents on this site have indicated, it is a question of the distribution.

Mark T
January 24, 2011 8:32 am

Pat Frank says:
January 23, 2011 at 11:31 pm

Mark T, congratulations, you’re doing a great job.

Thanks, but my head couldn’t take it anymore. My favorite was (paraphrased) “I think it must be Gaussian QED.” Unbelievable.
I am fortunate that most of the data sets I work with (real world data, folks) contain i.i.d. noise and/or measurement errors. Not all, of course. The conversion of analog signals (typically radar or comm in my case) to digital data suffers from some of the same problems as measuring temperature would. Quantization error, for example, is analogous to the minimum gradation problem and results in a uniformly distributed error between two levels of quantization… usually.
The quantization process (as well as the sampling process itself) is actually non-linear resulting in differences across the full dynamic range of the part, e.g., the error between 10 and 11 may be different than the error between 60 and 61. As a result, not all errors cancel (we refer to it as a reduction in noise bandwidth, not canceling errors, btw) when averaging or integrating over time. There are a wide variety of other noises/interferences, though originating from different locations along the entire link, that do not cancel. In the end, you wind up with various spurs in the data, some related to the sampling process itself, others due to the environment in which the signal propagates, and others related to various factors contained within the rest of the “system” itself.
If you’re good, or just lucky, you can eliminate or work around most of the errors induced by your own system (EMI/EMC precautions in particular), even some of the environmental impediments (through filtering, adaptive cancellation/equalization, etc.), but some of them are there to stay and may corrupt your results. Knowing which are caused by i.i.d. processes is key to understanding how to deal with them.
Mark
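Mark T's quantization point can be illustrated with a toy ideal quantizer (a generic sketch, not his actual signal chain): for an input that exercises many levels, the error is close to uniform on [-q/2, q/2], with the textbook variance of q^2/12.

```python
import random

def quantize(x, q):
    """Ideal uniform (mid-tread) quantizer with step size q."""
    return q * round(x / q)

random.seed(2)
q = 0.1
samples = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
errors = [quantize(s, q) - s for s in samples]
mean_err = sum(errors) / len(errors)
var_err = sum(e * e for e in errors) / len(errors)
# A uniform error on [-q/2, q/2] has mean 0 and variance q**2 / 12.
print(mean_err, var_err, q * q / 12)
```

As the comment notes, this uniform model is only the "usually" case: a real converter's nonlinearity, spurs, and correlated interference break the assumption, which is why those errors do not average away.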