The Metrology of Thermometers

For those who didn’t notice, this is about metrology, not meteorology, though meteorology uses the final product. Metrology is the science of measurement.

Since we had this recent paper from Pat Frank that deals with the inherent uncertainty of temperature measurement, establishing a new minimum uncertainty value of ±0.46 C for the instrumental surface temperature record, I thought it valuable to review the uncertainty associated with the act of temperature measurement itself.

As many of you know, the Stevenson Screen aka Cotton Region Shelter (CRS), such as the one below, houses Tmax and Tmin recording mercury and alcohol thermometers.

Hanksville, UT USHCN climate monitoring station with Stevenson Screen - sited over a gravestone. Photo by surfacestations.org volunteer Juan Slayton

They look like this inside the screen:

NOAA standard issue max-min recording thermometers, USHCN station in Orland, CA - Photo: A. Watts

Reading these thermometers would seem to be a simple task. However, that’s not quite the case. Adding to the statistical uncertainty derived by Pat Frank, as we see below in this guest re-post, measurement uncertainty in both the long and short term is also an issue. The following appeared on the blog “Mark’s View”, and I am reprinting it here in full with permission from the author. There are some enlightening things to learn about the simple act of reading a liquid in glass (LIG) thermometer that I didn’t know, as well as some long term issues (like the hardening of the glass) that have magnitudes about as large as the climate change signal for the last 100 years, ~0.7°C – Anthony

==========================================================

Metrology – A guest re-post by Mark of Mark’s View

This post is actually about the poor quality and processing of historical climatic temperature records rather than metrology.

My main points are that in climatology many important factors that are accounted for in other areas of science and engineering are completely ignored by many scientists:

  1. Human Errors in accuracy and resolution of historical data are ignored
  2. Mechanical thermometer resolution is ignored
  3. Electronic gauge calibration is ignored
  4. Mechanical and Electronic temperature gauge accuracy is ignored
  5. Hysteresis in modern data acquisition is ignored
  6. Conversion from Degrees F to Degrees C introduces false resolution into data.

Metrology is the science of measurement, embracing both experimental and theoretical determinations at any level of uncertainty in any field of science and technology. Believe it or not, the metrology of temperature measurement is complex.

It is actually quite difficult to measure things accurately, yet most people just assume that information they are given is “spot on”. A significant number of scientists and mathematicians also do not seem to realise that the data they are working with is often not very accurate. Over the years as part of my job I have read dozens of papers based on pressure and temperature records where no reference is made to the instruments used to acquire the data, or their calibration history. The result is that many scientists frequently reach incorrect conclusions about their experiments and data because they do not take into account the accuracy and resolution of their data. (It seems this is especially true in the area of climatology.)

Do you have a thermometer stuck to your kitchen window so you can see how warm it is outside?

Let’s say you glance at this thermometer and it indicates about 31 degrees centigrade. If it is a mercury or alcohol thermometer you may have to squint to read the scale. If the scale is marked in 1c steps (which is very common), then you probably cannot interpolate between the scale markers.

This means that this particular thermometer’s resolution is 1c, which is normally stated as plus or minus 0.5c (+/- 0.5c).

This example of resolution assumes you are observing the temperature under perfect conditions and have been properly trained to read a thermometer. In reality you might only glance at the thermometer, or you might have to use a flash-light to look at it, or it may be covered in a dusting of snow, rain, etc. Mercury forms a pronounced meniscus in a thermometer that can exceed 1c, and many observers incorrectly read the temperature at the base of the meniscus rather than its peak. (This picture shows an alcohol meniscus; a mercury meniscus bulges upward rather than down.)

Another major common error in reading a thermometer is parallax error.

Image courtesy of Surface Meteorological Instruments and Measurement Practices by G.P. Srivastava (with a mercury meniscus!). This is where refraction of light through the glass thermometer exaggerates any error caused by the eye not being level with the surface of the fluid in the thermometer.


If you are using data from hundreds of thermometers scattered over a wide area, with data being recorded by hand by dozens of different people, the assumed observational resolution should be degraded accordingly. In the oil industry, for example, it is common to accept an error margin of 2-4% when using manually acquired data.

As far as I am aware, raw historical temperature data from multiple weather stations has never been adjusted to account for observer error.

We should also consider the accuracy of the typical mercury and alcohol thermometers that have been in use for the last 120 years. Glass thermometers are calibrated by immersing them in ice/water at 0c and a steam bath at 100c. The scale is then divided equally into 100 divisions between zero and 100. However, a glass thermometer at 100c is longer than a thermometer at 0c. This means that the scale on the thermometer gives a false high reading at low temperatures (between 0 and 25c) and a false low reading at high temperatures (between 70 and 100c). This process is also followed for weather thermometers with a range of -20 to +50c.

25 years ago, very accurate mercury thermometers used in labs (0.01c resolution) had a calibration chart/graph with them to convert observed temperature on the thermometer scale to actual temperature.
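
A calibration chart like that is straightforward to apply numerically. The sketch below is purely illustrative; the calibration points are hypothetical, not from any real chart, but it shows the standard approach of interpolating between chart entries:

```python
# Apply a thermometer calibration chart by linear interpolation.
# The calibration points below are hypothetical illustration values,
# not taken from any real instrument.
from bisect import bisect_left

# (scale reading in degrees C, correction to add in degrees C)
CAL_POINTS = [(0.0, 0.00), (25.0, -0.15), (50.0, 0.05), (75.0, 0.20), (100.0, 0.00)]

def corrected_temperature(reading):
    """Interpolate the calibration correction and apply it to a reading."""
    xs = [p[0] for p in CAL_POINTS]
    ys = [p[1] for p in CAL_POINTS]
    if reading <= xs[0]:
        return reading + ys[0]
    if reading >= xs[-1]:
        return reading + ys[-1]
    i = bisect_left(xs, reading)
    # Linear interpolation between the two bracketing calibration points
    frac = (reading - xs[i - 1]) / (xs[i] - xs[i - 1])
    correction = ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return reading + correction
```

For example, a scale reading of 12.5c falls halfway between the 0c and 25c chart entries, so it picks up half of the -0.15c correction.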

Temperature cycles harden the glass of a thermometer bulb and shrink it over time; a 10-year-old -20 to +50c thermometer will give a false high reading of around 0.7c.

Over time, repeated high temperature cycles cause alcohol thermometers to evaporate vapour into the vacuum at the top of the tube, creating false low temperature readings of up to 5c. (That is 5.0c, not 0.5c; it’s not a typo…)

Electronic temperature sensors have been used more and more in the last 20 years for measuring environmental temperature. These also have their own resolution and accuracy problems. Electronic sensors suffer from drift and hysteresis and must be calibrated annually to remain accurate, yet most weather station temperature sensors are NEVER calibrated after they have been installed.

Drift is where the recorded temperature creeps steadily up or down over time even when the real temperature is static; it is a fundamental characteristic of the sensor’s electronics and metal parts and cannot be fully compensated for. Typical drift of a -100c to +100c electronic thermometer is about 1c per year, and the sensor must be recalibrated annually to correct this error.

Hysteresis is a common problem as well. This is where increasing temperature has a different mechanical effect on the thermometer than decreasing temperature, so for example if the ambient temperature increases by 1.05c the thermometer reads an increase of 1c, but when the ambient temperature drops by 1.05c the same thermometer records a drop of 1.1c. (This is a VERY common problem in metrology.)
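
Hysteresis of this kind is easy to model. The sketch below is a toy illustration with gains chosen to reproduce the numbers in the example above; real sensor hysteresis curves are more complex:

```python
def hysteretic_reading(change, rising_gain=1.0 / 1.05, falling_gain=1.1 / 1.05):
    """Toy hysteresis model: the indicated change depends on direction.

    With these (illustrative) gains, a +1.05c ambient change reads as
    +1.0c while a -1.05c change reads as -1.1c, matching the example
    in the text.
    """
    gain = rising_gain if change >= 0 else falling_gain
    return change * gain

print(round(hysteretic_reading(1.05), 2))   # 1.0
print(round(hysteretic_reading(-1.05), 2))  # -1.1
```

Note that one full up-and-down cycle of 1.05c leaves a spurious net reading of -0.1c, so repeated cycling accumulates error rather than cancelling it.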

Here is typical food temperature sensor behaviour compared to a calibrated thermometer, without even considering sensor drift. [Figure: Thermometer Calibration] Depending on the measured temperature, the offset in this high accuracy gauge ranges from -0.8c to +1c.

But on top of these issues, the people who make these thermometers and weather stations state clearly the accuracy of their instruments, yet scientists ignore it! The packaging of a -20c to +50c mercury thermometer will state the accuracy of the instrument as +/-0.75c, for example, yet frequently this information is not incorporated into the statistical calculations used in climatology.

Finally we get to the infamous conversion of Degrees Fahrenheit to Degrees Centigrade. Until the 1960s almost all global temperatures were measured in Fahrenheit. Nowadays all the proper scientists use Centigrade, so all old data is routinely converted: take the original temperature, subtract 32, multiply by 5 and divide by 9.

C= ((F-32) x 5)/9

Example: the original reading from a 1950 data file is 60F. This data was eyeballed by the local weatherman and written into his tallybook. Fifty years later a scientist takes this figure and converts it to centigrade:

60-32 =28

28×5=140

140/9= 15.55555556

This is usually (incorrectly) rounded to two decimal places: 15.56c, without any explanation as to why this level of resolution has been selected.
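
The conversion is trivial to code, which makes the false resolution easy to demonstrate:

```python
def f_to_c(temp_f):
    """Exact Fahrenheit-to-Centigrade conversion: C = (F - 32) * 5/9."""
    return (temp_f - 32.0) * 5.0 / 9.0

# A 1950 reading of 60F, recorded to the nearest 2F, converts to a
# repeating decimal. The extra digits are false resolution: the
# original observation never contained them.
print(f_to_c(60))            # 15.555555555555555
print(round(f_to_c(60), 2))  # 15.56, two decimals the observer never measured
```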

The correct mathematical method of handling this issue of resolution is to look at the original resolution of the recorded data. Typically old Fahrenheit data was recorded in increments of 2 degrees F, e.g. 60, 62, 64, 66, 68, 70. Very rarely on old data sheets do you see 61, 63, etc. (although 65 is slightly more common).

If the original resolution was 2 degrees F, the resolution used for the same data converted to  Centigrade should be 1.1c.

Therefore mathematically:

60F = 16C

61F = 16C

62F = 17C

etc.
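
A minimal sketch of this resolution-respecting conversion (the rounding rule here is my reading of the author’s method, not a quoted algorithm):

```python
def convert_with_resolution(temp_f):
    """Convert F to C, then report only whole degrees C.

    If the source data was recorded in 2F steps, the converted value
    carries only ~1.1c of resolution, so whole degrees C is the finest
    reporting step the data supports.
    """
    celsius = (temp_f - 32.0) * 5.0 / 9.0
    return round(celsius)

for f in (60, 61, 62):
    print(f, "F =", convert_with_resolution(f), "C")
# 60 F = 16 C, 61 F = 16 C, 62 F = 17 C
```

Note that 61F converts to 16.1c, which rounds to 16C, not 17C.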

In conclusion, when interpreting historical environmental temperature records one must account for errors of accuracy built into the thermometer and errors of resolution built into the instrument as well as errors of observation and recording of the temperature.

In a high quality glass environmental  thermometer manufactured in 1960, the accuracy would be +/- 1.4F. (2% of range)

The resolution of an astute and dedicated observer would be around +/-1F.

Therefore the total error margin of all observed weather station temperatures would be a minimum of +/-2.5F, or about +/-1.4c…
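
How the individual error terms combine depends on whether they are independent: worst-case bounds simply add, while independent random errors are conventionally combined in quadrature (root-sum-square). A sketch of both conventions, using the accuracy and resolution figures above:

```python
import math

def combine_linear(*errors):
    """Worst-case combination: error bounds simply add."""
    return sum(abs(e) for e in errors)

def combine_rss(*errors):
    """Root-sum-square combination for independent random errors."""
    return math.sqrt(sum(e * e for e in errors))

accuracy_f = 1.4    # instrument accuracy, +/- degrees F
resolution_f = 1.0  # observer reading resolution, +/- degrees F

print(combine_linear(accuracy_f, resolution_f))          # 2.4 (worst case)
print(round(combine_rss(accuracy_f, resolution_f), 2))   # 1.72 (independent)
# Either way, the margin converts to Centigrade by multiplying by 5/9.
```

Which convention applies depends on whether the error sources are truly independent, which is exactly the point debated in the comments below.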

===============================================================

UPDATE: This comment below from Willis Eschenbach, spurred by Steven Mosher, is insightful, so I’ve decided to add it to the main body – Anthony

===============================================================

Willis Eschenbach says:

As Steve Mosher has pointed out, if the errors are random normal, or if they are “offset” errors (e.g. the whole record is warm by 1°), increasing the number of observations helps reduce the size of the error. The only errors that really matter are those that cause a “bias”, a trend in the measurements. There are some caveats, however.

First, instrument replacement can certainly introduce a trend, as can site relocation.

Second, some changes have hidden bias. The short maximum length of the wiring connecting the electronic sensors introduced in the late 20th century moved a host of Stevenson Screens much closer to inhabited structures. As Anthony’s study showed, this has had an effect on trends that I think is still not properly accounted for, and certainly wasn’t expected at the time.

Third, in lovely recursiveness, there is a limit on the law of large numbers as it applies to measurements. A hundred thousand people measuring the width of a hair by eye, armed only with a ruler measured in mm, won’t do much better than a few dozen people doing the same thing. So you need to be a little careful about saying problems will be fixed by large amounts of data.

Fourth, if the errors are not random normal, your assumption that everything averages out may (I emphasize may) be in trouble. And unfortunately, in the real world, things are rarely that nice. If you send 50 guys out to do a job, there will be errors. But these errors will NOT tend to cluster around zero. They will tend to cluster around the easiest or most probable mistakes, and thus the errors will not be symmetrical.

Fifth, the law of large numbers (as I understand it) refers to either a large number of measurements made of an unchanging variable (say hair width or the throw of dice) at any time, or it refers to a large number of measurements of a changing variable (say vehicle speed) at the same time. However, when you start applying it to a large number of measurements of different variables (local temperatures), at different times, at different locations, you are stretching the limits …

Sixth, the method usually used for ascribing uncertainty to a linear trend does not include any adjustment for known uncertainties in the data points themselves. I see this as a very large problem affecting all calculation of trends. All that is ever given is the statistical error in the trend, not the real error, which perforce must be larger.

Seventh, there are hidden biases. I have read (but haven’t been able to verify) that under Soviet rule, cities in Siberia received government funds and fuel based on how cold it was. Makes sense, when it’s cold you have to heat more, takes money and fuel. But of course, everyone knew that, so subtracting a few degrees from the winter temperatures became standard practice …

My own bozo cowboy rule of thumb? I hold that in the real world, you can gain maybe an order of magnitude by repeat measurements, but not much beyond that, absent special circumstances. This is because despite global efforts to kill him, Murphy still lives, and so no matter how much we’d like it to work out perfectly,  errors won’t be normal, and biases won’t cancel, and crucial data will be missing, and a thermometer will be broken and the new one reads higher, and …

Finally, I would back Steven Mosher to the hilt when he tells people to generate some pseudo-data, add some random numbers, and see what comes out. I find that actually giving things a try is often far better than profound and erudite discussion, no matter how learned.

w.
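
Mosher’s pseudo-data experiment is easy to try. The sketch below uses an assumed, purely illustrative error model: symmetric noise for one set of observations, and a skewed error (e.g. observers reading the meniscus low more often than high) for another:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

TRUE_TEMP = 15.0
N = 100_000

# Symmetric, zero-mean observer error: averaging beats it down.
symmetric = [TRUE_TEMP + random.gauss(0.0, 0.5) for _ in range(N)]

# Skewed error, e.g. observers reading the meniscus low more often
# than high: averaging converges on the wrong answer.
skewed = [TRUE_TEMP + random.gauss(0.0, 0.5) - abs(random.gauss(0.0, 0.3))
          for _ in range(N)]

print(round(statistics.mean(symmetric), 2))  # close to 15.0
print(round(statistics.mean(skewed), 2))     # biased low by roughly 0.24
```

With symmetric noise the mean converges on the truth; with the skewed model it converges, just as confidently, on the wrong answer, no matter how many observations are added.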

Solomon Green
January 24, 2011 9:24 am

Sorry, in my previous post (7.58) I meant to write 0.1 degree C NOT 1 degree C.

Mark T
January 24, 2011 9:27 am

Solomon Green says:

PS. Mr. Mosher, as someone who graduated in statistics and has often needed to source data and apply statistical tests throughout his working life, I have come across a number of instances where (Xmax + Xmin)/2 does not give a good approximation to Xmean, no matter how long the duration. As some of the correspondents on this site have indicated, it is a question of the distribution.

Yes, this is true. This is particularly true if the waveform/distribution changes over time. For example, the typical “waveform” for day/night temperature cycles is somewhat sinusoidal. If, however, it changes such that the low portion lingers for a longer period of time, then using this equation will induce a bias because the “mean” will begin to adjust downward.
For the most part it does seem reasonable to assume this does not happen, or if it does, it happens slowly (as the orbit changes, for example, which is a pretty slow process.) Using the true mean would probably resolve this, but I don’t think that is currently feasible and certainly we would not be able to apply such a procedure to past data, so we would be starting over.
Mark

Mark T
January 24, 2011 9:31 am

Mark T says:

If, however, it changes such that the low portion lingers for a longer period of time, then using this equation will induce a bias because the “mean” will begin to adjust downward.

This is stated in an unclear manner. What I meant was that using (Tmax + Tmin)/2 may result in the same answer (because Tmax and Tmin may be the same) even though the true mean may be going down (or up, for that matter) over time.
Mark
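
Mark T’s point can be demonstrated with a toy diurnal cycle. The waveforms below are assumed and purely illustrative: a symmetric sinusoidal day, and a day with the same Tmax and Tmin where the cold portion lingers:

```python
import math

def midrange_and_mean(samples):
    """Return ((Tmax + Tmin) / 2, true mean) for one day of samples."""
    return (max(samples) + min(samples)) / 2.0, sum(samples) / len(samples)

HOURS = [h / 4.0 for h in range(96)]  # 15-minute samples over 24 hours

# Symmetric sinusoidal day: 10c mean, 5c amplitude.
sinusoidal = [10.0 + 5.0 * math.sin(2 * math.pi * t / 24.0) for t in HOURS]

# Same Tmax and Tmin, but the cold half of the day sits at the minimum.
lingering_cold = [10.0 + 5.0 * math.sin(2 * math.pi * t / 24.0) if t < 12.0
                  else 5.0 for t in HOURS]

print(midrange_and_mean(sinusoidal))      # midrange and mean both ~10.0
print(midrange_and_mean(lingering_cold))  # midrange ~10.0, true mean ~9.09
```

(Tmax + Tmin)/2 is identical for both days even though the true means differ by nearly a degree, which is exactly the bias Mark T describes.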

Jeff
January 24, 2011 10:02 am

It would seem to be simple to build a glass thermometer reading instrument that would be able to read the fluid level exactly the same way day in and day out …

Pat Frank
January 24, 2011 10:16 am

Anthony’s post about my article is now off the first page at WUWT. However, for those interested, I’ve posted a point-by-point refutation of EFS_Junior’s criticism, here.

EFS_Junior
January 24, 2011 3:08 pm

Pat Frank says:
January 24, 2011 at 10:16 am
Anthony’s post about my article is now off the first page at WUWT. However, for those interested, I’ve posted a point-by-point refutation of EFS_Junior’s criticism, here.
_____________________________________________________________
And I’ve responded with a one point refudiation of that post. Quite airtight I might add, so further commentary will not be necessary at my end going forward. I’ve already devoted much more time to this topic than I should have.
It has been a good discussion though, and the time devoted was not wasted time. It has given me pause for thought, the numerical experiments were indeed helpful (to me at least).
IMHO, these discussions have only strengthened my technical opinion on the subject matter at hand.
In short, I’ve learned quite a lot, in spite of the deep differences in technical opinions we all have.
Thanks all. 🙂

January 24, 2011 7:24 pm

Everybody seems to assume that the people who take the observations are always HONEST.
US observers are Volunteers, unpaid. The Met Service in Russia in the 1980s suddenly had no money. The observers were unpaid. Why should they get out of bed on an unusually cold morning? What happens if they are ill, there is a blizzard, they want to go to a football match? Who checks the observations for accuracy or believability? Many observers know that the bosses want to believe that temperatures are rising, so if they fake the results they had better not show that it is getting cooler. They might even be activists who slightly exaggerate, even unconsciously.
Automatic results are submitted only hourly, not continuously, so they no longer record the true maximum and minimum.
There is an inbuilt tendency throughout the entire system to make all the trends the same. They do this with the models. It is called intercomparison. This can also be unconscious or justified by elimination of “outliers” or “noise”. They do this with CO2 measurements too.
I suspect that some of the people who participate in smoothing out different records to give uniform, desired results have justified themselves in some of the earlier comments above.
On top of this there is the confusion documented in Climategate.

January 24, 2011 9:23 pm

I may have missed it, but I saw no reference to the fact that temperatures were recorded to the nearest 0.5° until thermocouples appeared.
Also I didn’t see any reference to the fact that mercury freezes at -38.8°C. It becomes increasingly less malleable as it approaches that temperature and makes low temperatures with mercury thermometers of no value. The 18th century observers of the Hudson’s Bay Company using thermometers provided by the Royal Society were unaware of the problem. They replaced them with spirit thermometers and carried out experiments with the freezing of mercury. They did compare the results of mercury and spirit readings. In January 1822 they record, “Some quick silver that had been out some time ago for trying the cold was observed to be frozen while the thermometer was only 36 below zero which proves the weather to have been six degrees colder than per the thermometer.”

January 24, 2011 9:58 pm

Quite airtight I might add
Tedious it will be, but showing the poverty of your reply won’t be hard, EFS_Junior. Stay tuned.

January 25, 2011 3:38 am

As I have said here:
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/#comment-580475
Because the so called “global warming signal” is half the margin of error it is impossible to know if this claimed signal is genuine, or as Dr. Gray points out, human bias, or indeed as Dr. Ball highlights, instrumental bias. Therefore the argument is a faux debate and cannot be resolved.
As I have said here:
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/#comment-580783
“You cannot measure a 0.7º C average trend if your measuring equipment is not on average accurate to 1.3º C.”
Unless you meet certain criteria, as patiently and correctly maintained by Mark T and Pat Frank in this thread: that all errors in the measuring equipment are uniform to the extent that over time they cancel. This criterion has not been and cannot be met.
Which renders the so called “global warming signal” meaningless.
I have also said,
“The irony of playing it this safe is that 0.7º C in 100+ years does not equate to an anomalous warming event. Particularly if the margin of error is 1.3º C.”
There is no definable “global warming signal”.
Which means:
There is no real world scientific evidence of “man made” CO2 induced “global warming”.
There is no real world scientific evidence that trace amounts of CO2 can force the 99% of the atmosphere into equilibrium with itself. In fact, quite obviously, it is the exact opposite which actually occurs.
See here:
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/#comment-580999
I’d say this hoax is dead. It’s time to move on to the next hoax.
Bring on the ALIENS.

jaymam
January 25, 2011 1:04 pm

I meant to say: “Tmin in NZ has increased slightly, while 9am and Tmax has stayed about the same.”
Obviously Mean will have increased since Tmin has increased, and NIWA seem to be calculating Mean from Tmin and Tmax.
9am temperatures have actually reduced over NIWA’s preferred nine NZ sites:
http://i40.tinypic.com/353dra9.jpg
Go and get the data for yourself if you don’t believe that. The data is available for download free.
In NZ, Tmin has been increasing only in winter in urban areas, i.e. the places that NIWA has decided to choose for their temperature measurements.
By what process does AGW predict that Tmin will increase in urban areas in winter? Is AGW now admitting to an Urban Heat Island effect?

Jessie
January 25, 2011 8:25 pm

Will says:
January 25, 2011 at 3:38 am
Bring on the ALIENS.
Thank you Willis for your responses. However, no sooner said than done…
They just have, and making money to boot.
http://www.theaustralian.com.au/travel/news/crop-circle-in-indonesian-rice-paddy/story-e6frg8ro-1225994485945
Less rice for the starving but at least the in-pocket petrol price will jump for the sight-seeing on motorbikes and the villagers will have developed a new form of Grameen Banking. Investment in laser landforming and CAD instead of women and water buffalo?
http://www.rga.org.au/rice/growingau.asp

Ike
January 26, 2011 2:37 pm

David Springer: There is no reason to believe that any mathematical operation one may care to perform on the collection of instrumental temperature records that exist will give a more – or less – accurate temperature reading for that time, date and location than the illustration I used. (I attempted to use the overall numbers the author had written in the article; perhaps I failed to do so.) Since each instrumental measurement of temperature has some unknown – and in the case of some of the records, unknowable – margin of error, nothing can be done using math, statistics or any other abstract discipline to calculate what the “actual” temperature reading was for each of the times, dates and locations. After reading the article, that dawned on me and I expressed it in my first comment. To claim that somehow all of the measurements necessarily will “average-out” or all the errors will “cancel each other out” over a long enough period of time is an assertion unsupported by any evidence. Most of the math being referred to is not something which has been tested experimentally against real world events, but rather are mathematical constructs, assumed to be valid in the real world because they were derived according to the “rules” of mathematics.
Whether the “confidence” level is 90% or 95% or 98%, there is always going to be a margin of error in each and every measurement of any real world quantity. The higher the confidence, the smaller the margin of error, but in this case, despite the high confidence levels, the point of the article is that the margins of error are larger than the math-based assertions. How? Because there is a missing 2% or 5% or 10% which is a consequence of using mathematical abstractions, rather than simple arithmetical aggregations of the data. Further, it is each measurement which is a source of error and so the errors in measurement compound with one another in the production of the “global temperature” figures widely published as being authoritative and being sufficiently accurate and precise for use in guiding public policy. At the end of whatever processes are used to determine that temperature figure, there is a margin of error of unknown – and perhaps unknowable – size which is directly applicable to the final figure claimed, just as if it were an instrumental temperature measurement from a single station.
Your response is sophisticated and well-written, sir, but it is likewise sophistic and inapplicable to the matter of margins of error in instrumental temperature measurements.

January 26, 2011 10:01 pm

Part 1 of my reply to EFS_Junior’s critique of my paper has now been posted.
Part 2 will be forthcoming.

January 30, 2011 6:21 pm

Part II of my reply to EFS_Junior is now here.
