The Metrology of Thermometers

For those that don’t notice, this is about metrology, not meteorology, though meteorology uses the final product. Metrology is the science of measurement.

Since we had this recent paper from Pat Frank that deals with the inherent uncertainty of temperature measurement, establishing a new minimum uncertainty value of ±0.46 C for the instrumental surface temperature record, I thought it valuable to review the uncertainty associated with the act of temperature measurement itself.

As many of you know, the Stevenson Screen, aka Cotton Region Shelter (CRS), such as the one below, houses Tmax- and Tmin-recording thermometers: a mercury maximum thermometer and an alcohol minimum thermometer.


Hanksville, UT USHCN climate monitoring station with Stevenson Screen - sited over a gravestone. Photo by volunteer Juan Slayton

They look like this inside the screen:

NOAA standard issue max-min recording thermometers, USHCN station in Orland, CA - Photo: A. Watts

Reading these thermometers would seem to be a simple task. However, that’s not quite the case. Adding to the statistical uncertainty derived by Pat Frank, as we see below in this guest re-post, measurement uncertainty in both the long and the short term is also an issue. The following appeared on the blog “Mark’s View”, and I am reprinting it here in full with permission from the author. There are some enlightening things to learn about the simple act of reading a liquid-in-glass (LIG) thermometer that I didn’t know, as well as some long-term issues (like the hardening of the glass) that have values about as large as the climate change signal for the last 100 years, ~0.7°C – Anthony


Metrology – A guest re-post by Mark of Mark’s View

This post is actually about the poor quality and processing of historical climatic temperature records rather than metrology.

My main points are that in climatology many important factors that are accounted for in other areas of science and engineering are completely ignored by many scientists:

  1. Human Errors in accuracy and resolution of historical data are ignored
  2. Mechanical thermometer resolution is ignored
  3. Electronic gauge calibration is ignored
  4. Mechanical and Electronic temperature gauge accuracy is ignored
  5. Hysteresis in modern data acquisition is ignored
  6. Conversion from Degrees F to Degrees C introduces false resolution into data.

Metrology is the science of measurement, embracing both experimental and theoretical determinations at any level of uncertainty in any field of science and technology. Believe it or not, the metrology of temperature measurement is complex.

It is actually quite difficult to measure things accurately, yet most people just assume that information they are given is “spot on”. A significant number of scientists and mathematicians also do not seem to realise that the data they are working with is often not very accurate. Over the years as part of my job I have read dozens of papers based on pressure and temperature records where no reference is made to the instruments used to acquire the data, or their calibration history. The result is that many scientists frequently reach incorrect conclusions about their experiments and data because they do not take into account the accuracy and resolution of their data. (It seems this is especially true in the area of climatology.)

Do you have a thermometer stuck to your kitchen window so you can see how warm it is outside?

Let’s say you glance at this thermometer and it indicates about 31 degrees centigrade. If it is a mercury or alcohol thermometer you may have to squint to read the scale. If the scale is marked in 1c steps (which is very common), then you probably cannot extrapolate between the scale markers.

This means that this particular thermometer’s resolution is 1c, which is normally stated as plus or minus 0.5c (+/- 0.5c).

This example of resolution assumes that the temperature is observed under perfect conditions and that you have been properly trained to read a thermometer. In reality you might glance at the thermometer, or you might have to use a flash-light to look at it, or it may be covered in a dusting of snow, rain, etc. Mercury forms a pronounced meniscus in a thermometer that can exceed 1c, and many observers incorrectly read the temperature at the base of the meniscus rather than its peak. (This picture shows an alcohol meniscus; a mercury meniscus bulges upward rather than down.)

Another major common error in reading a thermometer is the parallax error.

Image courtesy of Surface meteorological instruments and measurement practices by G.P. Srivastava (with a mercury meniscus!)

This is where refraction of light through the glass thermometer exaggerates any error caused by the eye not being level with the surface of the fluid in the thermometer.

If you are using data from hundreds of thermometers scattered over a wide area, with data being recorded by hand by dozens of different people, the observational resolution should be reduced. In the oil industry, for example, it is common to accept an error margin of 2-4% when using manually acquired data.

As far as I am aware, no attempt has ever been made to account for observer error in the raw historical temperature data from multiple weather stations.

We should also consider the accuracy of the typical mercury and alcohol thermometers that have been in use for the last 120 years. Glass thermometers are calibrated by immersing them in ice/water at 0c and a steam bath at 100c. The scale is then divided equally into 100 divisions between zero and 100. However, a glass thermometer at 100c is longer than a thermometer at 0c. This means that the scale on the thermometer gives a false high reading at low temperatures (between 0 and 25c) and a false low reading at high temperatures (between 70 and 100c). This process is also followed with weather thermometers with a range of -20 to +50c.

25 years ago, very accurate mercury thermometers used in labs (0.01c resolution) had a calibration chart/graph with them to convert observed temperature on the thermometer scale to actual temperature.

Temperature cycles harden the glass bulb of a thermometer and shrink it over time; a 10 yr old -20 to +50c thermometer will give a false high reading of around 0.7c.

Over time, repeated high temperature cycles cause the alcohol in alcohol thermometers to evaporate into the vacuum at the top of the thermometer, creating false low temperature readings of up to 5c. (5.0c, not 0.5; it’s not a typo…)

Electronic temperature sensors have been used more and more in the last 20 years for measuring environmental temperature. These also have their own resolution and accuracy problems. Electronic sensors suffer from drift and hysteresis and must be calibrated annually to be accurate, yet most weather station temp sensors are NEVER calibrated after they have been installed. Drift is where the recorded temp increases steadily or decreases steadily, even when the real temp is static, and is a fundamental characteristic of all electronic devices.

Drift is where a recording error gradually gets larger and larger over time. This is a quantum mechanics effect in the metal parts of the temperature sensor that cannot be compensated for. Typical drift of a -100c to +100c electronic thermometer is about 1c per year, and the sensor must be recalibrated annually to fix this error.
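As a rough illustration of why the calibration interval matters, here is a minimal sketch. It assumes drift is perfectly linear at the 1c per year quoted above and that a calibration resets the error to zero; both are simplifying assumptions, and `drift_error` is a hypothetical helper, not part of any weather station software.

```python
DRIFT_PER_YEAR = 1.0  # c per year, the figure quoted in the text above


def drift_error(years_since_install, recalibrated_annually):
    """Reading error in c, assuming linear drift that a calibration resets to zero."""
    if recalibrated_annually:
        # Only the time elapsed since the last yearly calibration counts.
        years_since_calibration = years_since_install % 1.0
    else:
        # Never calibrated: drift has accumulated since installation.
        years_since_calibration = years_since_install
    return DRIFT_PER_YEAR * years_since_calibration


if __name__ == "__main__":
    for years in (0.5, 2.5, 10.5):
        print(years, "years after install:",
              "calibrated annually ->", drift_error(years, True), "c;",
              "never calibrated ->", drift_error(years, False), "c")
```

Under these assumptions the annually calibrated sensor never strays more than 1c, while the uncalibrated one keeps accumulating error for as long as it is in service.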

Hysteresis is a common problem as well. This is where increasing temperature has a different mechanical effect on the thermometer compared to decreasing temperature. For example, if the ambient temperature increases by 1.05c, the thermometer reads an increase of 1c, but when the ambient temperature drops by 1.05c, the same thermometer records a drop of 1.1c. (This is a VERY common problem in metrology.)

Here is the behaviour of a typical food temperature sensor compared to a calibrated thermometer, without even considering sensor drift (figure: Thermometer Calibration). Depending on the measured temperature, the offset in this high accuracy gauge ranges from -0.8 to +1c.

But on top of these issues, the people who make these thermometers and weather stations state clearly the accuracy of their instruments, yet scientists ignore them! The packaging of a -20c to +50c mercury thermometer will state that the accuracy of the instrument is +/-0.75c, for example, yet frequently this information is not incorporated into statistical calculations used in climatology.

Finally we get to the infamous conversion of Degrees Fahrenheit to Degrees Centigrade. Until the 1960’s almost all global temperatures were measured in Fahrenheit. Nowadays all the proper scientists use Centigrade. So, all old data is routinely converted to Centigrade: take the original temperature, subtract 32, multiply by 5 and divide by 9.

C= ((F-32) x 5)/9

Example: the original reading from a 1950 data file is 60F. This data was eyeballed by the local weatherman and written into his tallybook. 50 years later a scientist takes this figure and converts it to centigrade:

60 - 32 = 28

28 x 5 = 140

140/9 = 15.55555556

This is usually (incorrectly) rounded to two decimal places, 15.55c, without any explanation as to why this level of resolution has been selected.

The correct mathematical method of handling this issue of resolution is to look at the original resolution of the recorded data. Typically old Fahrenheit data was recorded in increments of 2 degrees F, e.g. 60, 62, 64, 66, 68, 70. Very rarely on old data sheets do you see 61, 63 etc. (although 65 is slightly more common).

If the original resolution was 2 degrees F, the resolution used for the same data converted to Centigrade should be 1.1c.
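The conversion and the resolution argument above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and reporting the converted value to one decimal place is a choice made here because the converted resolution is itself about 1c, not a rule taken from any data-processing standard.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature reading: C = ((F - 32) x 5) / 9."""
    return (temp_f - 32.0) * 5.0 / 9.0


def fahrenheit_step_to_celsius(step_f):
    """Convert a temperature interval; the 32-degree offset does not apply."""
    return step_f * 5.0 / 9.0


def convert_reading(temp_f, resolution_f):
    """Return (value_c, resolution_c), each rounded to one decimal place."""
    value_c = round(fahrenheit_to_celsius(temp_f), 1)
    resolution_c = round(fahrenheit_step_to_celsius(resolution_f), 1)
    return value_c, resolution_c


if __name__ == "__main__":
    value_c, resolution_c = convert_reading(60, 2)
    print("60F recorded in 2F steps ->", value_c, "c, resolution about", resolution_c, "c")
```

The point of carrying the resolution alongside the value is that the 1.1c figure stays attached to the converted reading instead of being replaced by whatever precision the decimal places happen to imply.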

Therefore mathematically :





In conclusion, when interpreting historical environmental temperature records one must account for errors of accuracy built into the thermometer and errors of resolution built into the instrument as well as errors of observation and recording of the temperature.

In a high quality glass environmental thermometer manufactured in 1960, the accuracy would be +/- 1.4F. (2% of range)

The resolution of an astute and dedicated observer would be around +/-1F.

Therefore the total error margin of all observed weather station temperatures would be a minimum of about +/-2.4F (1.4F + 1F), or roughly +/-1.3c…
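For readers who want to check the arithmetic, the sketch below adds the two figures quoted above (1.4F instrument accuracy and 1F observer resolution) in two ways. The straight sum treats the errors as fully additive, which is how the total above is reached. The root-sum-square alternative is a common convention when error sources are assumed independent; it is shown only for comparison and is not the method used in this post.

```python
import math

ACCURACY_F = 1.4   # instrument accuracy quoted in the text, +/- F
OBSERVER_F = 1.0   # observer resolution quoted in the text, +/- F


def to_celsius_interval(interval_f):
    """Convert a Fahrenheit interval to Celsius (no 32-degree offset)."""
    return interval_f * 5.0 / 9.0


def linear_sum(*terms):
    """Worst-case combination: every error at its limit, same direction."""
    return sum(terms)


def root_sum_square(*terms):
    """Combination in quadrature, assuming the errors are independent."""
    return math.sqrt(sum(t * t for t in terms))


if __name__ == "__main__":
    for name, combine in (("linear sum", linear_sum),
                          ("root-sum-square", root_sum_square)):
        total_f = combine(ACCURACY_F, OBSERVER_F)
        print(f"{name}: +/-{total_f:.2f}F = +/-{to_celsius_interval(total_f):.2f}c")
```

Which combination is appropriate depends on whether the error sources can really be treated as independent, which is exactly the kind of question the post argues is rarely asked.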


UPDATE: This comment below from Willis Eschenbach, spurred by Steven Mosher, is insightful, so I’ve decided to add it to the main body – Anthony


Willis Eschenbach says:

As Steve Mosher has pointed out, if the errors are random normal, or if they are “offset” errors (e.g. the whole record is warm by 1°), increasing the number of observations helps reduce the size of the error. All that matters are things that cause a “bias”, a trend in the measurements. There are some caveats, however.

First, instrument replacement can certainly introduce a trend, as can site relocation.

Second, some changes have hidden bias. The short maximum length of the wiring connecting the electronic sensors introduced in the late 20th century moved a host of Stevenson Screens much closer to inhabited structures. As Anthony’s study showed, this has had an effect on trends that I think is still not properly accounted for, and certainly wasn’t expected at the time.

Third, in lovely recursiveness, there is a limit on the law of large numbers as it applies to measurements. A hundred thousand people measuring the width of a hair by eye, armed only with a ruler measured in mm, won’t do much better than a few dozen people doing the same thing. So you need to be a little careful about saying problems will be fixed by large amounts of data.
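The hair-and-ruler point can be tried out directly. The sketch below is a toy model under loud assumptions: a hair 0.07 mm wide, observers whose scatter (0.02 mm) is small compared with the 1 mm ruler division, and every observer reporting the nearest whole millimetre. None of these numbers come from the comment; they are chosen only to illustrate the limit being described.

```python
import random
import statistics


def measure_with_ruler(true_mm, noise_sd_mm, resolution_mm, rng):
    """One reading: true value plus observer scatter, rounded to the nearest mark."""
    noisy = true_mm + rng.gauss(0.0, noise_sd_mm)
    return round(noisy / resolution_mm) * resolution_mm


def mean_of_readings(n, true_mm=0.07, noise_sd_mm=0.02, resolution_mm=1.0, seed=1):
    """Average of n independent ruler readings of the same hair."""
    rng = random.Random(seed)
    return statistics.fmean(
        measure_with_ruler(true_mm, noise_sd_mm, resolution_mm, rng)
        for _ in range(n)
    )


if __name__ == "__main__":
    for n in (36, 100_000):
        print(n, "observers -> mean reading", mean_of_readings(n), "mm (true 0.07 mm)")
```

In this setup every observer writes down 0 mm, so the average is 0 mm however many observers there are. If the scatter were comparable to the ruler division the picture would change and averaging would recover some sub-division information, so the result depends on the assumptions rather than being a general law.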

Fourth, if the errors are not random normal, your assumption that everything averages out may (I emphasize may) be in trouble. And unfortunately, in the real world, things are rarely that nice. If you send 50 guys out to do a job, there will be errors. But these errors will NOT tend to cluster around zero. They will tend to cluster around the easiest or most probable mistakes, and thus the errors will not be symmetrical.

Fifth, the law of large numbers (as I understand it) refers to either a large number of measurements made of an unchanging variable (say hair width or the throw of dice) at any time, or it refers to a large number of measurements of a changing variable (say vehicle speed) at the same time. However, when you start applying it to a large number of measurements of different variables (local temperatures), at different times, at different locations, you are stretching the limits …

Sixth, the method usually used for ascribing uncertainty to a linear trend does not include any adjustment for known uncertainties in the data points themselves. I see this as a very large problem affecting all calculation of trends. All that is ever given is the statistical error in the trend, not the real error, which perforce must be larger.

Seventh, there are hidden biases. I have read (but haven’t been able to verify) that under Soviet rule, cities in Siberia received government funds and fuel based on how cold it was. Makes sense, when it’s cold you have to heat more, takes money and fuel. But of course, everyone knew that, so subtracting a few degrees from the winter temperatures became standard practice …

My own bozo cowboy rule of thumb? I hold that in the real world, you can gain maybe an order of magnitude by repeat measurements, but not much beyond that, absent special circumstances. This is because despite global efforts to kill him, Murphy still lives, and so no matter how much we’d like it to work out perfectly, errors won’t be normal, and biases won’t cancel, and crucial data will be missing, and a thermometer will be broken and the new one reads higher, and …

Finally, I would back Steven Mosher to the hilt when he tells people to generate some pseudo-data, add some random numbers, and see what comes out. I find that actually giving things a try is often far better than profound and erudite discussion, no matter how learned.
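In that spirit, here is one minimal way to try it. The sketch generates 100 years of daily pseudo-data with a known trend, adds random errors, and fits a straight line. Everything in it is an assumption made for illustration: the 0.007 c per year trend (about 0.7c per century), the error sizes, and above all the choice of independent, zero-mean, normally distributed errors, which is precisely the case the caveats above say the real world may not offer.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulated_trend(error_sd, true_trend=0.007, years=100, seed=0):
    """Fitted trend (c per year) from daily pseudo-data with random errors."""
    rng = random.Random(seed)
    xs = [day / 365.0 for day in range(years * 365)]
    ys = [true_trend * x + rng.gauss(0.0, error_sd) for x in xs]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    for error_sd in (0.1, 1.3):
        print("error sd", error_sd, "c -> fitted trend",
              round(simulated_trend(error_sd), 4), "c/yr (true 0.007)")
```

Swapping `rng.gauss` for a skewed or time-varying error model is the obvious next experiment, and is where the two sides of this discussion would be expected to diverge.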



Ceri Reid

Very interesting post, thanks.
As an engineer, most of the ideas here were pretty familiar to me. I find it almost unbelievable that the climate records aren’t being processed in a way which reflects the uncertainty of the data, as is stated in the article. What evidence is there that this is the case (I think I mean: is the processing applied by GISS or CRU clearly described, and is there any kind of audit trail? Have the academics who processed the data published the processing method?).
I think the F to C conversion issue is tricky. I think the conversion method you favour (using significant figures in C to reflect uncertainty) would lead to a bias which varies with temperature. What is actually needed is proper propagation of the known uncertainty through the calculation, rather than using the implied accuracy of the number of significant figures. So the best conversion from 60F to C would be 15.55+/-1.1C (in your example above). But obviously, promulgating 15.55 is fraught with the danger of the known 1.1C uncertainty being forgotten, and the implied 0.005C certainty used instead. Which would be bad.

Grumpy Old Man

Is the Hanksville photo one of those “how many mistakes” competitions?

Mike Haseler

I once designed a precision temperature “oven” which had a display showing the temperature to 0.01C. In practice the controller for the device was only accurate to 0.1C at best and with a typical lab thermometer there was at least another 0.1C error. Then there was the fact that you were not measuring the temperature at the centre of the oven, and drift and even mains supply variation had a significant effect!
All in all the error of this device which might appear to be accurate to 0.01C could have been as bad as the total so called “global warming signal”.
I’ve also set up commercial weather stations using good commercial equipment which I believe is also used by many meteorological stations and the total error is above +/-1C even on this “good” equipment.
As for your bog standard thermometer from a DIY shop: go to one, take a reading from them all and see how much they vary … it’s normally as much as 2C or even 3C from highest to lowest.
Basically, the kind of temperature error being quoted by the climategate team is only possible in a lab with regularly calibrated equipment.


I think Hanksville was just a pretty typical set-up. Some of those stations in the study were in even worse shape from what I saw. I particularly liked the one in the junkyard. Okay… maybe it wasn’t a junkyard, but that was what it looked like.
As for the Mark’s View post, I read this over at his site and thought it was fascinating. I knew about the calibration requirements for electronic test equipment, but had no idea of the vagaries behind the simple mechanical/visual thermometer.


Are those thermometers nailed horizontal onto the Stevenson Screen?
Surely a horizontal thermometer, without the gravitational component, will read differently to a vertical one. And if it is touching the wood, then surely there is radiative cooling of the wood at night and warming in the day. Surely the thermometers should be insulated from the screen by a few centimeters. How much error would this give??

I actually ran a spreadsheet back in December showing how just 3 instrument changes, with each instrument having better resolution, over a long term historical record change the trend of the data (over 1° F in change to the USHCN data).

Peter Plail

The picture of the inside of a Stevenson Screen raised a couple of questions in my mind that perhaps a professional can answer:
Does the orientation of a glass thermometer affect its reading? I have thought of all thermometers as being used approximately vertically but those in the screen are shown as horizontal.
Is the vertical position inside the screen relevant? Even though there are ventilation louvres, I would expect some sort of vertical temperature gradient within the enclosure, reporting higher temperatures closer to the top of the enclosure. This, of course would be especially bad when exposed to full sun, with a tendency therefore to over-report temperatures. I would also expect this problem to increase with time as the reflectivity of the white paint drops with flaking, build up of dust and dirt etc.

John Marshall

So in reality the 0.6C feared temperature rise could mean that statistically there has been no temperature change. And all models give the wrong answer because temperature inputs are incorrect.
Very interesting article. Is a pdf copy possible please, Anthony?


pedants’ corner: ‘extrapolate’ between the scale markers – or ‘interpolate’?


As long as the errors don’t trend in a biased way over time, the fact that there are thousands of sensors should make standard errors small (variance divided by square root no. of observations).
Of course, one biased sensor in the middle of nowhere would have a disproportionate effect – although I’m not clear how the interpolations are done.


I mean standard deviation (or put square root around the brackets)…
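For reference, the corrected formula is the standard error of the mean: the standard deviation divided by the square root of the number of observations. A minimal sketch follows; it assumes independent errors with no shared bias, which is the condition stated in the comment, and the 1c spread is an illustrative number only.

```python
import math


def standard_error(standard_deviation, n_observations):
    """Standard error of the mean for independent observations."""
    return standard_deviation / math.sqrt(n_observations)


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(n, "sensors with 1c spread -> standard error",
              standard_error(1.0, n), "c")
```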

Ziiex Zeburz

As the Earth’s surface area is +- 196,935,000 square miles and the geological parameters are as diverse as it is possible to imagine, we have intelligent people declaring that the world is heating by as much as 2.0c per 100 years, with official recorded temp. covering perhaps 1% of the total area of the Earth. I fully agree with the above article: we as humans are all idiots. Some of us think we understand what we are trying to do, but even reading a temp. looks beyond the scope of those whose job it is.
Italy is a prime example of human intelligence, knowledge, and understanding. It is in that country for all to see, and Italians have in the past produced some of the world’s most outstanding thinkers, BUT !!!! if you put 100 Italians into a room and ask them to form a political party, at the end of a month you would find that they have formed 100 plus political parties, and thousands of political ideologies, none of which address the problems facing the country.
And Italy is one of the world’s better countries; it had inside baths and running hot water when the English were still learning to make fire. But here we are 3000 years later and 99.9% of the world population still cannot read a thermometer. Like I said above, we are all idiots.

Joe Lalonde

You missed a few big errors in thermometers that I know only from experience with them.
Having them measure in direct sunlight. The material around it absorbs heat and gives a false warmer temperature reading.
Some thermometers are pressure fit or have a fastener and can slide in the sleeve.
Having the thermometer snow covered in heavy blowing wind.
Moisture on the thermometers.

“The correct mathematical method of handling this issue of resolution is to look at the original resolution of the recorded data. Typically old Fahrenheit data was recorded in increments of 2 degrees F, eg 60, 62, 64, 66, 68,70. very rarely on old data sheets do you see 61, 63 etc (although 65 is slightly more common)
If the original resolution was 2 degrees F, the resolution used for the same data converted to Centigrade should be 1.1c.
Therefore mathematically :
In conclusion, when interpreting historical environmental temperature records one must account for errors of accuracy built into the thermometer and errors of resolution built into the instrument as well as errors of observation and recording of the temperature.”
In GHCN observers recorded F by rounding up or down.
Tmax to 1 degree F
Tmin to 1 degree F.
Then the result is averaged and rounded: (tmax+tmin)/2
Now, if you think that changing from F to C is an issue you can do the following.
calculate the trend in F
convert F to C and calculate the trend.
Also, the “observer error” and transcription errors are all addressed in the literature, see Brohan 06.
The other thing that is instructive is to compare two thermometers that are within a few km of each other over a period of, say, 100 years. Look at the correlation.
98% plus.
Or you can write a simulation of a sensor with very gross errors. Simulate daily data for 100 years. Assume small errors. Calculate the trend. Assume large errors. Calculate the trend.
Result? The error structure of individual measures doesn’t impact your estimation of the long term trends. NOW, if many thermometers all had BIASES (not uncertainty) and if those biases were skewed hot or cold, and if those biases changed over time, then your trend estimation would get impacted.
Result? no difference.
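The distinction drawn above between a fixed bias and a bias that changes over time can be sketched with a noise-free toy record. The assumptions are all illustrative: a flat true temperature, annual values over 100 years, and a 0.5c bias that is either present from the start or appears at year 50 (for example through an instrument change). The helper names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def trend_with_step_bias(step_size, step_year, years=100, true_trend=0.0):
    """Fitted trend (c per year) when a bias of step_size starts at step_year."""
    xs = list(range(years))
    ys = [true_trend * x + (step_size if x >= step_year else 0.0) for x in xs]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    constant = trend_with_step_bias(0.5, step_year=0)
    mid_record = trend_with_step_bias(0.5, step_year=50)
    print("bias present all along: ", round(constant * 100, 3), "c per century")
    print("bias appears at year 50:", round(mid_record * 100, 3), "c per century")
```

With these numbers the constant offset leaves the fitted trend at zero, while the mid-record step shows up as roughly 0.75c per century of apparent warming. That is only the arithmetic of the two cases; whether real station biases behave like the second case is the question under debate in this thread.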

Brian H

You don’t get to just wave away the possibility of systematic error. That’s the whole point of error bars. You can’t know anything about systematic error WITHOUT ACTUALLY CHECKING FOR IT. And such checking has not been done. Ergo ….

Steven Mosher

Result? The error structure of individual measures doesn’t impact your estimation of the long term trends. NOW, if many thermometers all had BIASES (not uncertainty) and if those biases were skewed hot or cold, and if those biases changed over time, then your trend estimation would get impacted. Result? no difference.

I agree with you, but only in case all biases occur at the same time, in the same direction, by the same amount. And that is never ever the case. One doesn’t even know how much, which and where that kind of bias appears until you look at every station and take a deep look into its history, which only Anthony’s volunteers did. But at sea (7/10 of the Earth’s surface) the situation is even worse. The literature is full of this issue. The metadata necessary to estimate, and with luck correct, that kind of bias are not available. So what is left is: you should mention them and increase your range of uncertainty accordingly.

Viv Evans

Thanks – a very enlightening post!
I was most interested in the way that looking at the meniscus from different angles will give you different results. Ages ago, we students were shown how to do this, in an introductory practical about metrology.
In my naivety I assumed that this care was taken by all who record temperatures, and that the climate scientists using those data were aware of the possibility of measuring errors … apparently not.
So even if sloppiness here or there doesn’t influence the general trend, as Steven Mosher points out above – should we not ask what other measurements used by the esteemed computer models are equally sloppy, and do not even address the question of metrological quality control?


I saw the uniform 0.46 number for error over the instrumental record and my first thoughts were on the estimates of sea surface temperature. The oceans cover 70 percent of the globe, yet until about 30-40 years ago the temperature measurement followed shipping lanes and the spatial coverage was poor. Land temp measurements were also weighted heavily in more developed regions. So to me it is absurd to think the accuracy can be that consistent, based solely on the changes in the spatial distribution of where the temp has been measured.

David L

Your article is “spot on”. I have various mercury bulb and alcohol bulb thermometers around my house as well as type K thermocouples and RTDs. On any given day you can’t really get better than 1 or 2 deg F agreement between them all.
As a person versed in both thermodynamics and statistics I find it amusing every time I see precision and accuracy statements from the climate community.


Besides the paint chipping off over time, if it’s really cold and the thermometer reader takes some time to read the thermometer, or gets really close to it due to 1) bad eyesight or 2) to get that “perfect” measurement in the interest of science, either breathing on the thermometer or body heat would affect the reading to the upside wouldn’t it? The orientation of the thermometer in the picture shows that a person would be facing the thermometer and breathing in that direction for however long it took to take the reading. And if it was very cold and windy that sure would make a nice little shelter to warm up for a moment and scrape the fog off the glasses, maybe have a sip of coffee before taking the reading.
Another example of measurement error to the upside. Why oh why are they always to the upside?

Area Man

Great post. As a practicing engineer involved in design and analysis of precision measurement equipment I am well aware of the challenges in making measurements and interpreting data. This post is spot-on in its observation that many data users put little thought into the accuracy of the measurements.
Especially in cases where one is searching for real trends, the presence of measurement drift (as opposed to random errors) can create huge problems. The glass hardening issue is therefore huge here.
Any chance of being allowed to make an icewater measurement with one of these old thermometers? I’m sure that result would be fascinating.

Adam Soereg

There is one more source of uncertainty which is not mentioned in this excellent article: changes in observation times. Different daily average calculation methods could create a significant warm or cold bias compared to the true 24-hour average temperature of any day. The difference will be different in each station and in each historical measurement site, because the average daily temperature curve is determined by microclimatic effects.
In most of the cases, climatologists try to account for these biases with monthly mean adjustments calculated from 24-hour readings. However, it is impossible to adjust these errors correctly with a single number, when the deviations are not the same in each observing site.
Let me show you an example for this issue:
Between 1780 and 1870, Hungarian sites observed the outdoor temperature at 7-14-21h, 6-14-22h or 8-15-22h Local Time, depending on location. How can anyone compare these early readings with contemporary climatological data? (the National Met. Service defines the daily mean temperature as the average of 24-hourly observations)
The average annual difference between 7-14-21h LT and 24-hr readings, calculated from over a million automatic measurements, is -0.283°C. This old technique causes a warm bias, which is most pronounced in early summer (-0.6°C in June) and negligible in late winter/early spring. Monthly adjustments are within 0.0 and -0.6°C. The accuracy of these adjustments is different in each month; the 1-sigma standard error varies between 0.109 and 0.182°C. Instead of a single value, we can only define an interval for each historical monthly and annual mean.
I am wondering what exactly CRU does when they incorporate 19th century Hungarian data into their CRUTEM3 global land temperature series. Observation time problems (bias and random error) are just one source of uncertainty. What about different site exposures (Stevenson screens did not exist at that time) or the old thermometers which were scaled in Reaumur degrees instead of Celsius? These issues are well documented in hand-written station diaries, 19th century yearbooks and other occasional publications. Such information is only available in Hungarian; did Phil Jones ever read it? 🙂
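The direction of the observation-time bias described in this comment can be reproduced with a toy diurnal cycle. This is a sketch under strong assumptions: the day is modelled as a pure sinusoid peaking at 15h with a 5°C amplitude, and the three readings are combined as a plain average (historical networks sometimes used weighted formulas, which are not modelled here). The size of the result is therefore not comparable to the -0.283°C figure above; only the sign is of interest.

```python
import math


def diurnal_temp(hour, mean=15.0, amplitude=5.0, peak_hour=15.0):
    """Toy temperature (°C) at a given local hour: a single sinusoid."""
    return mean + amplitude * math.cos(2.0 * math.pi * (hour - peak_hour) / 24.0)


def mean_of_hours(hours):
    """Average of the toy temperature over the given observation hours."""
    hours = list(hours)
    return sum(diurnal_temp(h) for h in hours) / len(hours)


if __name__ == "__main__":
    full_day = mean_of_hours(range(24))        # 24 hourly observations
    three_obs = mean_of_hours([7, 14, 21])     # 7-14-21h scheme
    print("24-hour mean:   ", round(full_day, 3))
    print("7-14-21h mean:  ", round(three_obs, 3))
    print("24h minus 7-14-21h:", round(full_day - three_obs, 3))
```

Because the real daily curve differs from station to station and month to month, as the comment points out, the same three observation hours would give a different offset at each site, which is why a single adjustment figure cannot be exact.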

Alexander K

Great Post, Mark, and thanks for giving it the air it deserves, Anthony. I read this on Mark’s blog a couple of days ago and was sure it was worth wider promulgation.

Mark Sonter

My memory from running a university metsite 40 yrs ago (University of Papua New Guinea, Port Moresby) is that the ‘almost horizontal’ orientation commented on and queried by a couple of posters is because the max and min thermometers have little indicator rods inside the bore, which get pushed up to max by the meniscus and stick there (Hg), and pulled down to the min and stick there (alcohol). The weather observer resets them by turning to the vertical: upright for the Hg (max), so the rod slides back down to again touch the meniscus; or upside down for the alcohol (min), whereupon the rod slides back up, inside the liquid, till it touches the inside of the meniscus. Both are then replaced on their near-horizontal stands.
By the way, the Stevenson screen pictured is in an atrocious condition, and the surrounds are far out of compliance with the WMO specs which require, from memory, ‘100 feet of surrounding mown but not watered grass, and no close concrete or other paved surface’ or similar …
This alone could surely give a degree or so temp enhancement, and I suspect that this sort of deterioration over time from site spec has on average added somewhat to the global average *as recorded* to give a secular trend which is really just reflecting the increasing crappiness of the average met station over time…
Not helped by the gradual secular transition to electronic sensors, cos they apparently are almost always within spitting distance of a (!!!) building…. thus giving another time trend pushing up the average.

Mosher has it summed up pretty well. The overall error for a single instrument would impact local records as has been seen quite a few times. As far as the global temperature record, not so much. Bias is the major concern when instrumentation is changed at a site or the site relocated.
Adjustments to the temperature record are more a problem for me. UHI adjustment is pretty much a joke. I still have a problem with the magnitude of the TOBS. It makes sense in trying to compare absolute temperatures (where the various errors do count) but not so much where anomalies are used for the global average temperature record. Perhaps Mosh would revive the TOBS adjustment debate.

Ben D.

Something that is interesting here,
Although error does tend to “wash out” when using large amounts of data, this still means that the error bands will tend to be large. The larger impact is in fact human caused in the actual measuring. To say that you can detect noise within the noise of measurements such as these boggles the mind. The noise in this case would be the human impact on the climate, which will be undetectable from other causes over time. You simply cannot torture the data enough to get that signal, so to speak.
But the largest issue is that even with the understanding of error here, many people do not understand the actual implications. The odds are that the temperature increase of 0.7C is correct (this is the actual observed temperature..) but the fact that the error is say +- 1.3 C does say something that we should be aware of.
It means that above all else, we have probably seen warming which few people doubt. Whether this is natural or man-caused is the actual debate now, and this result seriously puts some damper into the assertion that it is man-caused since with such large error bands you can not be as sure on the trends over smaller periods….and since the signal comes from shorter trends as a rule, this means that over short time periods you will be able to get a very accurate trend, but whether its noise or caused by humans…? No chance.
The implications are somewhat large in that sense. If you run the models by randomly adding “possible” error and over many reproductions of the models, they should show that if the CO2 signal exists, that the measuring error wouldn’t matter. To put that into practice would involve randomly changing observed temperatures up and down somewhat and fine-tuning the GCM’s based on that new assumption.
This is something that the GCM’s are weak on since they use temperatures to fine-tune themselves (on other climate variables ranging from solar influences on down…) and to my knowledge the actual error in the instruments has not been calculated in model runs to this date. This is an issue, which does bear some exploring.
Overall, very good article, although I question the larger possible error and I would hazard a guess that 1.3C is the actual limit of the error, since we would assume that the observer bias and C/F computation have been considered (it’s fairly obvious and, although I do find most climate scientists to be mostly incompetent, I would find it hard to believe that they didn’t figure this one out). The actual error from the instruments would also be obvious, but shucks, it’s something I can see them overlooking as they simply assume it washes out, so to speak.
The fact that so much error is possible does bear a large study in itself. If we could make the temperature record more accurate, it would help a lot in our studying of the climate overall.
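
The experiment suggested above, randomly perturbing the observed temperatures and refitting the trend, can be tried on a toy series before involving any model. The sketch below is only an illustration and not anything the modelling groups are known to do. It assumes a single synthetic annual series with a true trend of 0.7 C per century, treats the 1.3 C figure as the standard deviation of independent, zero-mean Gaussian error, and uses hypothetical names (ols_slope, perturbed_trends). Whether real instrument error is independent and zero-mean is exactly the point in dispute, so the result only shows what follows if it is.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def perturbed_trends(n_years=100, trend_per_century=0.7, error_sd=1.3,
                     n_runs=500, seed=0):
    """Refit the trend many times, each time adding fresh random error.

    All parameter values are illustrative assumptions, not measured data.
    Returns the fitted trends in degrees C per century.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    years = list(range(n_years))
    truth = [trend_per_century * y / 100.0 for y in years]
    trends = []
    for _ in range(n_runs):
        noisy = [t + rng.gauss(0.0, error_sd) for t in truth]
        trends.append(ols_slope(years, noisy) * 100.0)
    return trends


slopes = perturbed_trends()
print("mean fitted trend  :", round(statistics.fmean(slopes), 2), "C/century")
print("spread (1 std dev) :", round(statistics.stdev(slopes), 2), "C/century")
```

Under these assumptions the fitted trends scatter around 0.7 C per century with a spread of roughly 0.45 C per century for one station, which is the textbook value for 100 equally spaced points with 1.3 C noise. An error that changes systematically over time would shift the centre of that scatter rather than widen it.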

Solomon Green

Steven Mosher
“The other thing that is instructive is to compare two thermometers that are within a few km of each other.. over the period of say 100 years. Look at the correlation.
98% plus.”
I never tried it over a period of 100 years but over about 100 days, there was not much correlation between the trend in the max/min readings from the thermometer at my school and that at my club. But perhaps 20 km is more than “a few”. Or am I cheating because one was in the desert and the other in an urban area?


When I was at school, I recorded the temperatures for years from the school’s Stevenson screen. We were a sub-recording station for the local RAF base, so I assume it was set up correctly. We had 4 thermometers inside, a high precision Tmax, which was mercury, a high precision Tmin which was alcohol, and a mid precision wet & dry. The Tmax and Tmin were angled at about 20 degrees, with the bulbs at the bottom of the slope and the wet & dry was vertical. Two things strike me about the photos of the screen. The thermometers are fixed directly onto a piece of horizontal wood so there is no free airflow round the bulbs. Standing the screen over a gravestone must make the night-time minimum temperature totally inaccurate as it will be affected by re-radiated heat from the block of stone below, the warmth rising straight upwards towards the box in the cool night air. At our school site, the screen was located in a fenced off, grassed area which the school gardener was told not to cut. The grass was quite long there, and contained a final thermometer, the grass minimum, which was horizontal on two small supports such that the bulb just touched the blades of grass. This often used to record ground frosts which didn’t appear as air frosts.
An 8 inch rain gauge and a Fortin barometer completed the equipment – we also recorded cloud cover and type, and estimated cloud base. We used to record all this daily, draw up graphs and charts, and work out the humidity from the wet & dry readings, using a book of tables. It was a very good grounding in Physics, Physical Geography, and the methods of recording and presenting data over a long period. Many, many years before computers!!


I second the request for a pdf document. It would be great in my file.
This post highlights and confirms with numbers something I have believed for a long time. How does this fit with the past discussions of the fact that temperature recording at airports has changed (the M for minus thing)? More and more errors introduced and unaccounted for, I suppose.

Frank K.

In the 1960s I used to wander round the UK with a team engaged in commissioning turbine-generator units before they were handed over to the CEGB.
We got to one power station where the oil return drains from the turbine-generator bearings were fitted with dial thermometers (complete with alarm contacts) and witnessed yet another round in the endless battle between the “sparks” and “hiss & piss” departments.
The electrical engineers in the generator department at head office had specified temperature scales in degrees C, whereas the mechanical engineers in the turbine department (on the floor below) had specified scales in degrees F.
How we laughed (when the CEGB guys were not looking).

Mike G

Another thing ignored, or never even thought of, is the response time of modern electronic sensors compared to liquid in glass thermometers. A wind shift resulting in a sudden movement of warm air towards the temperature station, say warm air from the airport tarmac, would register a lower peak temperature on a glass thermometer than it would on a modern electronic sensor. This biases the Tmax upwards with modern instrumentation compared with what would have been measured by past instrumentation.
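
The response-time argument above can be sketched with a first-order lag, which is a common simplified model of how a sensor follows the air around it. Everything numeric here is an assumption chosen for illustration: the 10 second and 60 second time constants are not measured values for any real electronic sensor or glass thermometer, and the warm gust is an idealised 2 C step lasting one minute. The function name peak_reading is hypothetical.

```python
import math


def peak_reading(time_constant_s, spike_c=2.0, spike_duration_s=60.0,
                 dt_s=1.0, total_s=600.0):
    """Peak excess temperature shown by a first-order sensor.

    The air is at ambient (excess 0.0) except for a rectangular warm
    spike of spike_c degrees lasting spike_duration_s seconds.
    """
    # Exact per-step update factor for a first-order lag.
    alpha = 1.0 - math.exp(-dt_s / time_constant_s)
    reading = 0.0  # sensor excess above ambient
    peak = 0.0
    for step in range(int(total_s / dt_s)):
        t = step * dt_s
        air_excess = spike_c if t < spike_duration_s else 0.0
        reading += alpha * (air_excess - reading)
        peak = max(peak, reading)
    return peak


fast = peak_reading(time_constant_s=10.0)  # assumed quick electronic sensor
slow = peak_reading(time_constant_s=60.0)  # assumed slower glass thermometer
print("fast sensor peak:", round(fast, 2), "C above ambient")
print("slow sensor peak:", round(slow, 2), "C above ambient")
```

With these assumed values the quick sensor registers nearly the whole 2 C gust while the slow one registers about 1.26 C of it, so the same air produces a higher recorded Tmax on the faster instrument. How large the effect is in practice depends on the real time constants and on how long real gusts last.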


I am appalled to see the claim that the Climate Scientists do not seem to have included any of these basic metrology considerations in their work. And to give significance to results that have a resolution finer than the basic resolution of the measuring process is not even wrong (to borrow a phrase).
If you look at just one aspect of the chain of electronic temperature recording, that of the quantisation of the reading of the analog temperature element, there are potentially serious sources of error which must be understood by any serious user of such systems.
Any electronics engineer will confirm that analog-to-digital conversion devices are prone to a host of potential error sources … (try googling for them; you will be amazed …)
They are very difficult to calibrate across their range, and also across the range of environmental variations in order to check that the various compensations remain within spec.
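
One narrow piece of the quantisation question can be shown with a few lines of code: whether averaging many readings taken at 1 degree resolution can recover anything finer than 1 degree. The sketch below is an idealised case only. It assumes a perfect converter whose sole error is rounding to the nearest whole degree, with no non-linearity, drift or calibration error, and the function name mean_of_rounded and all numbers are hypothetical.

```python
import random
import statistics


def mean_of_rounded(true_mean, spread_sd, n=20000, seed=2):
    """Average of n readings, each rounded to the nearest whole degree.

    spread_sd is the standard deviation of the real variation in the
    quantity being measured (an assumed value for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    readings = [round(true_mean + rng.gauss(0.0, spread_sd)) for _ in range(n)]
    return statistics.fmean(readings)


# A perfectly steady 15.3 degrees: every reading rounds to 15.
steady = mean_of_rounded(15.3, spread_sd=0.0)

# The same mean, but with readings varying over several degrees.
varying = mean_of_rounded(15.3, spread_sd=2.0)

print("steady input :", steady)
print("varying input:", round(varying, 2))
```

When the input never moves, averaging gains nothing and the result stays stuck at the nearest step. When the input wanders across several steps, the rounding errors largely cancel and the average lands close to 15.3. That cancellation relies on the rounding being the only error; it says nothing about the non-linearities and calibration problems raised in this comment, which do not average away.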
I would add that there is an insidious problem in integrative averaging of many readings, which comes from the non-linearities in each individual measurement system.
If you use the ‘first-difference’ method to get a long term average ‘trend’ then non-linearity will cause a continual upward (or downward) drift in the result. The problem worsens with more readings. For example, using the ‘average’ from a series of weekly readings and comparing it with the ‘average’ from daily readings will reveal this source of drift.
And when we come to the Mauna Loa CO2 series …we have another area where the claimed ‘results’ of the manipulation of the instrument readings are given in PPM with decimal places! ( How DO you get 0.1 of a part?)
Anyway, as I understand it, the measurement and reporting of atmospheric CO2 is dominated by a single linear process, from the production and testing of the calibration gases through to the analysis of the average of the results. I should like to see a thorough critical analysis of every stage of this process, to ensure that we are not looking at an artefactual result.
Go, Michael!

Steve in SC

Being that this subject is sort of in my wheelhouse, I would like to add a few items to the uncertainty picture.
Calibration of mercury/alcohol thermometers:
The 0 degree C (ice point) is somewhat dependent on the purity of the water.
Regular tap water from a municipal supply can throw the ice point off by as much as 2 degrees C with the presence of naturally occurring salts/minerals. The same is true to a lesser extent with the boiling point. Also the air (barometric) pressure has a marked effect on the boiling point.
The accuracy and stability of electronic temperature measurement devices is largely dependent on the purity of the components involved as well as the metallurgical chemistry involved. Oxidation and Nitrogen embrittlement are all factors over time on metal based devices.
You have basically three classes of temperature measurement contrivances.
Those that rely on the coefficient of thermal expansion.
Mercury/alcohol thermometers and bi-metal dial thermometers are examples of CTE devices.
Those that generate an EMF due to temperature differences.
Thermocouples are the best examples of these.
Those that vary resistance with temperature.
Thermistors and RTDs are examples of these.
Then there are the electronic types that rely on radiation/optics.
These are non contact and are dependent on the emissivity of the object whose temperature is to be measured. (it is a fourth so sue me) Various techniques are used such as thermopiles and photo detectors. These are generally not as accurate as direct contact devices. Your handy dandy satellites use variants of this technique.
About the only stable temperature point available to calibrate anything with is the freezing point of gold. It is stable because of its chemistry. Gold does not readily combine chemically with anything.
You can obtain remarkable resolution from almost all of the contrivances, all it takes is large sums of money, time, and pathological attention to detail.
Errors for Thermocouples are + or – 1.5 C.
Errors for RTDs are + or – 1 C
As stated errors for standard thermometers are + or – 1 C but can be as large as 3 or 4 C depending on factors.
Regarding climatology, the error of temperature measurement is somewhat cumulative so that over time the uncertainty levels should increase. This is of course ignored by the climate community. That and the ludicrous claim that they can reconstruct temperature to within 0.1 C is an indication that they do not know what they are talking about and are fumbling around in the dark.
just my $.02 worth which may not be much due to inflation.

Steven Mosher says:
January 22, 2011 at 3:23 am
The point that you have missed here Steve Mosher is that the margin of error is practically twice the claimed “global warming signal” of 0.7º C. Add in some biased agenda driven human homogenisation and what have you got?
The oldest temperature record in the world is only 352 years old. Based on the Central England Temperature record, over the last 15 years there has been a cooling trend:
So Steve Mosher’s point sounds all well and good until one actually examines it more closely, at which point it becomes utterly meaningless. The reason it becomes meaningless is entirely because the so-called “global warming signal” is half the margin of error in the “official” data. It makes no difference which trend you prefer, the warming or the cooling, they are both meaningless because they are both approximately half the margin of error. So the whole temperature issue is a red herring.
It is an unprovable and un-winnable faux debate that serves the “warmist’s” and the “gatekeepers” both, by keeping everyone distracted from the real issue, CO2.

D. J. Hawkins

Jit says:
January 22, 2011 at 2:31 am
As long as the errors don’t trend in a biased way over time, the fact that there are thousands of sensors should make standard errors small (variance divided by square root no. of observations).
Of course, one biased sensor in the middle of nowhere would have a disproportionate effect – although I’m not clear how the interpolations are done

I have seen this line of reasoning before, and I believe this is an incorrect application of the statistics of large numbers. If you had 100 thermometers measuring temperature in the same small area at the same time you would be correct. This is how the satellite measurements of sea level can get millimeter accuracy with only 3 cm resolution. They take tens of thousands of measurements of the same area in a short space of time. A time series of measurements of a single thermometer in one location doesn’t, I believe, meet the criteria for this statistical method.
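
For reference, the standard error of a mean is the standard deviation divided by the square root of the number of observations (not the variance), and it applies when the observations are independent measurements of the same quantity. The sketch below illustrates that case and one way it fails. It assumes many sensors reading one true temperature at the same moment, with made-up values for the random error and for a bias common to all sensors; the function name error_of_mean is hypothetical.

```python
import random
import statistics


def error_of_mean(n_sensors, random_sd=0.5, shared_bias=0.0, seed=1):
    """Error in the average of n_sensors readings of one true temperature.

    random_sd is the independent random error per sensor and shared_bias
    is an error common to every sensor. Both are assumed values.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_temp = 15.0
    readings = [true_temp + shared_bias + rng.gauss(0.0, random_sd)
                for _ in range(n_sensors)]
    return statistics.fmean(readings) - true_temp


for n in (1, 100, 10000):
    print(n, "sensors, random error only:", round(error_of_mean(n), 3))

print("10000 sensors with a shared 0.3 C bias:",
      round(error_of_mean(10000, shared_bias=0.3), 3))
```

With random error only, the error in the average shrinks roughly as 0.5 divided by the square root of the number of sensors. With a bias common to all sensors, the average settles at the bias, however many sensors are added. Neither case covers a time series from a single thermometer, where successive readings measure different temperatures.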


In an engineering sense….. The Climate scientist’s measurement regime is fine if one was cutting 2×4’s and plywood for shelves in the garage….. You definitely wouldn’t want these guys designing an air frame for a supersonic jet fighter!……:-o
….. and to be honest, I think they knocked together the climate science club house that they alone are playing in. All that’s missing is the hand painted, “NO GURLS” sign….(that’d be us skeptics)…..;-)


Thanks for posting this Anthony. I would like to see a lot more posts on this topic. I do not agree with Mr Mosher that “Result? no difference.” If we accepted his dismissive argument about the lack of importance of calibration of the instruments (all of them), then why do other disciplines make a big fuss about correct readings? Is it not important when we are talking about only .7 deg per century, and the economies of many countries being trashed for the sake of that? If instrument calibration adds a degree or two to the “noise”, then the .7 is meaningless.

Wayne Delbeke

John Marshall says:
January 22, 2011 at 1:35 am
So in reality the 0.6C feared temperature rise could mean that statistically there has been no temperature change. And all models give the wrong answer because temperature inputs are incorrect.
Very interesting article is a pdf copy possible please Anthony?
Frank – no need to bother Anthony. Just copy and paste into your word processor and export or save as a pdf. Use MS Word, Word Perfect, Open Source or whatever share ware program you want, they pretty well all do that. I don’t know about copyright.
I remember having thermometer correction sheets in the labs when calibrating thermometers and old steel survey tapes as measurements all had to be adjusted for temperature, – even temperature. 😉 Even modern electronic distance measuring devices have a temperature bias that needs to be adjusted although newer multifrequency devices have self correcting circuits but still: “Need to know change in elevation of two points (slope correction), air temperature, atmospheric pressure, water vapor amount in air”… –all can have an effect on the measurement.
In other words, EVERYTHING requires adjustments to correct for site conditions. All instruments and observers have built in biases and inaccuracies and NOTHING is absolute.

Ed Caryl

It seems to me that there are more positive biases than negative. Consider, glass hardening will always increase over time. At the beginning of a temperature record, the thermometers are new. 100 years later, most of them are old to very old, and are reading high by 0.7 degrees. The enclosures do the same thing. At the beginning all are shiny and new. After 100 years most are in bad shape. Even if they have been repainted, it was with modern paint, not the old white-wash, which was more reflective. You need not consider UHI problems, or siting difficulties to explain all the temperature rise seen over time.

Dave Springer

This doesn’t indict the temperature record.
Accuracy of thermometers matters hardly at all because the acquired data in absolute degrees is used to generate data which is change over time. If a thermometer or observer is off by 10 whole degrees it won’t matter so long as the error is consistently 10 degrees day after day – the change over time will still be accurate.
Precision is a similar story. There would have to be a bias that changes over the years that somehow makes the thermometers or observers record an ever growing higher temperature as the years go by. Urban heat islands are perfect for that but instrument/recording error just doesn’t work that way. There are thousands of instruments in the network each being replaced at random intervals so the error from age/drift is averaged out because there is an equal distribution of old and new instruments.
This might be interesting in an academic way but isn’t productive in falsifying the CAGW hypothesis. The instrumentation and observation methods are adequate and trying to paint them as less than adequate only appears to be an act of desperation – if the job is botched blaming the tools is no excuse.
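
The arithmetic behind this comment is easy to check. The sketch below fits a least-squares trend to a synthetic 100-year series three times: as is, with a constant 10 degree error added, and with an error that grows by 0.007 C per year. The series, the drift rate and the name ols_slope are all assumptions made up for illustration; nothing here says whether real instruments drift that way.

```python
import math
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


years = list(range(100))
# Synthetic "true" temperatures: a small trend plus some wiggle.
true_temps = [14.0 + 0.007 * y + 0.2 * math.sin(y) for y in years]

# Case 1: every reading is off by the same 10 degrees.
constant_error = [t + 10.0 for t in true_temps]
# Case 2: the error itself grows by 0.007 C per year (assumed drift).
drifting_error = [t + 0.007 * y for t, y in zip(true_temps, years)]

true_slope = ols_slope(years, true_temps)
offset_slope = ols_slope(years, constant_error)
drift_slope = ols_slope(years, drifting_error)

print("true trend          :", round(true_slope * 100, 3), "C/century")
print("with constant offset:", round(offset_slope * 100, 3), "C/century")
print("with drifting error :", round(drift_slope * 100, 3), "C/century")
```

A constant error leaves the fitted trend unchanged, as the comment says. An error that drifts adds its own drift rate straight onto the trend, here an extra 0.7 C per century. The disagreement in this thread is therefore about whether the network-wide error drifts, not about the arithmetic.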

John McManus

So; the old trees make lousy thermometers is now thermometers make lousy thermometers.


Another problem comes from taking the average temperature to be halfway between Tmin and Tmax. This may well be the case if temperatures rise and fall in a cyclical fashion in a 24-hour period. However, an anomalous event, such as hot aircraft exhaust blowing in the direction of the station, can increase Tmax considerably, thereby creating a considerable error in the ‘average’. And because such ‘hot’ anomalies are more likely than ‘cold’ ones, and also increasingly likely over time, the bias is likely upwards.
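
A small worked example makes the size of this effect concrete. The sketch below assumes an idealised day of 24 hourly temperatures following a sine curve between 10 C and 20 C, then adds a single one-hour warm anomaly of 3 C at the hottest hour. All values are invented for illustration.

```python
import math
import statistics

# Idealised hourly temperatures: mean 15 C, minimum at 03:00, maximum at 15:00.
clean_day = [15.0 + 5.0 * math.sin(2.0 * math.pi * (hour - 9) / 24.0)
             for hour in range(24)]

# Same day with a one-hour warm anomaly of 3 C at 15:00 (assumed event).
spiked_day = list(clean_day)
spiked_day[15] += 3.0


def midrange(temps):
    """The (Tmax + Tmin) / 2 'average' used with max-min thermometers."""
    return (max(temps) + min(temps)) / 2.0


mid_clean, mid_spiked = midrange(clean_day), midrange(spiked_day)
mean_clean = statistics.fmean(clean_day)
mean_spiked = statistics.fmean(spiked_day)

print("rise in (Tmax+Tmin)/2:", round(mid_spiked - mid_clean, 3), "C")
print("rise in hourly mean  :", round(mean_spiked - mean_clean, 3), "C")
```

On the clean day the two averages agree at 15 C. The one-hour anomaly raises the max-min average by 1.5 C but the hourly mean by only 0.125 C, so a brief event that happens to set Tmax is weighted twelve times more heavily by the max-min method in this example.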

Steve Keohane

xyzlatin says: January 22, 2011 at 6:57 am
[…] If instrument calibration adds a degree or two to the “noise”, then the .7 is meaningless.

Many good points above, I think this sentence says it all. Trying to parse fractions of a degree change from the current system is not possible. Temperature is very difficult to measure accurately. I know, I successfully measured and controlled temperatures in an IC process to +/-0.1°F in the eighties when linewidths went sub-micron.

Dave Springer

Steven Mosher says:
January 22, 2011 at 3:23 am

Result? the error structure of individual measures doesnt impact your estimation of the long term trends.
NOW, if many thermomemters all has BIASES ( not uncertainty) and if those biases were skewed hot or cold, and if those biases changed over time, then your trend estimation would get impacted
Result? no difference.

Absolutely right, Steve.
Skeptics are no better than CAGW alarmists in their willingness to believe anything which supports their own beliefs or disputes the beliefs of the other side. It’s sad. Objectivity is a rare and precious commodity.


Great bunch of real data in one post… First time I ever heard of the following:
[…]”However, a glass thermometer at 100c is longer than a thermometer at 0c.”
The 5 degree cold record set in International Falls (posted below), pretty much eliminates the problem of the thermometer’s resolution.
Look for records like this in the future, burr.
We need a NEW name for the next minimum, we are moving into.
My vote is for “David Archibald Minimum”.
Well maybe not, having one’s name attached to miserable weather, may not be the best way to be remembered.


Steven Mosher,
You say,
“NOW, if many thermomemters all has BIASES ( not uncertainty) and if those biases were skewed hot or cold, and if those biases changed over time, then your trend estimation would get impacted”
Could you confirm that this is what you meant?
(And, I would be grateful if you could comment on the effect of transducer non-linearities, too)

Ken Lydell

This is why climatologists work with anomalies rather than estimates of absolute temperature. And they do so on long time scales using lots of instruments. Instrument error is of interest only if a general bias becomes significantly greater over time.

Alfred Burdett

Has anyone investigated the possible impact of observer preconception bias?
In particular, does positive bias in temperature readings rise and fall with belief in AGW?
Would this not be a worthy topic for investigation?


John McManus:

So; the old trees make lousy thermometers is now thermometers make lousy thermometers.

They do, when you’re trying to measure fractions of a degree change over a period of decades to centuries.
The fact that tree rings etc make such lousy temperature proxies is probably due to the fact that nothing in nature exhibits any great sensitivity to small temperature changes – especially to warming changes. Most plants and animals do better with warmer temperatures. So if nature isn’t particularly perturbed by small temp increases, why should we be?