Human error in the surface temperature record

Guest essay by John Goetz

As noted in an earlier post, the monthly raw averages for USHCN data are calculated even when up to nine days are missing from the daily records. Those monthly averages are usually not discarded by the USHCN quality-control and adjustment models, although the final values are almost always estimated as a result of that process.
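A rough sketch of that rule, for illustration only (the actual USHCN processing is more involved and, for instance, handles TMAX and TMIN separately), might look like this in Python:

    # Illustrative sketch only, not the actual USHCN code: accept a monthly raw
    # average as long as no more than nine daily values are missing.
    # Here -9999 marks a missing day, matching the convention in the daily files.
    def monthly_raw_average(daily_values, max_missing=9):
        present = [v for v in daily_values if v != -9999]
        missing = len(daily_values) - len(present)
        if missing > max_missing or not present:
            return None          # too many gaps; no raw monthly value
        return sum(present) / len(present)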

The daily USHCN temperature record collected by NCDC contains daily maximum (TMAX) and minimum (TMIN) temperatures for each station in the network (ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/hcn/). In some cases, measurements for a particular day were not recorded and are shown as -9999 in the TMAX record, the TMIN record, or both. In other cases, a measurement was recorded but failed one of a number of quality-control checks.
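For readers who want to follow along, the daily files are fixed-width text. The sketch below shows one way to read a single record line, assuming the layout documented in the GHCN-Daily readme (station ID, year, month, element, then 31 groups of a value plus three flag characters):

    # Sketch of parsing one GHCN-Daily (.dly) line, assuming the fixed-width
    # layout described in the dataset readme. TMAX/TMIN values are tenths of
    # a degree C; -9999 means the measurement is missing.
    def parse_dly_line(line):
        station = line[0:11]
        year = int(line[11:15])
        month = int(line[15:17])
        element = line[17:21]                     # "TMAX", "TMIN", ...
        days = []
        for d in range(31):
            offset = 21 + d * 8
            value = int(line[offset:offset + 5])  # -9999 = missing
            qflag = line[offset + 6]              # quality flag; blank = passed
            days.append((value, qflag))
        return station, year, month, element, days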

Quality-Control Checks

I was curious as to how often different quality-control checks failed, so I wrote a program to cull through the daily files to learn more. I happened to have a very small number of USHCN daily records already downloaded for another purpose, so I used them to debug the software.
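Conceptually the job amounts to tallying the non-blank quality flags across the downloaded files. Below is a sketch of that idea, reusing the parse_dly_line helper above; the flag letters, such as “I” for the internal consistency check, are the ones defined in the GHCN-Daily documentation.

    # Sketch: count how often each non-blank quality flag appears across a set
    # of local .dly files, looking only at the temperature elements.
    from collections import Counter
    import glob

    def count_quality_flags(pattern="*.dly"):
        counts = Counter()
        for path in glob.glob(pattern):
            with open(path) as f:
                for line in f:
                    if line[17:21] not in ("TMAX", "TMIN"):
                        continue
                    _, _, _, _, days = parse_dly_line(line)
                    for value, qflag in days:
                        if value != -9999 and qflag != " ":
                            counts[qflag] += 1
        return counts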

I quickly noticed that my code was calculating a larger number of consistency-check failures from the daily record for Muleshoe, TX than was indicated by the “I” flag in the station’s corresponding USHCN monthly record. The daily record, for example, flagged the minimum value on February 6 and 7, 1929, and the maximum value on February 7 and 8. My code counted that as three failed days, but the monthly raw data for Muleshoe indicated only two.

Regardless of how many failures should have been counted, it was clear from the daily record why they were flagged. The minimum temperature for February 6 was higher than the maximum temperature for February 7, which is an impossibility. The same was true for February 7 relative to February 8.
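A sketch of that cross-day test is below; it assumes a day is flagged whenever its minimum exceeds the following day’s maximum, which is enough to reproduce the three-day count my code reported for Muleshoe (the real NCDC checks are more extensive):

    # Sketch of the cross-day consistency test described above: whenever the
    # minimum for day d exceeds the maximum for day d+1, flag both days.
    # A value of -9999 marks a missing observation and is skipped.
    def cross_day_inconsistencies(tmin, tmax):
        flagged = set()
        for d in range(len(tmin) - 1):
            if tmin[d] == -9999 or tmax[d + 1] == -9999:
                continue
            if tmin[d] > tmax[d + 1]:
                flagged.add(d)        # the flagged minimum
                flagged.add(d + 1)    # the flagged maximum
        return sorted(flagged)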

I noticed there were quite a few errors like this in the Muleshoe daily record, spanning many years. I wondered how the station observer(s) could make such a mistake repeatedly. It was time to turn to the B-91 observation form to see if it could shed any light on the matter.

Transcription Errors

The B-91 form, obtained from http://www.ncdc.noaa.gov/IPS/coop/coop.html, is linked below. After converting the temperatures to Celsius, the problem became apparent. The first temperature (43) appears to have been scratched out. The last temperature in that column (39) has a faint arrow pointing to it from a lower line labelled “1*”. The “*” refers to a note that states “Enter maximum temperature of first day of following month”.

February 1929 B-91 for Muleshoe, TX

It appeared that whoever transcribed this manual record into electronic form thought that the observer intended to scratch out the first temperature and replace it with the one below, and thus shifted the maximum values up one day for the entire month.

Muleshoe

To determine the observer’s intent, the B-91 for March, 1929 was examined to see if the first maximum temperature was 39, as indicated by the “1*” line on the February form. Not only was the first maximum temperature 39, it appeared to be scratched out with the same marking. Although the scratch marking appeared on the March form, that record was transcribed correctly. A quick check of the January, 1929 B-91 showed the same scratch marks over the first temperature.

March 1929 B-91 for Muleshoe, TX

January 1929 B-91 for Muleshoe, TX

The scratch marks appear in other forms as well. October, 1941 looked interesting because neither of the failed quality checks had an obvious cause: the flagged temperatures were not unusual for that time of year or relative to the temperatures the day before and after. Upon opening the B-91, I found the same “scratch out” artifact over the first maximum temperature entry! Sure enough, the maximum temperatures were shifted in the same manner as in February, 1929. As a result, two colder days were discarded from the average temperature calculation.

October 1941 B-91 for Muleshoe, TX

Because the markings were similar, it appeared they were transferred to multiple forms while the forms lay piled in a stack, probably because they were carbon copies. This likely happened after the forms were submitted, because on the 1941 form the observer did scratch out temperatures, and it was clear where the replacements were written.

Impact of the Errors

In addition to containing one incorrect maximum temperature, the electronic record excluded all three days flagged as failing the quality check from the monthly average. The unadjusted average reflected in the electronic record was 0.8C, whereas the paper record gave 0.24C, just over half a degree cooler. The time-of-observation (TOB) estimate was 1.41C. The homogenization model decided that a monthly value could not be computed from the daily data and discarded it. It infilled the month instead, replacing the value with an estimate of 0.12C computed from surrounding stations. While that was not a bad estimate, the question is whether it would have been 0.12C had the transcription been correct. Furthermore, because the month was infilled, GHCN did not include it.
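To see why dropping a few flagged days matters, here is a toy example with invented numbers (not the Muleshoe values): removing a handful of cold days from a month of otherwise mild days pulls the computed mean noticeably warmer.

    # Toy illustration with made-up numbers, not the actual Muleshoe data:
    # excluding a few flagged cold days biases the monthly mean warm.
    def monthly_mean(daily_means, excluded_days=()):
        kept = [t for d, t in enumerate(daily_means) if d not in excluded_days]
        return sum(kept) / len(kept)

    daily = [2.0] * 25 + [-8.0, -9.0, -7.0]                 # 25 mild days, 3 cold days
    print(round(monthly_mean(daily), 2))                    # 0.93 with every day included
    print(monthly_mean(daily, excluded_days={25, 26, 27}))  # 2.0 once the cold days are dropped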

In the case of January, 1941, the unadjusted average reflected in the electronic record was 2.56C, whereas the paper record gave 2.44C. The TOB model estimated the average as 3.05C. Homogenization estimated the temperature at 2.65C. That value was retained by GHCN.

Discussion

Only recently have we had the ability to collect and report climate data automatically, without the intervention of humans. Much of the temperature record we have was collected and reported manually. When humans are involved, errors can and do occur. I was actually impressed with the records I saw from Muleshoe, because the observers corrected errors and noted observation times that were outside the norm at the station. My impression was that the observers at that station tried to be as accurate as possible. I have looked through B-91 forms from other stations where no such corrections or notations were made. Some of those stations were located at people’s homes. Is it reasonable to believe that the observers never missed a 7 AM observation for any reason, such as a holiday or vacation, for years on end? That they always wrote their observations down correctly the first time?

The observers are just one human component. With respect to Muleshoe, the people who transcribed the record into electronic form clearly misinterpreted what was written, and understandably so: taken by themselves, the forms appeared to contain corrections. The people doing the data entry likely did so many years ago, with no training in the common errors that might occur in the record or in the transcription process.

But the transcribers did make mistakes. In other records I have seen digits transposed. While transposing a 27 to a 72 is likely to be caught by a quality-control check, transposing a 23 to a 32 probably won’t be. Incorrectly entering 20 instead of -20 can get a whole month’s worth of useful data tossed out by the automatic checkers. That data could be salvaged by a thorough re-examination of the paper record.
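As a toy illustration of why one transposition is caught and the other is not (the check and threshold here are invented, not NCDC’s actual tests), a simple spike check against neighboring days flags a 27 entered as 72 but passes a 23 entered as 32:

    # Toy spike check with an invented threshold: a value differing from both
    # neighboring days by more than 25 degrees gets flagged.
    def is_spike(prev_day, value, next_day, threshold=25):
        return abs(value - prev_day) > threshold and abs(value - next_day) > threshold

    print(is_spike(25, 72, 28))   # True:  27 transposed to 72 stands out
    print(is_spike(25, 32, 28))   # False: 23 transposed to 32 slips through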

Now expand that to the rest of the world. I think we have done as good a job as could be expected in this country, but it is not perfect. Can we say the same about the rest of the world? I’ve seen a multitude of justifications for the adjustments made to the US data, but a lack of explanation as to why the rest of the world is adjusted half as frequently.

124 Comments
Michael Moon
September 29, 2015 8:43 am

This is an essentially pointless exercise. Clearly the signal-to-noise ratios of historical temperature records are inadequate to discover if temperatures worldwide or even in the USA have changed significantly, much less whether any such change was due to human actions. Error after error at every stage in multiple processes renders the “data,” raw or otherwise, unfit for purpose. Those who report historical temperatures and compare them to present temperatures, who FAIL to report the standard error of historical temperatures, are simply misleading the public, know it, and should stop. BEST practice is somehow not entirely the truth. Indeed silk purses are not made from sow’s ears…

Dave in Canmore
September 29, 2015 9:00 am

Speaking from the perspective of someone who collects reams of field data in the private sector, this discussion is mind boggling. The way these data are so carelessly treated through adjustment, homogenization, infilling and nonsensical error estimation, you would think that the data serve no real purpose. To think that trillions of dollars worth of ramifications hang on such flimsy quality control procedures makes my mind spin.
In the private sector world of real consequences, there would be no estimations, no infilling, no homogenization. Good stations with good data would be selected in various parts of the world and stations with discontinuities or discrepancies would be dropped. Period. If I collect some data in the field with a 3DCQ greater than the client’s accepted limit, I don’t get to say, like so many apologists here, “What do you want me to do? Throw it out? It’s the best I have!” I don’t have the option to discard the data quality rules and estimate the data location. Instead my data gets tossed because it isn’t accurate enough for the purpose of the client. Period.
I’m left concluding that only someone whose work has no consequences could imagine that these data are accurate enough for the purpose they are being used for. Why else is this so obvious to everyone who works in the real world and so hard to understand for academics and bureaucrats?

richard verney
Reply to  Dave in Canmore
September 29, 2015 10:39 am

I have made similar comments for years.
These weather stations were never intended to perform the function to which they are being put. They are not fit for purpose, and their data is being over extrapolated beyond its capabilities.
If the Climate Scientists wish them to perform the task to which they are now being put, the starting point would be to audit each and every station for its siting, siting issues, station moves, equipment used, screen changes, maintenance of equipment and screen, changes to equipment, record keeping and the approach to accurate record keeping, the length of uninterrupted records, etc. The good stations could be identified and the poor stations could be thrown out.
Essentially what should have been looked for is the equivalent of USCRN stations but with the longest continuous data records. It may be that we would be left with only 1000 or perhaps only 500 stations worldwide, but better to work with good quality pristine data that requires no or little adjustment than to work with loads and loads of cr*p quality station data and cr*p data needing endless data manipulation/adjustment/homogenisation.
We are now no longer examining the data and seeing what the data tells us, but rather we are simply examining the efficacy of the various adjustments/homogenisation undertaken to that data.
Quite farcical really.

Dave in Canmore
Reply to  richard verney
September 29, 2015 12:34 pm

Exactly! There are billions of government dollars available for study after study but the most basic data QC is abandoned? The gulf between best practices in the Climate Science community and the real world is staggering beyond comprehension.

Matt G
September 29, 2015 11:11 am

The only way we can get a true surface data set is by using only the same samples throughout, from start to finish. They are changed all the time, so they are always measuring different parts of the planet’s surface. That has never been a technique that should be used to estimate a massive surface area to within tenths of a degree using a tiny percentage of it. It is impossible to suggest there has been any accuracy in it, and the only data sets that come close to this ideal are the satellite data sets. We would need a million weather stations on the planet’s surface to even come close to what satellites can measure in the troposphere. Forty-four thousand weather stations cover roughly one percent of the planet’s surface.

Richard M
September 29, 2015 3:24 pm

I always thought it would be interesting to try to verify the station data with some kind of proxy. Wouldn’t it be interesting to see a set of proxy data collected for the US since 1880 compared to the temperature record? Yes, proxy data has its own limitations, but if enough data were collected, that should tend to average out the errors.
I suspect the problem is no one in the government wants to see any attempt to validate the data. Hence, nothing could ever get funded. It would almost take a volunteer group.

Michael Jankowski
Reply to  Richard M
September 29, 2015 5:20 pm

Richard M, you’re quite right. You’d probably be interested in this…
http://climateaudit.org/2005/02/20/bring-the-proxies-up-to-date/
And this “volunteer” effort…
http://climateaudit.org/2007/10/12/a-little-secret/

Michael G. Chesko
September 29, 2015 4:17 pm

This is in reference to the graph that ralfellis presented above in the comments section.
Wait a minute, are you saying that CO2 is a follower of a temperature trend rather than the cause of a temperature trend?
I’m just a layman, and I often don’t understand all the scientific jargon, but it seems to me that if CO2 is a “negative-feedback temperature regulator” that a solution calling for a reduction in CO2 to save the world has a big problem.
This makes me wonder…
Can anyone (preferably a scientist) answer these four questions:
• If human beings were producing the same amount of CO2 before the last Ice Age that we are today, would the last Ice Age have been averted?
• Is the fact that human beings burn fossil fuels today going to avert another Ice Age?
• If human activity is capable of abnormally warming the Earth, are we capable of abnormally cooling it?
• What human activity would abnormally cool the Earth?

Steve M. from TN
September 29, 2015 4:27 pm

Double counting lows/highs.
Maybe someone can explain this to me, but wouldn’t you get a double high/low only once…the day you switch? As long as you don’t switch again, it shouldn’t be a problem. It seems to me that a single switch of TOBS would be insignificant in the data.

richard verney
Reply to  Steve M. from TN
September 30, 2015 2:57 am

As I see it, apart from rare isolated events, this can only potentially be a significant and repeated problem where the station TOB coincides with the warmest part of the day.
Obviously the temperature profile of every day is slightly different, but in general the warmest time of the day is an hour or so after the sun has reached its peak height that day. Thus the warmest time of the day is usually some time between about 1pm and 3:30pm.
That being the case, every station that has a TOB coinciding broadly with the warmest time of the day should either be disregarded from the data set, or it should be put in a separate bin, and very detailed and careful consideration should be given to its record, with an adjustment being made if necessary.
But I consider the better practice would be to disregard any station if it has a TOB coinciding approximately with the warmest period of the day.
It would be interesting if Zeke or Mosher would comment on why they do not simply disregard stations that have TOBs around the warmest part of the day.