Adjusting Pristine Data

by John Goetz

On September 15, 2008, Anthony DePalma of the New York Times wrote an article about the Mohonk Lakes USHCN weather station titled “Weather History Offers Insight Into Global Warming.” This article claimed, in part, that the average annual temperature has risen 2.7 degrees in 112 years at this station. What struck me about the article was the rather quaint description of the manner in which temperatures are recorded, which I have excerpted here (emphasis mine):

Mr. Huth opened the weather station, a louvered box about the size of a suitcase, and leaned in. He checked the high and low temperatures of the day on a pair of official Weather Service thermometers and then manually reset them…

If the procedure seems old-fashioned, that is just as it is intended. The temperatures that Mr. Huth recorded that day were the 41,152nd daily readings at this station, each taken exactly the same way. “Sometimes it feels like I’ve done most of them myself,” said Mr. Huth, who is one of only five people to have served as official weather observer at this station since the first reading was taken on Jan. 1, 1896.

That extremely limited number of observers greatly enhances the reliability, and therefore the value, of the data. Other weather stations have operated longer, but few match Mohonk’s consistency and reliability. “The quality of their observations is second to none on a number of counts,” said Raymond G. O’Keefe, a meteorologist at the National Weather Service office in Albany. “They’re very precise, they keep great records and they’ve done it for a very long time.”

Mohonk’s data stands apart from that of most other cooperative weather observers in other respects as well. The station has never been moved, and the resort, along with the area immediately surrounding the box, has hardly changed over time.

Clearly the data collected at this site is of the highest quality. Five observers committed to their work. No station moves. No equipment changes according to Mr. Huth (in contrast to the NOAA MMS records). Attention to detail unparalleled elsewhere. A truly Norman Rockwell image of dedication.

After reading the article, I wondered what happened to Mr. Huth’s data, and the data collected by the four observers who preceded him. What I learned is that NOAA doesn’t quite trust the data meticulously collected by Mr. Huth and his predecessors. Neither does GISS trust the data NOAA hands it. Following is a description of what is done with the data.

Let’s begin with the process of getting the data to NOAA:

From Co-op to NOAA

Mr. Huth and other observers like him record their data in a “B91 Form”, which is submitted to NOAA every month. These forms can be downloaded for free from the NOAA website. Current B91 forms show the day’s minimum and maximum temperature as well as the time of observation. Older records often include multiple readings of temperature throughout the day. The month’s record of daily temperatures is added to each station’s historical record of daily temperatures, which can be downloaded from NOAA’s FTP site here.

The B91 form for Mohonk Lake is hand-written, and temperatures are recorded in Fahrenheit. Transcribing the data to the electronic daily record introduces an opportunity for error, but I spot-checked a number of B91 forms – converting degrees F to tenths of a degree C – and found no errors. Kudos to the NOAA transcriptionists.
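For anyone who wants to repeat the spot check, the arithmetic is simple. Below is a minimal Python sketch of the conversion I used; the exact rounding convention NOAA applies when it stores tenths of a degree C is an assumption on my part.

```python
def f_to_tenths_c(temp_f):
    """Convert a whole-degree Fahrenheit reading (as written on a B91 form)
    to tenths of a degree Celsius, rounded to the nearest tenth.
    The rounding convention here is assumed, not confirmed from NOAA docs."""
    celsius = (temp_f - 32.0) * 5.0 / 9.0
    return round(celsius * 10.0)

# Example: a recorded TMAX of 68 F becomes 200 (i.e., 20.0 C).
print(f_to_tenths_c(68))  # 200
```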

Next comes the first phase of NOAA adjustments.

NOAA to USHCN (part I) and GHCN

The pristine data from Mohonk Lake are subjected to a number of quality-control checks and homogeneity testing and adjustment procedures. First, the data are run through quality-control tests designed primarily to eliminate gross transcription errors. Next, monthly averages are calculated from the TMIN and TMAX values. This is straightforward when both values exist for every day in a month, but early in the Mohonk Lake record there are a number of months with several missing TMIN and/or TMAX values. Nevertheless, NOAA seems capable of creating an average temperature for many of those months. The result is referred to as the “Areal data”.
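To illustrate the averaging step, here is a rough Python sketch. The (TMIN + TMAX)/2 daily mean is the standard convention, but the threshold for how many missing days can be tolerated before no monthly value is produced is my own placeholder (max_missing), not a documented NOAA rule.

```python
import numpy as np

def monthly_mean(tmin, tmax, max_missing=9):
    """Monthly mean temperature from daily TMIN/TMAX arrays (NaN = missing).
    Each daily mean is (TMIN + TMAX) / 2; days missing either value are skipped.
    max_missing is a hypothetical tolerance, not NOAA's documented rule."""
    daily = (np.asarray(tmin, dtype=float) + np.asarray(tmax, dtype=float)) / 2.0
    missing = np.isnan(daily)
    if missing.sum() > max_missing:
        return float("nan")  # too many gaps to produce a monthly value
    return float(daily[~missing].mean())
```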

The Areal data are stored in a file called hcn_doe_mean_data, which can be found here. Even though the daily data files are updated frequently, hcn_doe_mean_data has not been updated in nearly a year. The Areal data also seem to be stored in the GHCN v2.mean file, which can be found here on NOAA’s FTP site. This is the case for Mohonk Lake.

Of course, more NOAA adjustments are needed.

USHCN (parts II and III)

The Areal data are adjusted for time of observation and stored as a separate entry in hcn_doe_mean_data. The TOB adjustment is briefly described here. Following the TOB adjustment, the series is tested for homogeneity. This procedure identifies non-climatic discontinuities (artificial changepoints) in a station’s temperature record caused by changes to the station, such as equipment relocations and replacements. The version 2 algorithm looks at up to 40 highly-correlated series from nearby stations. The result of this homogenization is then passed on to FILNET, which creates estimates for missing data. The output of FILNET is stored as a separate entry in hcn_doe_mean_data.
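As a concrete (and heavily simplified) illustration of just the neighbor-selection step, the sketch below picks the most highly correlated nearby series, up to the stated limit of 40. It only shows what “up to 40 highly-correlated series” entails; the changepoint tests that the homogenization algorithm then applies are not reproduced here, and the function and variable names are mine.

```python
import numpy as np

def select_neighbors(target, candidates, max_neighbors=40):
    """Rank candidate neighbor series by correlation with the target series
    and keep the top max_neighbors. target: 1-D array of monthly values;
    candidates: dict mapping station id -> 1-D array of the same length.
    Missing values are NaN."""
    scored = []
    for station_id, series in candidates.items():
        ok = ~np.isnan(target) & ~np.isnan(series)
        if ok.sum() < 24:  # require at least two years of overlap (assumed cutoff)
            continue
        r = np.corrcoef(target[ok], series[ok])[0, 1]
        scored.append((r, station_id))
    scored.sort(reverse=True)
    return [sid for _, sid in scored[:max_neighbors]]
```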

Now GISS wants to use the data, but the NOAA adjustments are not quite what they are looking for. So what do they do? They estimate the NOAA adjustments and back them out!

USHCN and GHCN to GISS

GISS now takes both v2.mean and hcn_doe_mean_data and lops off any records before 1880. GISS also looks only at the FILNET data from hcn_doe_mean_data. Temperatures in degrees F are converted and scaled to tenths of a degree C.

This is where things get bizarre.

For each of the twelve months in a calendar year, GISS looks at the ten most recent years in common between the two data sets. For each month in those ten years it takes the difference between the FILNET temperature and the v2.mean temperature and averages those differences. Then GISS goes through the entire FILNET record and subtracts the corresponding monthly offset from each monthly temperature.
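To make the procedure concrete, here is a rough Python sketch of what I believe GISTEMP STEP0 is doing with the two series. This is my reconstruction of the steps described above, not GISS’s actual code, and the data layout and names are mine.

```python
import numpy as np

def remove_noaa_offset(filnet, v2_mean):
    """filnet and v2_mean map (year, month) -> temperature in 0.1 C.
    For each calendar month, average (FILNET - v2.mean) over the ten most
    recent years present in both series, then subtract that offset from
    every FILNET value for that month."""
    adjusted = {}
    for month in range(1, 13):
        common = sorted(y for (y, m) in filnet if m == month and (y, m) in v2_mean)
        recent = common[-10:]  # ten most recent years in common
        offset = (np.mean([filnet[(y, month)] - v2_mean[(y, month)] for y in recent])
                  if recent else 0.0)
        for (y, m), temp in filnet.items():
            if m == month:
                adjusted[(y, m)] = temp - offset  # back out the estimated NOAA adjustment
    return adjusted
```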

It appears to me that what GISS is attempting to do is remove the corrections done by NOAA from the USHCN data. Standing back to see the forest for the trees, GISS appears to be trying to recreate the Areal data, failing to recognize that v2.mean is the Areal data, and that hcn_doe_mean_data also contains the Areal data.

Here is a plot of the difference between the monthly raw data from Mohonk Lake and the data GISS creates in GISTEMP STEP0 (yes, I am well aware that in this case it appears the GISS process slightly cools the record). Units on the left are 0.1C.

Even supposedly pristine data cannot escape the adjustment process.


104 Comments
Admin
September 23, 2008 7:54 pm

Mr. Goetz has exceeded himself, let’s give him a round of applause! – Anthony

September 23, 2008 7:55 pm

Lord Kelvin weeps.

evanjones
Editor
September 23, 2008 7:59 pm

Yes, very good.
(And why on earth doesn’t GISS simply adjust NOAA raw data?)

George M
September 23, 2008 8:11 pm

That is a wonderful piece of detective work, showing that these climatic bit-heads can’t leave even the purest data alone, but it leaves several loose ends. I believe that in the original post, a copy of a form was shown which indicates the presence of an MMTS. Which data is so thoroughly massaged, MMTS or the Min/Max? If, indeed, Mr. Huth’s data is still used, what happens to the MMTS data? I would not expect it to be there if it is not used for something.
Also, I hope Mr. Goetz has sent a copy of this by registered mail to Anthony DePalma of the New York Times, pointing out how futile Mr. Huth and his predecessors’ work has been.
[Reply by John Goetz: There are several interesting mismatches between the B91 and NOAA MMS data, in addition to the MMTS issue. For one thing, Mr. Huth told the NYT that he recorded the temperatures at around 4:00 PM every day. However, every B91 I looked at signed by Mr. Huth indicated the time of observation was 5:00 PM. Maybe a nit, but hardly pristine.]

Dave Dodd
September 23, 2008 8:21 pm

Will someone explain to me (and perhaps other lurking newbies) whether “a pair of official Weather Service thermometers” can be read to a granularity of 0.1F? My experience with mercury thermometers leads me to believe that reading “accurately” to 0.1 degree involves a high degree of subjectivity. My old science teacher in HS would have required us to read an order of magnitude greater than that, or 0.01 degree, and then round back. Will someone please enlighten me?
[Reply by John Goetz: They are read to an accuracy of 1 degree F, but the conversion process to C is where 0.1 C “accuracy” comes into play. I have yet to see a B91 with a temperature recorded in anything other than full degrees. Not proof, of course, that they don’t exist.]
REPLY by Anthony: John, you are close but not quite correct. The thermometer reading is read in 0.1F resolution, but then rounded to the nearest degree F at the time the observer makes the reading and writes it down on the B91 form.

Jeff B.
September 23, 2008 8:24 pm

(And why on earth doesn’t GISS simply adjust NOAA raw data?)
Well that would give Hansen less opportunity to hide, and then find warming.

dearieme
September 23, 2008 8:46 pm

They were nincompoops before they were crooks.

Jeff Alberts
September 23, 2008 8:57 pm

REPLY by Anthony: John, you are close but not quite correct. The thermometer reading is read in 0.1F resolution, but then rounded to the nearest degree F at the time the observer makes the reading and writes it down on the B91 form.

Which means the margin of error is at least as much as the purported warming of the late 20th century. Yup, high quality.
Pardon me while I go puke.

Dave Dodd
September 23, 2008 9:07 pm

Read to 0.1F and record as an integer — GREAT! Simple math rules! My old science teacher will rest peacefully! However, if your B91 data are integers and you average 10,000 integers you still get an integer as the end result, even if you change to a different metric system! The oft-cited AGW temperature rise of 0.7C/century is mathematically incorrect, is it not? One often sees temps cited to 0.01 degree for data sets presumably from the nineteenth century. Somebody’s finagling!

Mike C
September 23, 2008 9:53 pm

John,
I’ve seen this before. The V2 file you are using is mislabeled. It is actually late V1 stuff… you get TOB, Homogeneity and Filnet adjustments, except the Homogeneity adjustment is actually the version 1 Homogeneity adjustment (SHAP, for documented discontinuities and Karl et al 1988 for urbanization effects). Since this station had no documented discontinuities, there should be no changes for SHAP. Then they employ Karl et al 1988 to adjust for urbanization, which is basically averaging the local USHCN stations.
Since Hansen uses his own urbanization scheme (night lights), he subtracts the Karl et al urbanization adjustment then applies his nightlight adjustment, which for this station will be no adjustment because this is a lights = zero station.
So let’s try to untangle here a little:
Raw Data
plus
TOB
plus
SHAP aka high frequency variation (actually a value of zero)
Plus
Karl et al 1988 (urbanization) aka low frequency variation
Plus
Filnet
Then Hansen takes over;
Minus
Karl et al
Plus
Hansen Night Lights adjustment (actually a value of zero)
Hansen can only use the V 1 homogeneity adjustment scheme because it separates the SHAP and Urbanization adjustments.
You see, the USHCN V2.0 is Hansen’s little problem because the V2.0 Homogeneity adjustment is sold by the NCDC as adjusting for both high and low frequency variation (adjustments for station discontinuities AND urbanization). Hansen cannot subtract any urbanization adjustment in V2.0 then add his night-lights scheme.
Now, here is where it gets really, really funky: Go to the KBSF home page here:
http://home.earthlink.net/~ponderthemaunder/index.html
flip down to the story about how NCDC wants Urban Heat Islands in the USHCN record. Then click on the link to Claud Williams’ powerpoint presentation at the AMS, Jan 2006 and tell me if you figured out why neither NCDC nor Hansen have properly adjusted schemes.

evanjones
Editor
September 23, 2008 9:53 pm

Well, with oversampling I guess you can fine it down, but I do think the 0.1 claim is a bit tight.

Bobby Lane
September 23, 2008 9:59 pm

Are you kidding me? Back and forth and back and forth conversions? Estimates? Algorithms? Averages? I’m terrible at math myself, but this just screams for the existence of room for small inaccuracies that will add up over time to big changes. I mean, if each day was adjusted up one one-thousandth of a degree over the period of a year you get nearly 4 tenths of a degree rise (.001 x 365 = .365) in that time span. No doubt real warming is taking place at times because of the LIA recovery, but such an addition continuing over time becomes rather extreme. This process reminds me of a game of Telephone. What was said in the beginning and what comes out in the end is bound, by accident or purpose (or both), to be different. Talk about parallel dimensions!

Editor
September 23, 2008 10:35 pm

So, from the news before the “Hansen Returns” visit to Congress where “James Hansen, one of the world’s leading climate scientists, will today call for the chief executives of large fossil fuel companies to be put on trial for high crimes against humanity and nature”
It seems to me this poor, defenseless Mohonk data that never harmed no one has had barely speakable but high crimes committed against algorithms and nature. It may not be a martyr, but perhaps it can be a poster child.
So, do we have the raw Mohonk data? How about a graph?

JFA in Montreal
September 23, 2008 10:37 pm

And kudos for your choice of picture on that post.
Very à propos ! 🙂

crosspatch
September 23, 2008 10:39 pm

So if I take 100 readings with recorded temperatures in whole degrees, add them together and divide by 100, that gives me a number to a precision of two decimal places! Every year the data gets more “accurate”!
/sarcasm

Eric Anderson
September 23, 2008 10:47 pm

What Jeff Alberts said.

September 23, 2008 11:16 pm

As I recall from my 1960’s non-digital-era geodesy course, you can get accuracy greater than your precision with a sufficient number of measurements, but this is just the opposite, rounding to 0.5 degrees when your precision is 0.05 degree. Not quite ‘measure with a micrometer, cut with a chainsaw,’ but I suppose it met the requirements of the time.
Obviously back in 1896 they didn’t know that mere tenths of a degree could spell life or death for a penguin, polar bear, or buckeye tree.

Demesure
September 23, 2008 11:34 pm

Very nice presentation John Goetz, thank you.
“The quality of their observations is second to none on a number of counts,”
The observers should look at what the GISS has done to their “second to none” Mohonk lake data: no monthly data in the online database since 2007 (filled with 9999) !
2007 999.9 -6.8 1.4 6.8 16.0 999.9 999.9 999.9 999.9 999.9 999.9 999.9

Bobby Lane
September 24, 2008 12:01 am

I need a little help with something. I am reading elsewhere (i.e., not on this site) but I cannot make sense of a certain statement. Regarding the Eocene period, it is stated that:
“At the same time, however, equatorial temperatures were found to be about 4K colder than at present.”
I thought that might be four thousand at first, but then I remembered I was dealing with temperature. So I assumed the K is degrees Kelvin. Well, I googled a converter so I could find out what that meant in terms I could understand (Fahrenheit). It converted 4K to -452.5. But somehow reading the statement that “equatorial temperatures were found to be about 453F cooler than at present” is a bit difficult to stomach. Am I making a methodological error?
Here is the paragraph in that paper from which it comes, which is interesting reading in itself.
“In the first example, the original data analysis for the Eocene (Shackleton and Boersma, 1981) showed the polar regions to have been so much warmer than the present that a type of alligator existed on Spitzbergen as did florae and fauna in Minnesota that could not have survived frosts. At the same time, however, equatorial temperatures were found to be about 4K colder than at present. The first attempts to simulate the Eocene (Barron, 1987) assumed that the warming would be due to high levels of CO2, and using a climate GCM (General Circulation Model), he obtained relatively uniform warming at all latitudes, with the meridional gradients remaining much as they are today. This behavior continues to be the case with current GCMs (Huber, 2008). As a result, paleoclimatologists have devoted much effort to ‘correcting’ their data, but, until very recently, they were unable to bring temperatures at the equator higher than today’s (Schrag, 1999, Pearson et al, 2000). However, the latest paper (Huber, 2008) suggests that the equatorial data no longer constrains equatorial temperatures at all, and any values may have existed. All of this is quite remarkable since there is now evidence that current meridional distributions of temperature depend critically on the presence of ice, and that the model behavior results from improper tuning wherein present distributions remain even when ice is absent.” (from page 10)

Jon
September 24, 2008 12:07 am

A question came to mind: how does TOBS handle DST?

Bobby Lane
September 24, 2008 12:08 am

Nevermind, I got it. I put in, say 98 degrees F, converted it to K, subtracted 4 degrees from the K result, and converted it back to F. That makes more sense. The point was just that the equatorial regions were cooler than at present (which was stated) but not necessarily cold (which is what my first, and incorrect, calculation made me think).

Jan RH
September 24, 2008 1:40 am

John G. states: “yes, I am well aware that in this case it appears the GISS process slightly cools the record.”
But, correct me if I’m wrong, doesn’t the fact that Mohonk-GISS temp’s are going from positive to negative just mean that GISS temp’s are getting comparatively higher as time goes by?
Which means that the GISS process produces warming in the record.

Leon Brozyna
September 24, 2008 3:03 am

The words that come to mind — Rube Goldberg.

MattN
September 24, 2008 3:09 am

“This article claimed, in part, that the average annual temperature has risen 2.7 degrees in 112 years at this station.”
And 2.7 degrees appears to be exactly the amount of the overall adjustment since 1896….
Reply – However, surrounding stations don’t show that amount of increase and some even show a decrease. See Calling All Climate Sleuths for an example. – Dee Norris

September 24, 2008 3:31 am

It all started with the 2.7 Fahrenheit (?) per hundred years increase in the Mohonk Lakes data. Are the pristine data also giving that increase? How did NOAA change the increase, and, finally, what did GISS procedures do to it?
