Adjusting Pristine Data

by John Goetz

On September 15, 2008, Anthony DePalma of the New York Times wrote an article about the Mohonk Lakes USHCN weather station titled "Weather History Offers Insight Into Global Warming". This article claimed, in part, that the average annual temperature has risen 2.7 degrees in 112 years at this station. What struck me about the article was the rather quaint description of the manner in which temperatures are recorded, which I have excerpted here (emphasis mine):

Mr. Huth opened the weather station, a louvered box about the size of a suitcase, and leaned in. He checked the high and low temperatures of the day on a pair of official Weather Service thermometers and then manually reset them…

If the procedure seems old-fashioned, that is just as it is intended. The temperatures that Mr. Huth recorded that day were the 41,152nd daily readings at this station, each taken exactly the same way. “Sometimes it feels like I’ve done most of them myself,” said Mr. Huth, who is one of only five people to have served as official weather observer at this station since the first reading was taken on Jan. 1, 1896.

That extremely limited number of observers greatly enhances the reliability, and therefore the value, of the data. Other weather stations have operated longer, but few match Mohonk’s consistency and reliability. “The quality of their observations is second to none on a number of counts,” said Raymond G. O’Keefe, a meteorologist at the National Weather Service office in Albany. “They’re very precise, they keep great records and they’ve done it for a very long time.”

Mohonk’s data stands apart from that of most other cooperative weather observers in other respects as well. The station has never been moved, and the resort, along with the area immediately surrounding the box, has hardly changed over time.

Clearly the data collected at this site is of the highest quality. Five observers committed to their work. No station moves. No equipment changes according to Mr. Huth (in contrast to the NOAA MMS records). Attention to detail unparalleled elsewhere. A truly Norman Rockwell image of dedication.

After reading the article, I wondered what happened to Mr. Huth’s data, and the data collected by the four observers who preceded him. What I learned is that NOAA doesn’t quite trust the data meticulously collected by Mr. Huth and his predecessors. Neither does GISS trust the data NOAA hands it. Following is a description of what is done with the data.

Let’s begin with the process of getting the data to NOAA:

From Co-op to NOAA

Mr. Huth and other observers like him record their data in a “B91 Form”, which is submitted to NOAA every month. These forms can be downloaded for free from the NOAA website. Current B91 forms show the day’s minimum and maximum temperature as well as the time of observation. Older records often include multiple readings of temperature throughout the day. The month’s record of daily temperatures is added to each station’s historical record of daily temperatures, which can be downloaded from NOAA’s FTP site here.

The B91 form for Mohonk Lake is hand-written, and temperatures are recorded in Fahrenheit. Transcribing the data to the electronic daily record introduces an opportunity for error, but I spot-checked a number of B91 forms – converting degrees F to tenths of a degree C – and found no errors. Kudos to the NOAA transcriptionists.
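The spot-check arithmetic can be sketched in a few lines of Python. The round-to-nearest-tenth convention is my assumption; the post does not say exactly how NOAA rounds:

```python
def f_to_tenths_c(temp_f):
    """Convert a whole-degree Fahrenheit reading to tenths of a degree
    Celsius, as stored in the electronic daily record (rounding to the
    nearest tenth is an assumption here)."""
    return round((temp_f - 32.0) * 5.0 / 9.0 * 10.0)
```

For example, a recorded maximum of 72 F would appear as 222 (i.e., 22.2 C) in the daily file.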

Next comes the first phase of NOAA adjustments.

NOAA to USHCN (part I) and GHCN

The pristine data from Mohonk Lake are subject to several quality-control checks and homogeneity-testing and adjustment procedures. First, the data are run through quality-control tests designed primarily to catch gross transcription errors. Next, monthly averages are calculated from the TMIN and TMAX values. This is straightforward when both values exist for every day of a month, but early in the Mohonk Lake record there are a number of months with several missing TMIN and/or TMAX values. Nevertheless, NOAA seems capable of creating an average temperature for many of those months. The result is referred to as the “Areal data”.
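A monthly averaging step that tolerates missing days might look like the following sketch; the completeness threshold (`max_missing`) is a placeholder, since the post does not state NOAA's actual rule:

```python
def monthly_mean(tmin, tmax, max_missing=9):
    """Average the daily midpoint (TMIN + TMAX) / 2 over a month,
    skipping days where either value is missing (None). Refuse to
    produce a monthly value if too many days are absent."""
    valid = [(lo + hi) / 2.0 for lo, hi in zip(tmin, tmax)
             if lo is not None and hi is not None]
    missing = len(tmin) - len(valid)
    if missing > max_missing:
        return None  # month too incomplete to average
    return sum(valid) / len(valid)
```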

The Areal data are stored in a file called hcn_doe_mean_data, which can be found here. Even though the daily data files are updated frequently, hcn_doe_mean_data has not been updated in nearly a year. The Areal data also seem to be stored in the GHCN v2.mean file, which can be found here on NOAA’s FTP site. This is the case for Mohonk Lake.

Of course, more NOAA adjustments are needed.

USHCN (parts II and III)

The Areal data are adjusted for time of observation and stored as a separate entry in hcn_doe_mean_data. The TOB adjustment is briefly described here. Following the TOB adjustment, the series is tested for homogeneity. This procedure looks for non-climatic discontinuities (artificial changepoints) in a station’s temperature record caused by changes such as station relocations and equipment replacements. The version 2 algorithm compares the series against up to 40 highly correlated series from nearby stations. The result of this homogenization is then passed on to FILNET, which creates estimates for missing data. The output of FILNET is stored as a separate entry in hcn_doe_mean_data.
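FILNET's actual estimation and weighting scheme is not described in the post; as a toy stand-in, filling a missing monthly value from the mean of correlated neighbor stations might look like:

```python
def fill_missing(value, neighbor_values):
    """Toy FILNET: if a station's monthly value is missing (None),
    estimate it as the unweighted mean of the same month's values at
    correlated neighbor stations; otherwise keep the observed value."""
    if value is not None:
        return value
    usable = [v for v in neighbor_values if v is not None]
    return sum(usable) / len(usable) if usable else None
```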

Now GISS wants to use the data, but the NOAA adjustments are not quite what they are looking for. So what do they do? They estimate the NOAA adjustments and back them out!

USHCN and GHCN to GISS

GISS now takes both v2.mean and hcn_doe_mean_data, and lops off any record before 1880. GISS also looks only at the FILNET data in hcn_doe_mean_data. Temperatures in degrees F are converted to degrees C and scaled to tenths of a degree.

This is where things get bizarre.

For each of the twelve months in a calendar year, GISS looks at the ten most recent years in common between the two data sets. For each such month it takes the difference between the FILNET temperature and the v2.mean temperature, then averages those differences into a monthly offset. GISS then goes through the entire FILNET record and subtracts the monthly offset from each monthly temperature.
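As a sketch (not GISS's actual source code), the offset-and-subtract step just described might be coded like this, with both series held as dicts keyed by (year, month) and values in 0.1 C:

```python
def giss_offsets(filnet, v2mean, n_years=10):
    """For each calendar month, average FILNET minus v2.mean over the
    ten most recent years common to both series, then subtract that
    offset from every FILNET value for that month."""
    adjusted = {}
    for month in range(1, 13):
        common = sorted(y for (y, m) in filnet
                        if m == month and (y, month) in v2mean)
        recent = common[-n_years:]
        if not recent:
            continue  # no overlap for this calendar month
        offset = sum(filnet[(y, month)] - v2mean[(y, month)]
                     for y in recent) / len(recent)
        for (y, m), temp in filnet.items():
            if m == month:
                adjusted[(y, m)] = temp - offset
    return adjusted
```

Note that if FILNET differs from v2.mean by a constant per calendar month, the output reproduces v2.mean exactly.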

It appears to me that what GISS is attempting to do is remove the corrections NOAA applied to the USHCN data. Stepping back to look at the forest rather than the trees, GISS appears to be trying to recreate the Areal data, failing to recognize that v2.mean is the Areal data, and that hcn_doe_mean_data also contains the Areal data.

Here is a plot of the difference between the monthly raw data from Mohonk Lake and the data GISS creates in GISTEMP STEP0 (yes, I am well aware that in this case it appears the GISS process slightly cools the record). Units on the left are 0.1C.

Even supposedly pristine data cannot escape the adjustment process.

104 Comments
Chris H
September 24, 2008 3:43 am

“For one thing, Mr. Huth told the NYT that he recorded the temperatures at around 4:00 PM every day. However, every B91 I looked at signed by Mr. Huth indicated the time of observation was 5:00 PM. Maybe a nit, but hardly pristine.”
Maybe he really *does* take temp measurements at 4pm, but the time recorded in the database is adjusted for Summer Time? (Would make sense since Summer Time is a purely human convention, which might otherwise complicate analysis?)

Phil M
September 24, 2008 4:14 am

John RH
“John G. states: “yes, I am well aware that in this case it appears the GISS process slightly cools the record.”
But, correct me if I’m wrong, doesn’t the fact that Mohonk-GISS temp’s are going from positive to negative just mean that GISS temp’s are getting comparatively higher as time goes by?
Which means that the GISS process produces warming in the record.”
Yes – I’d agree with your reading of that
– as shown (Mohonk – GISStemp) shows that the GISS process produces warming….surprise!

September 24, 2008 4:17 am

A very good and entertaining post; I knew about the adjustments previously, but hadn’t realised how much this resembles a kind of sausage factory for numbers. The long-winded process reminds me of the “think of a number” trick that we used to astound our friends with at school (at age 6 or thereabouts): “Add 100, then take away 5, then take away the number you first thought of…” Magic! No wonder it all somehow “adds up” to Global Warming. :o)

September 24, 2008 4:19 am

Bobby Lane
A degree Kelvin is the same size as a degree Centigrade. The C scale has zero at the freezing point of water, the K scale has zero at absolute zero. Your conversion method was correct.

Phil M
September 24, 2008 4:21 am

Rounding to integers (F)
– yes, it’s a pity that they do this
– which just goes to show that the temperature monitoring was never intended for the purpose for which it is now being used….
– but, by using large enough samples it *is* possible to recover the information that was lost in the rounding process.
– that’s the benefit of using a large sample set – the error introduced by the rounding can be effectively eliminated by taking readings from many places
– although I do agree that there is an overall problem with the accuracy of the whole system, which Anthony has pointed out many times
– hence all the ‘correction’ factors that get applied….
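Phil M's point about recovering sub-degree information from whole-degree readings can be demonstrated with a small simulation (illustrative numbers, not station data):

```python
import random

def recover_mean(true_mean=15.3, n=100_000, seed=0):
    """Draw many readings around true_mean, round each to a whole
    degree as an observer would, and average the rounded values: with
    enough samples the mean survives the rounding almost unchanged."""
    rng = random.Random(seed)
    rounded = [round(rng.gauss(true_mean, 3.0)) for _ in range(n)]
    return sum(rounded) / len(rounded)
```

Averaging 100,000 rounded readings typically lands within a few hundredths of the true 15.3, even though no single reading carries a fractional degree.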

Peanut Gallery (formerly known as the artist Tom in Florida)
September 24, 2008 4:38 am

This is a classic example of taking a simple thing, adding a large dose of government and presto:
awholebunchofstuffthatisallmixedupanddoesn’tdowhatwasoriginallyintended

Mike Bryant
September 24, 2008 4:47 am

Thanks John Goetz,
This is just mind-boggling. Of course, I’ve read of these adjustments, but to see them laid out like this…
I guess putting this data through all these acrobatics makes them oh so perfect.

September 24, 2008 4:55 am

Two words come to mind….
“Paralysis By Analysis”
Why in the world the information gets manipulated is beyond me…
http://www.cookevilleweatherguy.com

September 24, 2008 4:55 am

*LOL*…or was that 3 words?? Haven’t had enough coffee yet! 🙂

Mike Bryant
September 24, 2008 5:00 am

Like I said Phil, oh so perfect. Shame on you. I thought you were a scientist. Or was it sarcasm?

Phil M
September 24, 2008 5:25 am

Slightly off topic
– In September Satellite temps (lower troposphere, AMSU)
– it looks like this month is going to come out with an anomaly of around +0.2C
– the highest for this year…
It will be interesting to see if this coming winter is as cold as the last one
– or if we return to regular anomalies of around +0.2C….

MarkW
September 24, 2008 5:37 am

Is Hansen starting to lose it?
http://www2.ljworld.com/news/2008/sep/23/nasa_climate_expert_warns_dire_consequences_global/
“If we don’t get this thing under control we are going to destroy the creation,” said James Hansen,

Editor
September 24, 2008 5:49 am

Paul (04:19:39) :

Bobby Lane
A degree Kelvin is the same size as a degree Centigrade. The C scale has zero at the freezing point of water, the K scale has zero at absolute zero. Your conversion method was correct.

Yes, but your terminology is confused. Technically “degree Kelvin” shouldn’t be used. The usage may date from when Centigrade was renamed Celsius (and cycles/sec became Hertz, etc.), but kelvins were redefined to be used more like other units of measure.
The phrase “about 4K colder than at present” does hold one potential confusion: K is for Kelvin, and k is also a prefix for a 1000 multiplier. The K in 4K isn’t a prefix, so it must be kelvins. Consider “that pipe is 1m shorter than the old one.” If the old pipe was 1.618 meters long, you’d know the new pipe was 0.618 meters long. A melting ice cube’s temperature is 273 kelvins or 0 degrees Celsius, not 273 degrees Kelvin. “About 4K colder than at present” is the same as “4 degrees C colder.”
In dog nights, it’s about one dog, i.e. a three dog night would be a four dog night.

Bill Marsh
September 24, 2008 5:53 am

Mark W,
“Starting to lose it”?

Editor
September 24, 2008 6:03 am

MarkW (05:37:56) :

Is Hansen starting to lose it?
http://www2.ljworld.com/news/2008/sep/23/nasa_climate_expert_warns_dire_consequences_global/
“If we don’t get this thing under control we are going to destroy the creation,” said James Hansen,

I think so. He seems to be becoming more and more messianic in his speeches and seems to be reaching out to large forums. There was a rock & environmental festival in the spring (that got mostly rained out) that he spoke at. From the coverage it seemed he was trying to expand his flock. As he seems to be losing his support from science as reality, he seems to be drawing more and more on a faithful following. Evan suggested that Hansen not be forced from NASA lest he become a martyr; my sense is the sooner the better. I do think that any Hansen watcher should look beyond the science to try to figure out where he’s going.

Editor
September 24, 2008 6:15 am

“science as reality”? I meant “science and reality,” though it seems to work either way. 🙂

Harold Ambler
September 24, 2008 6:21 am

I sent an e-mail to Benjamin Cook, the NOAA meteorologist whose data was used by the New York Times for its Mohonk House article, asking him about a year at the turn of last century shown to have a sub-freezing average temperature. Having lived in the Northeast for most of my adult life, I knew this was pretty unlikely! This is what he said:
“It turns out that the graphics people at the times converted between Celsius and Fahrenheit incorrectly, so the temperatures in the graph were way too cold (although the shape of the curve and the trends were the same). I’ve already notified them, and they said they would be fixing the online graphic.”
It was good of Benjamin to get back to me. Hopefully, the Times folks will do what they have promised.

Harold Ambler
September 24, 2008 6:29 am

P.S. The graph in the Times article, using the incorrectly converted figures, shows a 20-degree swing from the coldest annual temperature to the warmest. This also seems pretty surprising, and I have sent an e-mail to Benjamin asking him about it.

September 24, 2008 6:54 am

Successively rendering significant figures insignificant, then “creating” significant figures and adjusting them back into insignificance is mind boggling. At some point in this process, “data” disappears and is replaced by “number sets” which purportedly represent what the data sets shoulda/coulda/woulda looked like had they been collected timely from properly installed and calibrated instruments in the first place.
The suggestion that the denizens of this globe should invest more than $100 trillion to correct a “problem” projected to occur based on these “supple” number sets is laughable, at best.

Dan McCune
September 24, 2008 6:56 am

I just checked with http://www.surfacestations.org/ and this site has not been surveyed. Is there anyone nearby who could verify the equipment is as reliable as the measurements?
USHCN ID: 305426
Station name: MOHONK LAKE
GHCN ID: 72504006
MMS ID: 20026
State: NY
Lat: 41.77
Long: -74.15
Elev (ft): 1245
Distance from post office: 0.1
Direction from post office: SE
Location: MOUNTAIN HOTEL ON LAKE 4 MILES WNW OF PO AT NEW PALTZ, NY
Surveyed?: (blank)
Active?: (blank)
CRN Rating*: (blank)
Reply – Ummm, Yes. See Calling All Climate Sleuths – Dee Norris

Bill Illis
September 24, 2008 7:06 am

Anthony’s trip to the NCDC this spring allowed us to see how much they are adjusting the raw temperature records with these adjustments.
USHCN V2 has two adjustments: TOBS (increases the average trend by 0.2C) and the Homogeneity Adjustment (also increases the trend by 0.2C). So the adjustments increase the overall temperature trend (in the US) by 0.4C compared to the raw data.
These adjustments are shown in Slide 7 and Slide 16 of the powerpoint given to Anthony by the NCDC (not shown anywhere else on the net that I have seen).
(this link locks up sometimes)
http://wattsupwiththat.files.wordpress.com/2008/05/watts-visit.ppt#256,1,U.S. HCN Temperature Trends: A brief overview
Original can be found in this post by Anthony.
http://wattsupwiththat.com/2008/05/13/ushcn-version-2-prelims-expectations-and-tests/
Of course, all the adjustments in USHCN V1 are shown in this chart (0.55F or 0.3C). So Version 2 increases the trend by a further 0.1C compared to Version 1.
http://www.ncdc.noaa.gov/img/climate/research/ushcn/ts.ushcn_anom25_diffs_urb-raw_pg.gif

Phil M
September 24, 2008 7:37 am

John Goetz
– sorry, I mis-read the graph title (several times!)
Mike Bryant
– me? I wasn’t being sarcastic
– you can get data back (statistically) that’s been ‘lost’ in rounding…
– you just need enough spatial or temporal samples…
– which is what the GISS software is trying to do by using thousands of sites…
– but it is interesting that the trend we’re looking for is of the same magnitude (per century) as the rounding error.
– presumably the results were rounded to 1F because it was felt that this was the useful limit of accuracy of the data, which is also interesting (i.e. the raw data isn’t very accurate)

September 24, 2008 7:42 am

This is why I don’t trust the NOAA and GISTemp datasets. If they do this to temperature readings within the USofA, Lord knows what they do to temperature readings from the ROW. I still haven’t figured out what happens to Canadian data.