Contribution of USHCN and GISS bias in long-term temperature records for a well-sited rural weather station

Guest post by David W. Schnare, Esq. Ph.D.

When Phil Jones suggested that if folks didn’t like his surface temperature reconstructions, then perhaps they should do their own, he was right. The SPPI analysis of rural versus urban trends demonstrates the nature of the overall problem. It does not, however, go into sufficient detail. A close examination of the data suggests three areas that need to be addressed. Two involve the adjustments made by NCDC (NOAA) and by GISS (NASA). Each agency makes its own adjustments, and these are typically serial, with the GISS adjustments applied on top of the NCDC ones. The third problem is organic to the raw data and has been highlighted by Anthony Watts in his Surface Stations project: the “micro-climate” biases in the raw data.

As Watts points out, while there are far too many biased weather station locations, there remain some properly sited ones. Examination of the data from those stations provides a clean basis on which to demonstrate the peculiarities in the adjustments made by NCDC and GISS.

One such station is Dale Enterprise, Virginia. The Weather Bureau has reported raw observations and summary monthly and annual data from this station from 1891 through the present, a 119-year record. From 1892 to 2008, there are only 9 months of missing data in this 1,404-month period, a missing data rate of about 0.64 percent. The analysis below interpolates for this missing data by using an average of the 10 years surrounding the missing value, rather than back-filling from other sites. This correction method minimizes the inherent uncertainties associated with other sites, for which there is no micro-climate guarantee of unbiased data.
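The gap-filling rule can be sketched in Python. This is a minimal illustration, not the author's actual procedure: it assumes that “the 10 years surrounding the missing value” means the same calendar month in the five years before and the five years after the gap, and that the record is held in a dictionary keyed by (year, month). The function name `fill_missing` is hypothetical.

```python
from statistics import mean


def fill_missing(monthly, window=5):
    """Fill gaps in a {(year, month): temperature} record.

    Assumption: each missing value is replaced by the mean of the same
    calendar month in up to `window` years before and `window` years after
    the gap, skipping neighbouring years that are themselves missing.
    Missing values may be absent keys or None.
    """
    if not monthly:
        return {}
    years = sorted({year for year, _ in monthly})
    filled = dict(monthly)
    for year in range(years[0], years[-1] + 1):
        for month in range(1, 13):
            if monthly.get((year, month)) is not None:
                continue  # observed value, keep as reported
            neighbours = [
                monthly[(y, month)]
                for y in range(year - window, year + window + 1)
                if y != year and monthly.get((y, month)) is not None
            ]
            if neighbours:
                filled[(year, month)] = mean(neighbours)
    return filled
```

Using only the station's own neighbouring years keeps the fill free of any other site's micro-climate bias, which is the point the paragraph above makes.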

The site itself is in a field on a farm, well away from buildings or hard surfaces. The original thermometer remains at the site as a back-up to the electronic temperature sensor that was installed in 1994.

The Dale Enterprise station site is situated in the rolling hills east of the Shenandoah Valley, more than a mile from the nearest suburban-style subdivision and over three miles from the center of the nearest “urban” development, Harrisonburg, Virginia, a town with a population of 44,000.

Other than the shift to an electronic sensor in 1994, and the need to fill in the 9 months of missing reports, there is no reason to adjust the raw temperature data as reported by the Weather Bureau.

Here is a plot of the raw data from the Dale Enterprise station.

There may be a step-wise drop in reported temperature in the post-1994 period. Virginia does not provide other rural stations that operated electronic sensors over a meaningful period before and after the equipment change at Dale Enterprise, nor is there publicly available data comparing the thermometer and electronic sensor readings for this station. Comparison with urban stations introduces a potentially large warm bias over the 20-year period from 1984 to 2004. This is especially true in Virginia, because most such urban sites are at airports, where the aircraft equipment in use and the pace of operations changed dramatically over this period.
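One crude way to put a number on the possible post-1994 drop, using only the station's own record, is to compare mean annual temperatures just before and just after the sensor change. The sketch below assumes annual means in a dictionary keyed by year and a ten-year window on each side; both choices, and the name `step_change`, are illustrative assumptions. Because it ignores any underlying trend, it cannot separate an equipment effect from a real climate shift; it is a first look, not a homogenization method.

```python
from statistics import mean


def step_change(annual, break_year, span=10):
    """Crude estimate of a step change at `break_year` in {year: temperature}.

    Compares the mean of the `span` years ending the year before the break
    with the mean of the `span` years starting at the break year. Any
    underlying trend is ignored, so the result mixes trend and step.
    """
    before = [annual[y] for y in range(break_year - span, break_year) if y in annual]
    after = [annual[y] for y in range(break_year, break_year + span) if y in annual]
    if not before or not after:
        raise ValueError("not enough years on one side of the break")
    return mean(after) - mean(before)
```

A negative result indicates the later period reads cooler than the earlier one.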

Notably, neither NCDC nor GISS adjusts for this equipment change. Thus, any bias due to the 1994 equipment change remains in the record for the original data as well as the NCDC and GISS adjusted data.

The NCDC adjustment

Although many have focused on the changes GISS made from the NCDC data, the NCDC “homogenization” is equally interesting, and as shown in this example, far more difficult to understand.

NCDC takes the originally reported data and adjusts it into a data set that becomes part of the United States Historical Climatology Network (USHCN). Most researchers, including GISS and the University of East Anglia Climatic Research Unit (CRU), begin with the USHCN data set. Figure 2 documents the changes NCDC made to the original observations and suggests why, perhaps, one ought to begin with the original data.

The red line in the graph shows the changes made to the original data. Considering the location of the Dale Enterprise station and the lack of micro-climate bias, one has to wonder why NCDC would make any adjustment whatsoever. The shape of the red delta line indicates these are not adjustments made for the purpose of correcting missing data, or for any other obvious bias. Indeed, with the exception of 1998 and 1999, NCDC adjusts the original data in every year! [Note: when a 62-year-old Ph.D. scientist uses an exclamation point, the statement deserves some extraordinary attention.]
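The red delta line is simply the adjusted value minus the original value for each year. A minimal sketch, assuming both series are annual means in dictionaries keyed by year; the 0.005 °C tolerance used to ignore rounding noise and the name `adjustment_delta` are illustrative choices, not NCDC's.

```python
def adjustment_delta(original, adjusted, tol=0.005):
    """Return per-year (adjusted - original) and the list of years changed.

    Only years present in both series are compared. A year counts as
    "adjusted" when the difference exceeds `tol` in absolute value
    (assumed tolerance, to ignore rounding in published tables).
    """
    common = sorted(set(original) & set(adjusted))
    delta = {year: adjusted[year] - original[year] for year in common}
    changed = [year for year in common if abs(delta[year]) > tol]
    return delta, changed
```

Counting the years in `changed` against the length of the record gives the “adjusted in every year but two” style of tally described above.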

This graphic makes clear the need to “push the reset button” on the USHCN. Based on this station alone, one can argue the USHCN data set is inappropriate for use as a starting point for other investigators, and fails to earn its self-applied moniker of a “high quality data set.”

The GISS Adjustment

GISS states that its adjustments reflect corrections for the urban heat island bias in station records. In theory, they adjust stations based on the nighttime luminosity of the area within which the station is located. This broad-brush approach appears to have failed with regard to the Dale Enterprise station. There is no credible basis for adjusting data from a station with no micro-climate bias conditions, located on a farm more than a mile from the nearest suburban community, more than three miles from a town, and more than 80 miles from a population center of greater than 50,000, the standard definition of a city. Harrisonburg, the nearest town, has a single large industrial operation, a quarry, and is home to a medium-sized (but hard-drinking) university (James Madison University). Without question, the students at JMU have never learned to turn the lights out at night. Based on personal experience, I’m not sure most of them even go to bed at night. This raises the potential for a luminosity error we might call the “hard drinking, hard partying, college kids” bias. Whether it is possible to correct for that in the luminosity calculations I leave to others. In any case, the layout of the town is traditional small-town America, dominated by single-family homes and two- and three-story buildings. The true urban core of the town is approximately six square blocks, and other than the grain tower, there are fewer than ten buildings taller than five stories. Even within this “urban core” there are numerous parks. The rest of the town is quarter-acre and half-acre residential, except for the University, which has copious pervious open ground (for when the student union and the bars are closed).

Despite the lack of a basis for suggesting the Dale Enterprise weather station is biased by urban heat island conditions, GISS has adjusted the station data as shown below. Note that this is an adjustment to the USHCN data set. I show this adjustment because it discloses the basic nature of the adjustments, rather than their effect on the actual temperature data.

While only the USHCN and GISS data are plotted, the graph includes the (blue) trend line of the unadjusted actual temperatures.

The GISS adjustments to the USHCN data at Dale Enterprise follow a well-recognized pattern. GISS pulls the early part of the record down and mimics the most recent USHCN records, thus imposing an artificial warming bias. The trend lines are somewhat difficult to compare in the graphic. The trends for the original data, the USHCN data and the GISS data are 0.24, -0.32, and 0.43 degrees C per century, respectively.
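The post does not say how the trend lines were fitted. Assuming an ordinary least-squares fit to annual means, the slope in degrees per year multiplied by 100 gives degrees C per century, as in this sketch (the function name `trend_per_century` is hypothetical).

```python
def trend_per_century(annual):
    """Ordinary least-squares slope of {year: temperature}, in degrees per century.

    Assumes annual mean temperatures keyed by calendar year, with at least
    two distinct years.
    """
    years = sorted(annual)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(annual[y] for y in years) / n
    sxy = sum((x - mean_x) * (annual[x] - mean_y) for x in years)
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 100.0 * sxy / sxx  # degrees per year -> degrees per century
```

For scale, with the reported figures 0.43 / 0.24 is roughly 1.8, and the USHCN value of -0.32 has the opposite sign from both of the others.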

If one presumes the USHCN data reflect a “high quality data set”, then the GISS adjustment does more than produce a faster rate of warming; it actually reverses the sign of the trend of this “high quality” data. Notably, compared to the true temperature record, the GISS trend nearly doubles the actual observed warming.

This data presentation constitutes only the beginning analysis of Virginia temperature records. The Center for Environmental Stewardship of the Thomas Jefferson Institute for Public Policy plans to examine the entire data record for rural Virginia in order to identify which rural stations can serve as the basis for estimating long-term temperature trends, whether local or global. Only a similar effort nationwide can produce a true “high quality” data set upon which the scientific community can rely, whether for use in modeling or to assess the contribution of human activities to climate change.

David W. Schnare, Esq. Ph.D.

Director

Center for Environmental Stewardship

Thomas Jefferson Institute for Public Policy

Springfield Virginia

===================================

UPDATE: readers might be interested in the writeup NOAA did on this station back in 2002 here (PDF, second story). I point this out because initially NCDC tried to block the surfacestations project saying that I would compromise “observer privacy” by taking photos of the stations. Of course I took them to task on it when we found personally descriptive stories like the one referenced above and they relented. – Anthony

196 Comments
RockyRoad
February 26, 2010 10:26 pm

Although the following was published two months ago on Dec 17, 2009, it is even more applicable today:
http://www.climategate.com/u-s-lawyers-get-their-legal-briefs-in-order
Indeed, I’d say the lawyers on these cases are having trouble keeping up with all the juicy revelations coming from so many directions. RICO strikes again!
In the meantime, the Warmers will WISH it was only as bad as the Big Tobacco cases (and how ironic–the Warmers are the equivalent of Big Tobacco, not the realists/dissidents).

kadaka
February 26, 2010 10:53 pm

Eric Gamberg (19:47:25) :
Don’t forget that the surfacestations.org survey results are really only addressing the conditions at the site at the time of the survey. The history of the site and its locations and equipment are not rigorously examined.
REPLY: We may be able to do this if NCDC will give me access to B44 forms which are top view site sketches and description of surroundings, but so far they have not made them available. -A

Will this provide what you need?

You may obtain copies of station history documents submitted prior to January 2001, with the personally identifiable information obliterated, at the NCDC’s cost of reproduction. This is the same information accessible via MMS at no charge. For station history submitted after January 2001, MMS (https://mi3.ncdc.noaa.gov/mi3qry) is the only delivery option.

It says above that:

Individual NWS B44 forms are not publicly accessible online because they contain personally identifiable information such as observer name, address, phone number and gender. The cooperative observers are volunteers who donate their time in the interests of the public good with a reasonable expectation that their personal information will remain private.

So, does that mean one can or cannot get the personal-info-redacted B44 forms online, and would they have the sketches and descriptions you need?

John F. Hultquist
February 26, 2010 11:01 pm

c james (17:26:25) :
James Sexton (16:46:41) growth and decline of towns
There are quite a number of towns of historical interest, although not necessarily relevant to the current weather station kerfuffle. Try these,
Silver —- City, Nevada
http://en.wikipedia.org/wiki/Virginia_City,_Nevada
Oil —- Pithole, Pennsylvania
http://en.wikipedia.org/wiki/Pithole,_Pennsylvania

John F. Hultquist
February 26, 2010 11:02 pm

Virginia City, Nevada

D. Patterson
February 26, 2010 11:05 pm

David Schnare (20:29:56) :
Data sources for the analysis.
The “raw” data come through the NOAA Locate Station portal at:
http://www.ncdc.noaa.gov/oa/climate/stationlocator.html
I pulled from “Daily/Monthly/Annual Virginia Climatological Data” which are the original reports from the various weather stations, raw and unadjusted.

The climate science community (CRU, NOAA, GISS, et al) has a distressingly bad habit of using terminology to describe data as “robust”, “raw”, and “unadjusted” when it is, in fact, anything but robust, raw, or unadjusted. Consequently, every time I see such terminology being used, I have to sit up and take notice, because it usually simply isn’t so. The USHCN and GHCN are compiled from daily summaries that include extensive adjustments to the raw, meaning original, observations. In this example, I have to note that the daily summary is itself a summary of supposedly “raw and unadjusted” observations for a single daily period, but is actually a summary of previously adjusted observation records which are too often not actual raw observation values.
The number and type of adjustments used to compile the daily summary varies considerably according to the specific dataset, observations series, and stations being used. Which dataset and dataset number are you referring to in the above comment? Are they perhaps Dataset 3200, Dataset 3210, or another Dataset?
Readers need to understand and approach claims of “raw and unadjusted” observations of surface, marine, and upper air observations with extreme caution and skepticism. Actual “raw and unadjusted” observations used in the USHCN and GHCN datasets are extremely difficult to access, and in certain instances such “raw observations” may have been permanently destroyed and are now unrecoverable. Sources at the National Climatic Data Center (NCDC) have reported that certain unidentified original manuscript observation records have been destroyed by water damage and bookworm type insect infestations. In the event the destroyed documents may include observational records incorporated as adjusted values in the USHCN and GHCN datasets, the actual “raw” data may not ever be recoverable in those instances for the purpose of reconstructing those datasets with the actual raw observations. The extent of this problem is unknown.
What is needed is a well researched diagram detailing every step involved in capturing the temperature observation, recording the observation, performing QC (Quality Control) corrections and adjustments, and all subsequent adjustments and changes applied for summarizations, transcriptions, digitizations, and compilation of the datasets. Doing so will no doubt astound most unsuspecting people when they can see just how many alterations have been applied to a single true “raw” temperature observation taken in one minute, of one hour, of one day, in one year.

Pete H
February 26, 2010 11:10 pm

“based on the night time luminosity of the area within which the station is located.”
A couple of guys have picked up on the above statement.
I sat back for a while and tried to work out what the person who came up with this one was …struggling… to do! Was it just some AGW modeller trying to fit his/her data into the argument put forward by Anthony, or is it a true, peer-reviewed piece of research? After all, surely no real research could be done just by assuming lights on equals effect.
Come on RC people. Show us the research linking “luminosity of the area with relation to UHI effect”.
I am not being sarcastic, I really am interested, as I am sure are many here, in the research, if it is available.

D. Patterson
February 26, 2010 11:23 pm

Pete H (23:10:24)
Details of the night lights were discussed extensively at Climateaudit some time ago.

February 26, 2010 11:29 pm

Re: RedS10 (Feb 26 17:11),
I believe the post is wrong. The sources of data for GIStemp are:
1. Raw GHCN (v2.mean)
2. USHCN noFIL – ie without FILnet adjustment. It includes TOBS, maxmin and SHAP adjustments, according to the code. TOBS, time of observation, is directly based on the station records, as is SHAP. SHAP adjusts for known events in the station history, and would include an adjustment for MMTS.
DALE is in both sets, so I’m not sure which set of data was used. But the “NCDC adjustments”, which I presume means GHCN, would not have been used. In fact, I don’t know of any published index that uses the GHCN adjustments.

juan
February 26, 2010 11:32 pm

The NOAA /GISS data records are byzantine and I am struggling to understand them. Still, mere ignorance never yet kept an American quiet, and I am sure I will be promptly set straight by my fellow commenters.
MMS marks the beginning of COOP 44208 in 1948. During the following 62 years it shows 2 location changes, one of nearly half a mile.
boballab has written a clear exposition of how to look up station data on the USHCN site, with emphasis on flags that show data has been estimated rather than measured: http://boballab.wordpress.com/2010/02/19/before-using-temperature-data-read-the-fine-print/
The two flags showing estimation are E and X. The record for Dale Enterprise shows the following:
oct 1913 E
sep 1916 to nov 1917 X
mar 1932 to sep 1932 X
dec 1949 to sep 1951 X
sep 1959 X
mar 1971 E
sep 1972 to jan 1973 X
jan 1983 E
may 2007 X
jun 2007 X
jul 2007 to dec 2008 X
This adds up to 73 months of infilled data, not 9. Have I got this wrong? If not, how does it affect the analysis?
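Juan's tally can be checked by expanding each flagged span into a count of months. The spans below are transcribed from his list with the E/X flags dropped; the helper names are hypothetical. This only verifies the arithmetic; it does not say whether the flagged months overlap the 9 missing months counted in the post.

```python
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def month_index(text):
    """Convert 'sep 1916' to a running month number."""
    name, year = text.split()
    return int(year) * 12 + MONTHS[name.lower()] - 1


def months_in(span):
    """Count months in 'sep 1916 to nov 1917' or a single 'oct 1913'."""
    start, _, end = span.partition(" to ")
    return month_index(end or start) - month_index(start) + 1


flagged = [
    "oct 1913", "sep 1916 to nov 1917", "mar 1932 to sep 1932",
    "dec 1949 to sep 1951", "sep 1959", "mar 1971",
    "sep 1972 to jan 1973", "jan 1983", "may 2007", "jun 2007",
    "jul 2007 to dec 2008",
]
total = sum(months_in(span) for span in flagged)
print(total)  # 73
```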

Jeff B.
February 26, 2010 11:38 pm

Frantic attempts by the NOAA and NASA to make the data fit their hypothesis. Thankfully the truth from folks like Schnare and Watts, is coming out.

E.M.Smith
Editor
February 26, 2010 11:43 pm

Dan (17:24:15) : Discussion over surface station data quality should be placed in the context of the satellite data record,
Which is really kind of a dumb thing to do given that the recent past is when the USHCN and GHCN data are left alone and the time period from prior to the satellite period are where all the “cooking by cooling the far past” takes place.
So I’d MUCH rather see what the author has done, exposing the impact of the “rewrite the pre-satellite past” than admire the period where things are left more alone…
BTW, to understand why the “average temperature” now can be up a smidgeon but the world is cooling do this:
Take 10 coins. 2 nickels and 8 pennies. Now, put the nickels in your right pocket and the pennies in your left. Which pocket has the most COINS? That is the temperature measurement. Which pocket has the most VALUE? That is the HEAT measurement.
The “still warm from 20 years of hot sun and warming PDO” is putting a lot of warm air pennies into the air over the large ocean area. It then runs up to the poles and dumps a bucket load of heat out to space. This cold air then heads south over the land and freezes a boat load of water into snow. That “several feet of snow” is the nickels.
So your satellite is just counting the coins, but the FEET of snow and the snow extent all the way to Florida is looking at the denominations…
Or put less metaphorically: The heat content of the air over an ocean is not as much as the heat lost making tons of snow over the land, even if the average temperatures x area is greater over the ocean.

John F. Hultquist
February 26, 2010 11:53 pm
hotrod ( Larry L )
February 26, 2010 11:59 pm

There have been a number of historical mass migrations of populations here in the U.S. that would affect population density on a regional level over the last century.
Among these would be the Dust Bowl period, when large numbers of people moved out of rural mid-America in the Dust Bowl regions (Colorado, Kansas, Oklahoma, Texas, etc.).
In the late 1970’s (about the time the satellite records began) there was a substantial shift of people out of the north eastern states (rust belt) both due to economic issues from the high inflation and job losses in that period and the severe cold winters that area experienced in the 1970’s when the popular press was pitching global cooling and a coming ice age.
Census records give snapshots on a decadal scale, but a higher resolution proxy might be local housing starts, school enrollments, building permits, sewage flow, utility usage, retail grocery sales volumes, etc. to try to give a cross check on local population trends. The most likely time for sudden population movements would be the deep recession periods with high layoff rates (plant shutdowns).
In the Eastern plains of Colorado there are a lot of small towns that have essentially become ghost towns population wise, but they still have high lighting levels in some areas due to interstate businesses serving travelers. Also many farmers and ranchers put up mercury vapor yard lights.
In the mid 1970’s I had occasion to drive from Denver down to Rocky Ford, Colorado on a regular basis. I took I-70 to Highway 71 south out of Limon, Colorado. There is an area south of the intersection of Highway 71 and Highway 94 where you could stop the car, get out, turn in a full circle and not see a single artificial light late at night. But you could see the sky glow of both Colorado Springs and the Denver Metro area, which are 50-100 miles away over the horizon.
Last time I was down there in the 1990’s that was no longer the case, due to farm yard lights which come on at dusk and go off at dawn. The actual population has not changed a great deal, but the local light environment has.
Larry

pwl
February 27, 2010 12:07 am

Wow, barnacles layered upon barnacles! No wonder there is an alleged warming!
Is it possible they didn’t realize that they were layering warming distortion after warming distortion upon the data? How could they have missed this messing it up so much?
NCDC (NOAA) and GISS (NASA) had better hope that they didn’t leave an audit trail otherwise they’ve been caught with yet another fraudulent invention of data. I wonder what they say now about this?
pwl
http://PathsToKnowledge.NET

February 27, 2010 12:10 am

The simple spot check folks did of nightlights more than a year ago confirmed that it did not “pick out” stations that were rural and classified some rural sites as “small town” and some urban sites as rural. Nightlights is a PROXY for population density and population density is a proxy for UHI. The only method that has a chance at working with high quality is a PHYSICAL INSPECTION of the site.
It’s tedious work. It’s not as sexy as GCM work. It’s nice to try to do this work by looking at satellite photos in your cozy ivory tower. But nothing beats feet on the street.

E.M.Smith
Editor
February 27, 2010 12:14 am

David Schnare (20:29:56) : Data sources for the analysis.
The USHCN adjusted data are pulled from the GISS site. GISS states that it uses the newest available data from USHCN data set.

Be careful with that. It would be better to get the USHCN data directly from NCDC at their FTP site. The data available from that stage of GIStemp is the combination of GHCN and USHCN so for the (about 136) USA stations that are in both you will be getting an odd ‘sort of an average’ merger of the two (that have different modification histories). For the other stations, you ought to be OK (and since the GHCN stations are highly biased toward urban and airports, I doubt you would hit one in a rural pristine selection criteria).
More importantly, GISS swapped from the USHCN (that ends in 5/2007) to the USHCN Version Two (that has a different and ‘warmer’ adjustment history) on about 15 Nov 2009. So depending on when you grabbed your data, you got either USHCN or USHCN.v2 with no obvious notice… ( You had to read the updates log about software changes).

I used the “raw GHCN data + and USHCN corrections” as the USHCN data, and used “after homogeneity adjustment” as the GISS data.

This is pulling GISS data from two different points in the GIStemp processing. The first is ‘after STEP0’ that mostly just merges USHCN.v2 and GHCN and tosses away anything from prior to 1880. (It does a very odd merger. If it has only one, it passes that one through unchanged. If it has both, it “unadjusts” the USHCN and then semi-averages it smoothing any splices with the GHCN for that station). If your station is not in GHCN (and it ought not to be IMHO) then you ought to be getting the USHCN (if prior to Nov 2009 pull date) or USHCN.v2 (if done after Nov 2009) with only the truncate at 1880 change.
After homogenizing is pulling from after the STEP1 homogenizing and, I believe, after the STEP2 UHI as well and will include various infill and related impacts.
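The STEP0 merge rule described above can be illustrated with a short sketch. This is not GISS code: the `unadjust` callable is a hypothetical stand-in for removing the USHCN adjustments, and a plain month-by-month average stands in for the “semi-average” mentioned, which is not specified further. Series are assumed to be dictionaries keyed by (year, month), and the 1880 cutoff follows the description of STEP0 discarding earlier data.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, int]  # (year, month)
Series = Dict[Key, float]


def merge_step0(ushcn: Optional[Series], ghcn: Optional[Series],
                unadjust: Callable[[Series], Series],
                first_year: int = 1880) -> Series:
    """Illustrative merge of USHCN and GHCN monthly series for one station."""
    def truncate(series: Series) -> Series:
        # drop everything before the cutoff year
        return {k: v for k, v in series.items() if k[0] >= first_year}

    if not ushcn and not ghcn:
        return {}
    if not ghcn:          # only one source: pass it through unchanged
        return truncate(ushcn)
    if not ushcn:
        return truncate(ghcn)
    u = unadjust(ushcn)   # hypothetical "unadjust" step
    merged: Series = {}
    for key in set(u) | set(ghcn):
        values = [s[key] for s in (u, ghcn) if key in s]
        merged[key] = sum(values) / len(values)  # stand-in for "semi-average"
    return truncate(merged)
```

For a station found only in USHCN, as argued for a rural site like Dale Enterprise, the sketch reduces to the 1880 truncation alone.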

GISS offers a third option: “after combining sources at same location”. In the Dale Enterprise case study, this data was identical to the “raw GHCN data + and USHCN corrections”, which would be expected for a true rural data set, as there are no other sources at the same location in such a case.

The ‘as combined’ ought to give you after STEP1 and before STEP2 I think… it’s a bit of a guess since they don’t document it exactly on the web site.
But for your station the only “combining” would be if there are multiple “modification history flags” for that site. That is, if the 12th digit of the StationID comes in more varieties than “0”. That ought to have happened at the time the instrument was changed from LIG to electronic, but the GISS site shows only the one 12 digit StationID, so you are getting no ‘combining’.
GISS just picks up the USHCN.v2 data from NCDC and changes the StationID and format, so it’s basically the same data, but the provenance has another step in it… so things like that swap from USHCN to USHCN.v2 can happen to you. FWIW, I’ve heard an assertion that GHCN is supposed to get some kind of swap “soon” too. So keep eyes open and watch the change log at GISS…
(They sure do like to keep those walnut shells moving… )
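Checking whether a station has more than one modification-history flag, as described above, amounts to grouping 12-digit IDs by their first 11 digits and collecting the 12th digit. A minimal sketch; the function name is hypothetical and the IDs in the test are made up for illustration, not real station IDs.

```python
from collections import defaultdict


def modification_flags(station_ids):
    """Map each 11-digit base ID to the sorted 12th-digit flags seen for it."""
    groups = defaultdict(set)
    for sid in station_ids:
        if len(sid) != 12 or not sid.isdigit():
            raise ValueError(f"expected a 12-digit station ID, got {sid!r}")
        groups[sid[:11]].add(sid[11])
    return {base: sorted(flags) for base, flags in groups.items()}
```

A base ID mapped to a single flag of "0" corresponds to the case where no “combining” takes place.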

E.M.Smith
Editor
February 27, 2010 12:18 am

OT, but there has been an 8.8 quake in Chile and there are tsunami watches being issued all over the place in the Pacific… be advised.

kwik
February 27, 2010 12:29 am

So Global Warming IS happening?
But it’s actually NASA that is doing it, not CO2?
That means NASA doesn’t deserve any funding.
They deserve to be taxed.

February 27, 2010 12:32 am

Re: hotrod ( Larry L ) (Feb 26 22:05),
“undocumented adjustment on top of undocumented adjustment”
The GHCN, USHCN and GISS datasets and their adjustments are thoroughly documented. The problem is that people who write posts here on the topic don’t seem to read it.
GHCN documentation is here. USHCN documentation is here. GIStemp documentation is here (with over 20 years of papers), and you can get their code there as well.

Ruhroh
February 27, 2010 1:02 am

Hey Chief,
What’s your take on that Raw vs. Raw blinker from rockyhigh66?
i.e.,
was there some kind of labelling accident?
was one v1 and the other v2?
The graphics just come roaring directly through when making a blinking GIF, so, not much opportunity for miscreants to monkey around with labels.
My consensus bro is very dubious of the provenance of those blinkers, although perhaps not for the reasons that might concern the midnight aspie crew…
RR

Tenuc
February 27, 2010 1:31 am

Thank you Dr. Schnare for an interesting and easy to understand explanation of why the current ‘standard’ historic temperature data sets like GISS, NCDC and (I suspect) Hadcrut are not fit for purpose.
Your choice of Dale Enterprise, Virginia as an example was very good, because as well as having an almost complete long-term continuous record, the same family had been reporting the temperature there since 1868 (as of Spring 2002).
Details of the family history are available on the link below. This ‘provenance’ for Dale Enterprise weather station is very rare, and I’m sure the family would be interested to see your results – who knows, they may still be using the old mercury thermometer to record the temperature there?
http://origin.www.erh.noaa.gov/lwx/reporter/Spring2002.pdf

Alan S
February 27, 2010 1:51 am
February 27, 2010 2:06 am

Hansen’s nightlights: Imhoff 1997 is a must read if you want to understand the stuff. also, try this
http://www.ngdc.noaa.gov/dmsp/pubs/ElvidgeEtAl-Global_urban_mapping_20090618.pdf
ISA might be better than nightlights as a UHI proxy
http://www.ngdc.noaa.gov/dmsp/download_global_isa.html

Peter Plail
February 27, 2010 2:07 am

What Dr Schnare has done for Dale Enterprise ( and what others have done for other individual sites) can presumably be repeated for other sites using the same data sources, methods, calculations and presentation of results.
This would be a massive undertaking for an individual, but on this site I suspect you would have a lot of willing and able volunteers who just need guidance as to how to accomplish the work.
Would someone consider publishing a document to guide such willing volunteers, together with any particular formulae or algorithms used in processing the data.
Such a methodology could be openly debated here until there is consensus.
At this stage individuals could then volunteer to analyse a specific site or sites. Initially, some degree of quality control can be achieved by having more than one volunteer analysing each site and then comparing results.
I am sure that between us we have sufficient skills and knowledge to plan, set up and implement such an “open source” project that warmists and sceptics alike can contribute to. My quest is to achieve some level of truth that is acceptable to both sides of the debate.

brc
February 27, 2010 2:25 am

I’ve got a question: which came first, the warming bias adjustments or the AGW theory? Is it possible that the adjustments were made, then people looked at the temperature records and went ‘hey, that’s warming up. Maybe CO2 is causing it? I can’t think of anything else it could be’?