There’s been quite a bit of publicity about Hansen’s Y2K error and the

change in the U.S. leaderboard (by which 1934 is the new warmest U.S. year)

in the right-wing blogosphere. In contrast,

realclimate has dismissed it as a triviality and the climate blogosphere is

doing its best to ignore the matter entirely.

My own view has been that the

matter is certainly not the triviality that Gavin Schmidt would have you

believe, but neither is it any magic bullet. I think that the point is

significant for reasons that have mostly eluded commentators on both sides.

Station Data

First, let’s start with the impact of Hansen’s error on individual station

histories (and my examination of this matter arose from examination of

individual station histories and not because of the global record.) GISS

provides an excellent and popular

online service

for plotting temperature histories of individual stations. Many such

histories have been posted up in connection with the ongoing examination of

surface station quality at surfacestations.org. Here’s an example of this

type of graphic:

Figure 1. Plot of Detroit Lakes MN using GISS software

But it’s presumably not just Anthony Watts and surfacestations.org

readers that have used these GISS station plots; presumably scientists and

other members of the public have used this GISS information. The Hansen

error is far from trivial at the level of individual stations. Grand Canyon

was one of the stations previously discussed at climateaudit.org in

connection with Tucson urban heat island. In this case, the Hansen error was

about 0.5 deg C. Some discrepancies are 1 deg C or higher.

Figure 2. Grand Canyon Adjustments

Not all station errors lead to positive steps. There is a bimodal

distribution of errors reported earlier at

CA here, with many

stations having negative steps. There is a positive skew so that the net impact

of the step error is about 0.15 deg C according to Hansen. However, as you

can see from the distribution, the impact on the majority of stations is

substantially higher than 0.15 deg. For users of information regarding

individual stations, the changes may be highly relevant.
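The gap between a modest mean and large individual errors is easy to illustrate. The sketch below uses ten made-up step values (hypothetical, chosen only so that their mean comes out at 0.15 deg C; they are not the GISS or CA data) to show how a bimodal distribution can average out to a small figure while individual stations are off by considerably more.

```python
from statistics import mean

# HYPOTHETICAL step errors in deg C for ten imaginary stations.
# Assumption: these values are invented for illustration only; they are
# NOT the actual distribution of step errors discussed in the text.
hypothetical_steps = [-0.4, -0.3, -0.3, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.5]

# Signed average: negative and positive steps partly cancel.
mean_step = mean(hypothetical_steps)

# Average magnitude: what a user of a single station typically sees.
mean_abs_step = mean([abs(s) for s in hypothetical_steps])

# Count stations whose individual error exceeds the signed average.
n_larger = sum(abs(s) > abs(mean_step) for s in hypothetical_steps)

print(f"mean step:          {mean_step:+.2f} deg C")
print(f"mean absolute step: {mean_abs_step:.2f} deg C")
print(f"stations with |step| above the mean: {n_larger} of {len(hypothetical_steps)}")
```

With these invented numbers the signed mean is 0.15 deg C while the mean absolute step is 0.35 deg C, which is the general point: the average understates what happens at any one station.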

GISS recognized that the error had a significant impact on individual

stations and took rapid steps to revise their station data (and indeed the

form of their revision seems far from ideal, indicating the haste of their

revision.) GISS failed to provide any explicit notice or warning on their

station data webpage that the data had been changed, or an explicit notice

to users who had downloaded data or graphs in the past that there had been

significant changes to many U.S. series. This obligation existed regardless

of any impact on world totals.

Figure 3. Distribution of Step Errors

GISS has emphasized recently that the U.S. constitutes only 2% of global

land surface, arguing that the impact of the error is negligible on the

global average. While this may be so for users of the GISS global average,

U.S. HCN stations constitute about 50% of active (with values in 2004 or

later) stations in the GISS network (as shown below). The sharp downward

step in station counts after March 2006 in the right panel shows the last

month in which USHCN data is presently included in the GISS system. The

Hansen error affects all the USHCN stations and, to the extent that users of

the GISS system are interested in individual stations, the number of

affected stations is far from insignificant, regardless of the impact on

global averages.

Figure 4. Number of Time Series in GISS Network. This includes all versions

in the GISS network and exaggerates the population in the 1980s as several

different (and usually similar) versions of the same data are often

included.
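The contrast between the two ways of counting can be put as simple arithmetic. The sketch below assumes a plain area-weighted average (an assumption made for illustration; it is not the actual GISS gridding procedure) and uses only the round figures quoted above: 2% of land area, a step of about 0.15 deg C, and about 50% of active station series.

```python
# Round figures taken from the text above.
US_LAND_FRACTION = 0.02   # U.S. share of global land surface
US_STEP_DEG_C = 0.15      # approximate effect on the U.S. mean, deg C
US_STATION_SHARE = 0.50   # approximate U.S. share of active station series


def area_weighted_effect(region_fraction, region_change):
    """Change in an area-weighted mean caused by a change in one region.

    Assumption: simple area weighting, NOT the real GISS averaging scheme.
    """
    return region_fraction * region_change


global_effect = area_weighted_effect(US_LAND_FRACTION, US_STEP_DEG_C)

print(f"effect on an area-weighted land mean: {global_effect:.3f} deg C")
print(f"share of active station series affected: {US_STATION_SHARE:.0%}")
```

On these assumptions the same error moves the land average by only a few thousandths of a degree yet touches roughly half of the active station series, which is why the two audiences see it so differently.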

U.S. Temperature History

The Hansen error also has a significant impact on the GISS estimate of U.S.

temperature history with estimates for 2000 and later being lowered by about

0.15 deg C (2006 by 0.10 deg C). Again GISS moved quickly to revise their

online information, changing their

US temperature

data on Aug 7, 2007. Even though Gavin Schmidt of GISS and realclimate

said that changes of 0.1 deg C in individual years were “significant”,

GISS did not explicitly announce these changes or alert readers that a

“significant” change had occurred for values from 2000-2006. Obviously they

would have been entitled to observe that the changes in the U.S. record did

not have a material impact on the world record, but it would have been

appropriate for them to have provided explicit notice of the changes to the

U.S. record given that the changes resulted from an error.

The changes in the U.S. history were not brought to the attention of

readers by GISS itself, but in

this post at climateaudit. As a result of the GISS revisions, there was

a change in the “leader board” and 1934 emerged as the warmest U.S. year and

more warm years were in the top ten from the 1930s than from the past 10

years. This has been widely discussed in the right-wing blogosphere and has

been acknowledged at

realclimate as follows:

The net effect of the change was to reduce mean US anomalies by

about 0.15 ºC for the years 2000-2006. There were some very minor knock

on effects in earlier years due to the GISTEMP adjustments for rural vs.

urban trends. In the global or hemispheric mean, the differences were

imperceptible (since the US is only a small fraction of the global

area).

There were however some very minor re-arrangements in the various

rankings (see data). Specifically, where 1998 (1.24 ºC anomaly compared

to 1951-1980) had previously just beaten out 1934 (1.23 ºC) for the top

US year, it now just misses: 1934 1.25ºC vs. 1998 1.23ºC. None of these

differences are statistically significant.
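The re-arrangement described in the quoted passage can be checked in a few lines. The sketch below uses only the four anomaly values quoted above; the helper name `ranked` is hypothetical.

```python
# U.S. annual anomalies (deg C relative to 1951-1980) as quoted above.
before_fix = {1998: 1.24, 1934: 1.23}
after_fix = {1934: 1.25, 1998: 1.23}


def ranked(anomalies):
    """Return the years ordered from warmest to coolest."""
    return sorted(anomalies, key=anomalies.get, reverse=True)


print("before the correction:", ranked(before_fix))
print("after the correction: ", ranked(after_fix))
```

The margins are a couple of hundredths of a degree in either direction, so the ordering flips on a very small change.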

In my opinion, it would have been more appropriate for Gavin Schmidt of

GISS (who was copied on the GISS correspondence to me) to ensure that a

statement like this was on the caption to the U.S. temperature history on

the GISS webpage, rather than after the fact at realclimate.

Obviously much of the blogosphere delight in the leader board changes is

a reaction to many fevered press releases and news stories about year x

being the “warmest year”. For example, on Jan 7, 2007, NOAA

announced that

The 2006 average annual temperature for the contiguous U.S. was

the warmest on record.

This press release was widely covered as you can determine by googling

“warmest year 2006 united states”. Now NOAA and NASA are different

organizations and NOAA, not NASA, made the above press release, but members

of the public can surely be forgiven for not making fine distinctions

between different alphabet soups. I think that NASA might reasonably have

foreseen that the change in rankings would catch the interest of the public

and, had they made a proper report on their webpage, they might have

forestalled much subsequent criticism.

In addition, while Schmidt describes the changes atop the leader board as

“very minor re-arrangements”, many followers of the climate debate are aware

of intense battles over 0.1 or 0.2 degree (consider the satellite battles.)

Readers might perform a little thought experiment: suppose that Spencer and

Christy had published a temperature history in which they claimed that 1934

was the warmest U.S. year on record and then it turned out that there had

been a computer programming error opposite to the one that Hansen made, that

Wentz and Mears discovered there was an error of 0.15 deg C in the Spencer

and Christy results and, after fixing this error, it turned out that 2006

was the warmest year on record. Would realclimate simply describe this as a

“very minor re-arrangement”?

So while the Hansen error did not have a material impact on world

temperatures, it did have a very substantial impact on U.S. station data and

a “significant” impact on the U.S. average. Both of these surely “matter”

and both deserved formal notice from Hansen and GISS.

Can GISS Adjustments “Fix” Bad Data?

Now my original interest in GISS adjustments did not arise abstractly,

but in the context of surface station quality. Climatological stations are

supposed to meet a variety of quality standards, including the relatively

undemanding requirement of being 100 feet (30 meters) from paved surfaces.

Anthony Watts and volunteers of surfacestations.org have documented one

defective site after another, including a weather station in a parking lot

at the University of Arizona where MBH coauthor Malcolm Hughes is employed,

shown below.

Figure 5. Tucson University of Arizona Weather Station

These revelations resulted in a variety of aggressive counter-attacks in

the climate blogosphere, many of which argued that, while these individual

sites may be contaminated, the “expert” software at GISS and NOAA could fix

these problems, as, for example

here:

they [NOAA and/or GISS] can “fix” the problem with math and

adjustments to the temperature record.

or here:

This assumes that contaminating influences can’t be and aren’t

being removed analytically.. I haven’t seen anyone saying such

influences shouldn’t be removed from the analysis. However I do see

professionals saying “we’ve done it”

“Fixing” bad data with software is by no means an easy thing to do (as

witness Mann’s unreported modification of principal components methodology

on tree ring networks.) The GISS adjustment schemes (despite protestations

from Schmidt that they are “clearly outlined”) are not at all easy to

replicate using the existing opaque descriptions. For example, there is

nothing in the methodological description that hints at the change in data

provenance before and after 2000 that caused the Hansen error. Because many

sites are affected by climate change, a general urban heat island effect and

local microsite changes, adjustment for heat island effects and local

microsite changes raises some complicated statistical questions that are

nowhere discussed in the underlying references (Hansen et al 1999, 2001). In

particular, the adjustment methods are not techniques that can be looked up

in statistical literature, where their properties and biases might be

discerned. They are rather ad hoc and local techniques that may or may not

be equal to the task of “fixing” the bad data.

Making readers run the gauntlet of trying to guess the precise data sets

and precise methodologies obviously makes it very difficult to achieve any

assessment of the statistical properties. In order to test the GISS

adjustments, I requested that GISS provide me with details on their

adjustment code. They refused. Nevertheless, there are enough different

versions of U.S. station data (USHCN raw, USHCN time-of-observation

adjusted, USHCN adjusted, GHCN raw, GHCN adjusted) that one can compare GISS

raw and GISS adjusted data to other versions to get some idea of what they

did.
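The kind of version-to-version comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the year-keyed dictionaries and the station values are all hypothetical, and it is not the GISS adjustment code, which is unavailable.

```python
from statistics import mean


def step_at_break(version_a, version_b, break_year):
    """Estimate a step between two versions of one station record.

    version_a and version_b map year -> annual mean temperature (deg C).
    Returns mean(a - b) from break_year onward minus mean(a - b) before it.
    """
    common = sorted(set(version_a) & set(version_b))
    before = [version_a[y] - version_b[y] for y in common if y < break_year]
    after = [version_a[y] - version_b[y] for y in common if y >= break_year]
    if not before or not after:
        raise ValueError("need overlapping data on both sides of the break")
    return mean(after) - mean(before)


# HYPOTHETICAL annual means (deg C) for one imaginary station.
reference = {1996: 10.0, 1997: 10.25, 1998: 9.75, 1999: 10.0,
             2000: 10.25, 2001: 10.5, 2002: 10.0, 2003: 10.25}

# Second hypothetical version: identical before 2000, then shifted +0.5 deg C.
shifted = {y: t + (0.5 if y >= 2000 else 0.0) for y, t in reference.items()}

step = step_at_break(shifted, reference, break_year=2000)
print(f"estimated step at 2000: {step:+.2f} deg C")
```

Differencing two versions of the same station removes the weather that both share, so a change in data provenance shows up as a shift in the mean difference rather than being buried in year-to-year variability.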

In the course of reviewing quality problems at various surface sites,

among other things, I compared these different versions of station data,

including a comparison of the Tucson weather station shown above to the

Grand Canyon weather station, which is presumably less affected by urban

problems. This comparison demonstrated a very odd pattern discussed

here. The adjustments show that the trend in the problematic Tucson site

was reduced in the course of the adjustments, but they also showed that the

Grand Canyon data was also adjusted, so that, instead of the 1930s being

warmer than the present as in the raw data, the 2000s were warmer than the

1930s, with a sharp increase in the 2000s.

Figure 6. Comparison of Tucson and Grand Canyon Versions

Now some portion of the post-2000 jump in adjusted Grand Canyon values

shown here is due to Hansen’s Y2K error, but it only accounts for a 0.5 deg

C jump after 2000 and does not explain why Grand Canyon values should have

been adjusted so much. In this case, the adjustments are primarily at the

USHCN stage. The USHCN station history adjustments appear particularly

troublesome to me, not just here but at other sites (e.g. Orland CA). They

end up making material changes to sites identified as “good” sites and my

impression is that the USHCN adjustment procedures may be adjusting some of

the very “best” sites (in terms of appearance and reported history) to

better fit histories from sites that are clearly non-compliant with WMO

standards (e.g. Marysville, Tucson). There are some real and interesting

statistical issues with the USHCN station history adjustment procedure and

it is ridiculous that the source code for these adjustments (and the

subsequent GISS adjustments – see bottom panel) is not available.

Closing the circle: my original interest in GISS adjustment procedures

was not an abstract interest, but a specific interest in whether GISS

adjustment procedures were equal to the challenge of “fixing” bad data. If

one views the above assessment as a type of limited software audit (limited

by lack of access to source code and operating manuals), one can say firmly

that the GISS software had not only failed to pick up and correct fictitious

steps of up to 1 deg C, but that GISS actually introduced this error in the

course of their programming.

According to any reasonable audit standards, one would conclude that the

GISS software had failed this particular test. While GISS can patch (and has

patched) the particular error that I reported to them, their patching hardly

proves the merit of the GISS (and USHCN) adjustment procedures. These need

to be carefully examined. This was a crying need prior to the identification

of the Hansen error and would have been a crying need even without the

Hansen error.

One practical effect of the error is that it surely becomes much harder

for GISS to continue the obstruction of detailed examination of their source

code and methodologies after the embarrassment of this particular incident.

GISS itself has no policy against placing source code online and, indeed, a

huge amount of code for their climate model is online. So it’s hard to

understand their present stubbornness.

The U.S. and the Rest of the World

Schmidt observed that the U.S. accounts for only 2% of the world’s land

surface and that the correction of this error in the U.S. has “minimal

impact on the world data”, which he illustrated by comparing the U.S. index

to the global index. I’ve re-plotted this from original data on a common

scale. Even without the recent changes, the U.S. history contrasts with the

global history: the U.S. history has a rather minimal trend if any since the

1930s, while the ROW has a very pronounced trend since the 1930s.

Re-plotted from GISS Fig A and Fig D data.

These differences are attributed to “regional” differences and it is

quite possible that this is a complete explanation. However, this conclusion

is complicated by a number of important methodological differences between

the U.S. and the ROW. In the U.S., despite the criticisms being rendered at

surfacestations.org, there are many rural stations that have been in

existence over a relatively long period of time; while one may cavil at how

NOAA and/or GISS have carried out adjustments, they have collected metadata

for many stations and made a concerted effort to adjust for such metadata.

On the other hand, many of the stations in China, Indonesia, Brazil and

elsewhere are in urban areas (such as Shanghai or Beijing). In some of the

major indexes (CRU, NOAA), there appears to be no attempt whatever to adjust

for urbanization. GISS does report an effort to adjust for urbanization in

some cases, but their ability to do so depends on the existence of nearby

rural stations, which are not always available. Thus, there is a real

concern that the need for urban adjustment is most severe in the very areas

where adjustments are either not made or not accurately made.

In its consideration of possible urbanization and/or microsite effects,

IPCC has taken the position that urban effects are negligible, relying on a

very few studies (Jones et al 1990, Peterson et al 2003, Parker 2005, 2006),

each of which has been discussed at length at this site. In my opinion, none

of these studies can be relied on for concluding that urbanization impacts

have been avoided in the ROW sites contributing to the overall history.

One more story to conclude. Non-compliant surface stations were reported

in the formal academic literature by Pielke and Davey (2005) who described a

number of non-compliant sites in eastern Colorado. In NOAA’s official

response to this criticism, Vose et al (2005) said in effect –

it doesn’t matter. It’s only eastern Colorado. You

haven’t proved that there are problems anywhere else in the United

States.

In most businesses, the identification of glaring problems, even in a

restricted region like eastern Colorado, would prompt an immediate

evaluation to ensure that problems did not actually exist. However, that

does not appear to have taken place and matters rested until Anthony Watts

and the volunteers at surfacestations.org launched a concerted effort to

evaluate stations in other parts of the country and determined that the

problems were not only just as bad as eastern Colorado, but in some cases

were much worse.

Now in response to problems with both station quality and adjustment

software, Schmidt and Hansen say in effect, as NOAA did before them –

it doesn’t matter. It’s only the United States.

You haven’t proved that there are problems anywhere else in the world.