The fact that my work is mentioned by NCDC at all is a small miracle, even if it is “muted”, as Roger says. Still, I’m pleased to get a mention, and I express my thanks to Matt Menne for doing so. Unfortunately, they ducked the issue of the long-term contribution of site bias and UHI to the surface record. But we’ll address that later – Anthony

From Roger Pielke Sr.’s Climate Science website
There is a new paper on the latest version of the United States Historical Climatology Network (USHCN). This data is used to monitor and report on surface air temperature trends in the United States. The paper is
Matthew J. Menne, Claude N. Williams, Jr. and Russell S. Vose, 2009: The United States Historical Climatology Network Monthly Temperature Data – Version 2 (PDF). Bulletin of the American Meteorological Society (in press). [URL for a copy of the paper added; thanks and h/t to Steve McIntyre and RomanM on Climate Audit]
The abstract reads
“In support of climate monitoring and assessments, NOAA’s National Climatic Data Center has developed an improved version of the U.S. Historical Climatology Network temperature dataset (U.S. HCN version 2). In this paper, the U.S. HCN version 2 temperature data are described in detail, with a focus on the quality-assured data sources and the systematic bias adjustments. The bias adjustments are discussed in the context of their impact on U.S. temperature trends from 1895-2007 and in terms of the differences between version 2 and its widely used predecessor (now referred to as U.S. HCN version 1). Evidence suggests that the collective impact of changes in observation practice at U.S. HCN stations is systematic and of the same order of magnitude as the background climate signal. For this reason, bias adjustments are essential to reducing the uncertainty in U.S. climate trends. The largest biases in the HCN are shown to be associated with changes to the time of observation and with the widespread changeover from liquid-in-glass thermometers to the maximum minimum temperature sensor (MMTS). With respect to version 1, version 2 trends in maximum temperatures are similar while minimum temperature trends are somewhat smaller because of an apparent overcorrection in version 1 for the MMTS instrument change, and because of the systematic impact of undocumented station changes, which were not addressed in version 1.”
I was invited to review this paper, and, to the authors’ credit, they did make some adjustments to their paper in their revision. Unfortunately, however, they did not adequately discuss a number of remaining bias and uncertainty issues with the U.S. HCN version 2 data.
The United States Historical Climatology Network Monthly Temperature Data – Version 2 still contains significant biases.
My second review of their paper is reproduced below.
Review By Roger A. Pielke Sr. of Menne et al 2009.
Dear Melissa and Chet
I have reviewed the responses to the reviews of the Menne et al paper. While the authors are clearly excellent scientists and have provided further useful information, they unfortunately still did not adequately respond to several of the issues that have been raised. I have summarized these issues below:
1. With respect to the degree of uncertainty associated with the homogenization procedure, they misunderstood the comment. The issue is that the creation of each adjustment [time-of-observation bias, change of instrument] relies on a regression relationship. Each of these regression relationships has an r-squared associated with it as well as a standard deviation, both of which arise from evaluating the adjustment regression. These values (standard deviations and r-squared) need to be provided for each formula that they use.
Their statement that
“Based on this assessment, the uncertainty in the U.S. average temperature anomaly in the homogenized (version 2) dataset is small for any given year but contributes to an uncertainty to the trends of about (0.004°C)”
is not the correct (complete) uncertainty analysis.
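(To make concrete what is being asked for here, below is a minimal sketch in Python, using entirely synthetic stand-in data, of reporting the r-squared and residual standard deviation of an adjustment regression. The predictor and the form of the fit are illustrative assumptions, not the actual TOB model.)

```python
import numpy as np

# Hypothetical illustration (synthetic data): fit an adjustment
# regression of the kind described above and report the diagnostics
# asked for -- r-squared and the residual standard deviation.
rng = np.random.default_rng(42)
predictor = rng.uniform(0.0, 10.0, 500)          # stand-in for the adjustment's predictor
observed_bias = 0.15 * predictor + rng.normal(0.0, 0.3, 500)

slope, intercept = np.polyfit(predictor, observed_bias, 1)
residuals = observed_bias - (slope * predictor + intercept)

ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((observed_bias - observed_bias.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
resid_std = residuals.std(ddof=2)                # 2 fitted parameters

print(f"r-squared = {r_squared:.3f}, residual std = {resid_std:.3f} degC")
# Every value corrected with this formula inherits at least resid_std
# of uncertainty, which should be carried into the quoted +/- on trends.
```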
2.
i) With respect to their recognition of the pivotal work of Anthony Watts: while they are clear on this contribution in their response, i.e.
“Nevertheless, we have now also added a citation acknowledging the work of Anthony Watts whose web site is mentioned by the reviewer. Note that we have met personally with Mr. Watts to discuss our homogenization approach and his considerable efforts in documenting the siting characteristics of the HCN are to be commended. Moreover, it would seem that the impetus for modernizing the HCN has come largely as a reaction to his work.”
the text itself is much more muted on this. The above text should, appropriately, be added to the paper.
Also, the authors bypassed the need to provide the existing photographic documentation (as a URL) for each site used in their study. They can clearly link in their paper to the website
http://www.surfacestations.org/ for this documentation. Ignoring this source of information in their paper is inappropriate.
ii) On the authors’ response that
“Moreover, it does not necessarily follow that poorly sited stations will experience trends that disagree with well-sited stations simply as a function of microclimate differences, especially during intervals in which both sites are stable. Conversely, the trends between two well-sited stations may differ because of minor changes to the local environment or even because of meso-scale changes to the environment of one or both stations...”
they are making an unsubstantiated assumption about the “stability” of well-sited and poorly-sited stations. What documentation do they have that determines when “both sites are stable”? As has been clearly shown on Anthony Watts’ website, it is unlikely that any of the poorly sited locations have time-invariant microclimates.
Indeed, their claim that
“We have documented the impact of station changes in the HCN on calculations of U.S. temperature trends and argue that homogenized data are the only way to estimate the climate signal at the surface (which can be important in normals calculations etc) for the full historical record”
is not correct. Without photographs of each site (which now exist for many of them), they have not adequately documented each station.
iii) The authors are misunderstanding the significance of the Lin et al paper. They state
“Moreover, the homogenized HCN minimum temperature data can be thought of as a fixed network (fixed in both location and height). Therefore, the mix of station heights can be viewed as constant throughout the period of record and therefore as providing estimates of a fixed sampling network albeit at 1.5 and 2m (not at the 9m for which differences in trends were found in Oklahoma). Therefore, these referenced papers do not add uncertainty to the HCN minimum temperature trends per se.”
First, as clearly documented on the Anthony Watts website, many of the observing sites are not at the same height above the ground (i.e. not at 1.5m or 2m). Thus, particularly for the minimum temperatures, which vary more with height near the ground, the height matters in patching all of the data together to create long-term temperature trends. Even more significant is that the trend itself will be different if the measurements are at different heights. For example, if there has been overall long-term warming in the lower atmosphere, the trends of the minimum temperature at 2m will be significantly larger than when it is measured at 4m (or another higher level). Pooling minimum temperature trends from different heights will result in an overstatement of the actual warming.
The authors need to discuss this issue. Preliminary analyses have suggested that this warm bias can overstate the reported warming trend by tenths of a degree C.
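(As a back-of-the-envelope illustration of the height argument, the sketch below uses assumed trend values chosen purely for the example, not measured ones: if the minimum-temperature trend really is larger at 2 m than at 4 m, pooling the two records overstates the warming relative to the 4 m record.)

```python
import numpy as np

# Toy arithmetic only -- the per-height trends are assumptions for the
# example. Pooling series observed at different heights yields a
# network trend larger than the trend at the higher sensor level.
years = np.arange(1900, 2008)
decades = (years - years[0]) / 10.0

trend_2m, trend_4m = 0.10, 0.06          # assumed C/decade at each height
series_2m = trend_2m * decades
series_4m = trend_4m * decades

pooled = 0.5 * (series_2m + series_4m)   # naive network average
pooled_trend = np.polyfit(decades, pooled, 1)[0]
print(f"pooled trend = {pooled_trend:.3f} C/decade vs {trend_4m:.3f} at 4 m")
```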
iv) While the authors seek to exclude themselves from attribution; i.e.
“Our goal is not to attribute the cause of temperature trends in the U.S. HCN, but to produce time series that are more generally free of artificial bias.”
they need to include a discussion of land use/land cover change effects on long-term temperature trends, which now has a rich literature. The authors are correct that there are biases associated with non-climatic and microclimate effects in the immediate vicinity of the observation sites (which they refer to as “artificial bias”), as well as real effects such as local and regional landscape change. However, they need to discuss this issue more completely than they do in their paper, since, as I am sure the Editors are aware, this data is being used to promote the perspective that the radiative effect of the well-mixed greenhouse gases (i.e. “global warming”) is the predominant reason for the positive temperature trends in the USA.
v) The neglect of a complementary data analysis (the NARR), on the grounds that it only begins in 1979, is not appropriate. The more recent years in the HCN analyses would provide an effective cross-comparison. Also, even if the NARR does not separate maximum and minimum temperatures, the comparison could still be completed using the mean temperature trends.
Their statement that
“Given these complications, we argue that a general comparison of the HCN trends to one of the reanalysis products is inappropriate for this manuscript (which is already long by BAMS standards)”
is therefore not supportable as part of any assessment of the robustness of the trends that they compute. The length issue is clearly not a justifiable reason to exclude this analysis.
In summary, the authors should include the following:
1. In their section “Bias caused by changes to the time of observation”
the regression relationship used in
“…the predictive skill of the Karl et al. (1986) approach to estimating the TOB was confirmed using hourly data from 500 stations over the period 1965-2001 (whereas the approach was originally developed using data from 79 stations over the period 1957-64)”
should be explicitly included, with the value of explained variance (i.e. the r-squared value) and standard deviation, rather than referring the reader to an earlier paper. This uncertainty in the adjustment process has been neglected in presenting the trend values with their +/- values.
2. In their section “Bias associated with other changes in observation practice”
the same need exists to present the regression relationship that is used to adjust the temperatures for instrument changes; i.e. from
“Quayle et al. (1991) concluded that this transition led to an average drop in maximum temperatures of about 0.4°C and to an average rise in minimum temperatures of 0.3°C for sites with no coincident station relocation.”
What is the r-squared and the standard deviation from which these “averages” were obtained?
3. With respect to “Bias associated with urbanization and nonstandard siting”,
as discussed earlier in this e-mail, the link to the photographs for each site needs to be included, and the citation of Anthony Watts’ work on this subject more appropriately highlighted.
On the statement that “In contrast, no specific urban correction is applied in HCN version 2”, this conclusion conflicts with quite a number of urban-rural studies. They assume “that adjustments for undocumented changepoints in version 2 appear to account for much of the changes addressed by the Karl et al. (1988) UHI correction used in version 1.”
The use of the word “appear” in concluding that this adjustment process accounts for the urban correction of Karl et al (1988) indicates some uneasiness on the authors’ part about this issue. They need more text as to why they assume their adjustment can accommodate such urban effects. Moreover, the urban correction in Karl et al is also based on a regression assessment with an explained variance and standard deviation; the same data Karl used should be applied to ascertain whether the new “undocumented changepoint adjustment” can reproduce the Karl et al results.
The authors also clearly recognize this limitation in the paragraph that starts with
“It is important to note, however, that while the pairwise algorithm uses a trend identification process to discriminate between gradual and sudden changes, trend inhomogeneities in the HCN are not actually removed with a trend adjustment...”
and ends with
“This makes it difficult to robustly identify the true interval of a trend inhomogeneity (Menne and Williams 2008).”
Yet, despite this clear and serious limitation on the ability to quantify long-term temperature trends to within tenths of a degree C, they present precise quantitative trends; e.g.
“0.071° and 0.077°C dec-1, respectively” (on page 15).
They also write that
“…there appears to be little evidence of a positive bias in HCN trends caused by the UHI or other local changes”
which ignores detailed local studies that clearly show positive temperature biases; e.g.
Brooks, Ashley Victoria. M.S., Purdue University, May, 2007. Assessment of the Spatiotemporal Impacts of Land Use Land Cover Change on the Historical Climate Network Temperature Trends in Indiana.
Christy, J.R., W.B. Norris, K. Redmond, and K.P. Gallo, 2006, Methodology and results of calculating Central California surface temperature trends: Evidence of human-induced climate change?, J. Climate, 19, 548-563.
Hale, R. C., K. P. Gallo, and T. R. Loveland (2008), Influences of specific land use/land cover conversions on climatological normals of near-surface temperature, J. Geophys. Res., 113, D14113, doi:10.1029/2007JD009548.
4. On the claim that
“However, from a climate change perspective, the primary concern is not so much the absolute measurement bias of a particular site, but rather the changes in that bias over time, which the TOB and pairwise adjustments effectively address (Vose et al. 2003; Menne and Williams 2008) subject to the sensitivity of the changepoint tests themselves.”
this is a circular argument. While I agree it is the changes in bias over time that matter most, without an independent assessment, there is no way for the authors to objectively conclude that their adjustment procedure captures these changes of bias in time.
Their statement that
“Instead, the impact of station changes and non-standard instrument exposure on temperature trends must be determined via a systematic evaluation of the observations themselves (Peterson 2006).”
is fundamentally incomplete. The assessment of the impact “of station changes and non-standard instrument exposure on temperature trends” must be assessed from the actual station location and its changes over time! To rely on the observations to extract this information is clearly circular reasoning.
As a result of these issues, their section “Temperature trends in U.S. HCN” overstates the confidence that should be given to the quantitative values of the trends and the statistical uncertainty in their values.
If this paper is published, the issues raised in this review need to be more objectively and completely presented. It should not be accepted until they do this.
I would be glad to provide further elaboration on the subjects I have presented in this review of their revised paper, if requested.
Best Regards
Roger A. Pielke Sr.
[SNIP – way way Off Topic, and I’m really growing weary of people putting OT stuff in the very first comment. Just saying “OT” and then posting something completely unrelated is not a license to OT. I busted my butt over many months to get a mention in this paper, so I ask that you have a little respect for the content and focus please – Anthony]
This is a typical example of a dog turning in circles biting its own tail.
It takes a lot of patience and perseverance before you can teach the dog to sit and give a paw.
Having tried to understand the histories of several USHCN sites for the SurfaceStations survey, I have serious doubts that it is possible to detect and correct for station relocations. The metadata on such moves are sparse, confusing and essentially unverifiable. We have discovered that many current stations have large microsite issues, but what of stations that moved – sometimes several miles – 20, 40, or 60 years ago? Ignoring this problem of NON-systematic bias is a major flaw in any recalculation/adjustment of the dataset.
This paper confirms that the total adjustment made to the raw data is +0.425C or +0.765F (from 1920 to today, which is the period of the maximum adjustment).
I calculate the total temperature change over this same period at 0.53C, which leaves only about 0.1C of warming over 9 decades once the 0.425C of adjustments is excluded.
Ron de Haan (17:20:30) :
This is a typical example of a dog turning in circles biting its own tail.
It takes a lot of patience and perseverance before you can teach the dog to sit and give a paw.
Yeah the tail is so exciting, there it is… it’s gone, ahh it’s back again, bite it whoops, its gone again…. (endless fun in a loop).
If I read the review correctly, it sounded like “start over”.
Read this article, watch the video:
http://penoflight.com/climatebuzz/?p=543
With all their money, wouldn’t you think auditing the very sites and instruments they depend on would be first priority?
Other Topic…
[snip]
Well, it is clear that Anthony and his team did the job for them.
It will be handled like every “good idea”.
At first the idea is ignored, in the next phase it is attacked and in the end they will try to steal it.
It’s good that we all know what an immense piece of work has been performed (and still has to be performed) and how important this work will be in all the discussions in the near future about the real impact of AGW/Climate Change.
Correct data is the basis for all science, let alone political decisions that will determine the economic and political future of the USA and the world.
That is what this is all about: our future, our freedom and our prosperity.
Tom in Texas (18:25:22) :
If I read the review correctly, it sounded like “start over”.
I don’t read that at all. I believe that Professor Pielke is providing the authors valuable constructive criticism on some fairly important deficiencies in what is otherwise a good paper.
This is fairly common practice for reviewers of science papers in respectable journals. This is peer review.
I’m not so sure they have much money allocated to properly audit. That there has never been a maintenance and calibration program for the surface station network, let alone an observational survey such as the one undertaken by Anthony Watts, speaks volumes.
RPS sums it all up in this sentence:
“The assessment of the impact “of station changes and non-standard instrument exposure on temperature trends” must be assessed from the actual station location and its changes over time! To rely on the observations to extract this information is clearly circular reasoning.”
I could only imagine what an auditor would do if my lab were run like NCDC 🙂
It might just be an American-English v English-English thing, but I have difficulty with the problems identified by Mr Watts’ survey being described as “systematic” rather than “systemic”.
To describe something as systematic means that it follows a system, where “system” is used to mean a pattern or a particular way of doing things. There is an element of stability to systematic problems because the system stays the same. Apparent anomalies in the results can be adjusted for because you can calculate a theoretical margin of error from a close examination of the established system. All results are subject to that margin of error.
Where problems arise because a system (meaning, in this context, an arrangement of things or rules) contains inherent flaws, those problems are systemic not systematic. Where the system (that is, the arrangement of things or rules) is not constant you cannot calculate a single margin for error for the system as a whole and at all times. A separate adjustment is required for each change in the system.
The crucial difference between systematic and systemic errors is illustrated very well by the surfacestations project. That project has established, for the 70% or so of stations surveyed to date, that the vast majority have changed in location or surroundings to a very substantial extent since they were first established. The problem is systemic in that the system of surface stations today is radically different from the system in place just thirty years ago, perhaps even ten years ago.
Since I first came across the surfacestations project a year or so ago, I have tried to wrap what I laughably call my mind around the concept of calculating an average surface temperature for the USA, or any individual state thereof, and calculating a trend when the margin of error in each measurement is incalculable because of the changing physical conditions under, over and around each measuring point.
There is always an answer to calculating averages and trends from an unstable body of measuring devices and that is to acknowledge the need for a very wide margin of error. On any view of it, it seems to me that the necessary margin of error for the whole wobbly mass of US surface stations must be so high that no narrowly-defined conclusion can be drawn.
I have a historical bias question:
I just received my daily weather records for Weaverville Ranger Station from 1894 to 2009 from the Western Regional Climate Center in Reno, NV.
According to the Wiki history of SR 299 (US 299 prior to 1934), the highway next to the weather station was originally concrete instead of asphalt.
How much difference would concrete vs asphalt make for a UHI?
“On any view of it, it seems to me that the necessary margin of error for the whole wobbly mass of US surface stations must be so high that no narrowly-defined conclusion can be drawn.” Aye, there’s the rub. Propagate those errors through non-linear recursive models and the predictions diverge so rapidly that any realistic temperature is equally possible along with a raft of unrealistic ones. So we see Faux Errors being reported in the form of ensemble means and ensemble ranges where the mutual consistency of the models is said to prove something about the real world instead of something about the models.
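(The divergence point can be seen with a toy non-linear recursion; the sketch below uses the logistic map purely as a stand-in for any such model, not as a climate model.)

```python
# Two runs of a chaotic recursion that differ by a tiny "measurement
# error" soon disagree completely, so any value in the attractor's
# range becomes equally possible.
def logistic(x, r=3.9):
    return r * x * (1.0 - x)

x_a, x_b = 0.500000, 0.500001    # initial states differing by 1e-6
for _ in range(60):
    x_a, x_b = logistic(x_a), logistic(x_b)

print(f"after 60 steps: {x_a:.4f} vs {x_b:.4f} (gap {abs(x_a - x_b):.4f})")
```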
@FatBigot – since my days in undergrad physics, “systematic error” has always meant basically non-random error. Error arising from miscalibration, or other properties of the system that usually lead to a consistent measurement bias in one direction or the other.
I don’t know about the brit vs. american English angle, but we use “systemic” on this side of the pond as well – I’d consider them basically synonymous – though I’ve never heard anyone use the term “systemic error”.
I’m confused. Mr. Pielke begins his review with “they are clearly excellent scientists” and then takes their work apart with criticisms like “fundamentally incomplete,” etc., etc. Mr. Pielke’s criticisms were clearly stated and, as far as I can see, this study is seriously flawed. After reading Anthony’s report and other blog posts too, I can’t believe that these “scientists” could have any certainty at all as to the accuracy of the historical temperature data. Is Mr. Pielke just being nice in his remark, or can you write a paper this bad and still be an “excellent scientist”?
Wow, now I really hate to tiptoe off topic, but this can’t wait. Released today is a blockbuster memo from the White House’s Office of Management and Budget alleging that the Precautionary Principle has been stretched beyond the science, that the EPA’s finding of CO2 endangerment is insufficiently documented, and that EPA regulation of CO2 would lead to serious economic damage. Apparently in response, EPA Director Lisa Jackson told Congress today that the EPA endangerment finding won’t necessarily lead to regulation.
This memo gives heavy legal ammunition to anyone suing the EPA over regulations, it pressurizes the debate in Congress over Cap and Trade, and, by tomorrow, ought to give the opponents of Carbon encumbrance in Congress some heavy artillery.
Code Blue, Code Blue, the CO2=AGW paradigm is flat-lining.
========================================
Their statement that
“Based on this assessment, the uncertainty in the U.S. average temperature anomaly in the homogenized (version 2) dataset is small for any given year but contributes to an uncertainty to the trends of about (0.004°C)”
So now we’re calculating things out to 4/1000 C and asserting that is the most uncertainty possible when the raw data are in full degrees F.
Sigh. Does no one “get it” that you cannot take a set of full-degree-F single samples (for a single location at a single time – no oversampling), average them all together, and get any more precision than full degrees F?
It’s called FALSE PRECISION and it’s WRONG.
If any 60 samples for one month for a place average together to 12.00001, all I can know is that it could be 11.00001, or it could be 13.00001, or anywhere in between. I can say that I have “12 +/- 1 F” but I cannot say that I have 12.00001 F.
The “Monthly Average Temperature” is not a physical thing that can be over sampled (measured several times to create better precision than the measuring device itself supports). It is a mathematical creation and so is limited to the original precision of the raw data and can never have a higher precision than that.
Further, any calculations that use that monthly average will also be limited to whole-degree-F precision. That applies to the calculated anomalies. But they ought to have less precision than whole degrees F due to the large number of calculations done to get to that anomaly result. Error accumulates.
Just because your calculator has 10 digits of precision it doesn’t mean they have any meaning.
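(A small sketch of this false-precision point, using synthetic numbers: sixty hypothetical "true" temperatures are recorded only to whole degrees F and then averaged; the trailing decimals of the average come from the arithmetic, not the instrument.)

```python
import numpy as np

# Synthetic illustration: record whole-degree-F readings, then average.
rng = np.random.default_rng(0)
true_temps = rng.normal(52.37, 8.0, 60)   # hypothetical daily values
recorded = np.round(true_temps)           # instrument reports whole F only

print(f"true mean     = {true_temps.mean():.5f} F")
print(f"recorded mean = {recorded.mean():.5f} F")
# The recorded mean prints five decimals, but those trailing digits
# come from the division in the average, not from the thermometer.
```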
First, that is a well articulated review and Roger Pielke Sr. is thanked, by me at least, for the effort this involved. These things take a lot of time reading, checking, reading again, and so on. Impressive!
Second, Anthony – “muted” is way better than “ignored.” They have problems with this data and they admit knowing it thanks to the Surface Station Project. Bravo!
Third, they ought to publish this with a black-box warning on the front and back covers that, while the database is improved, it is still not adequate for global, regional and long-term comparisons. And values such as averages should not be presented with digits to the right of the decimal. Any published work using these data should carry this warning.
Fourth, they should commit to at least a 10% sample of stations in an attempt to figure out what has gone on and what, if anything, can be done about it. They may find out they can’t make this rattletrap collection of stations into a scientific instrument at any cost.
Fifth, any researcher using this data should have to sign a statement that they have read and understand your recent Heartland publication:
http://www.heartland.org/books/SurfaceStations.html
Anthony,
Maybe you have been asked this before, but is there any work underway to create a temperature record using just CRN-1 / 2 surface stations? Looking at your map of surveyed stations, while there are very few of these, they do seem more evenly distributed and greater in number than the stations used in a recent Antarctic temperature reconstruction attempt.
The multiple issues you have identified with stations rated below CRN-2 would indicate that TOB and UHI adjustments would be unlikely to make data from these stations usable. However it seems possible that such adjustments could be applied to CRN-1 / 2 data. The GISS “Lights” adjustment may even be usable.
Never judge a duck by its cover.
Eric Naegle (20:31:08):
I’m confused. Mr. Pielke begins his review with “they are clearly excellent scientists” and then takes their work apart with criticisms like “fundamentally incomplete,” etc., etc. Mr. Pielke’s criticisms were clearly stated and, as far as I can see, this study is seriously flawed. After reading Anthony’s report and other blog posts too, I can’t believe that these “scientists” could have any certainty at all as to the accuracy of the historical temperature data. Is Mr. Pielke just being nice in his remark, or can you write a paper this bad and still be an “excellent scientist”?
Roger Pielke is applying the rules of etiquette and good manners to his writing. In any formal writing, the author first mentions something favorable and tactfully acceptable; after the pleasing introduction, the author raises the unfavorable aspects.
Great work Anthony,
I had a suggestion for a practical use for the work you’ve done grading the USHCN stations. Since you have survey data on 80% of the stations, and 11% of those stations are CRN-1 and 2, it seems to me that computing the temperature anomaly using just those stations could be used to gauge the UHI signal from the total USHCN network.
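(A rough sketch of how that comparison might be set up. The array shapes, the ~11% well-sited fraction from the comment, and the anomaly baseline are all illustrative assumptions; real data would need area weighting and missing-value handling.)

```python
import numpy as np

# Made-up arrays: temps is (n_stations, n_years) of annual means,
# is_crn12 flags the well-sited (CRN-1/2) stations. The difference
# between the full-network anomaly and the well-sited-only anomaly
# is a crude estimate of the siting/UHI signal.
rng = np.random.default_rng(1)
n_stations, n_years = 100, 30
temps = 12.0 + rng.normal(0.0, 0.5, (n_stations, n_years))
is_crn12 = rng.random(n_stations) < 0.11     # ~11% well-sited

def network_anomaly(block, base=slice(0, 10)):
    """Per-station anomalies vs. a base period, averaged over stations."""
    baseline = block[:, base].mean(axis=1, keepdims=True)
    return (block - baseline).mean(axis=0)

uhi_estimate = network_anomaly(temps) - network_anomaly(temps[is_crn12])
print(np.round(uhi_estimate, 3))   # positive values: full network warmer
```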
Thanks for sharing this demonstration of how peer review ought to work. Of course peer review is not just about pre-publication review of a paper; the review continues on (or it should) well after a paper is published. And then there is the other shocker – scientists are people too – they’re human; and when you raise a criticism of their work, it’s best to do a little ego stroking.
I was especially pleased to see this recognition of the work done by Anthony and the Surfacestations Project: “… Moreover, it would seem that the impetus for modernizing the HCN has come largely as a reaction to his work.”
I think it entirely appropriate that Dr. Pielke calls for greater reference to the work done by the project – links to the site, for example.
This paper sounds like it is another step in the right direction, albeit a slow, tentative step. I guess we’ll just have to wait for the hard part to get done – just how much of a bias has been introduced due to bad siting, etc.