
by Anthony Watts
There has been a lot of buzz about the Menne et al 2010 paper “On the reliability of the U.S. Surface Temperature Record”, which is NCDC’s response to the surfacestations.org project. One paid blogger even erroneously trumpeted the “death of UHI”, which is humorous, because the project was a study about station siting issues, not UHI. Anybody who owns a car with a dashboard thermometer and commutes from country to city can tell you about UHI.
There are also claims that this paper is a “death blow” to the surfacestations project. I’m sure in some circles they believe that to be true. However, it is very important to point out that the Menne et al 2010 paper was based on an early version of the surfacestations.org data, at 43% of the network surveyed. The dataset that Dr. Menne used was not quality controlled, contained errors both in station identification and rating, and was never intended for analysis. I had posted it so volunteers could keep track of which stations had been surveyed and avoid duplicating effort. When I discovered people were doing ad hoc analysis with it, I stopped updating it.
Our current dataset at 87% of the USHCN surveyed has been quality controlled.
There’s quite a backstory to all this.
In the summer, Dr. Menne had been inviting me to co-author with him, and our team reciprocated with an offer for him to join us as well; we had an agreement in principle for participation, but when I asked for a formal letter of invitation, they refused, which seemed very odd to me. The only things they would provide were a receipt for my new data (at 80%) and an offer to “look into” archiving my station photographs with their existing database. They made it pretty clear that I’d have no significant role other than that of data provider. We also invited Dr. Menne to participate in our paper, but he declined.
The appearance of the Menne et al 2010 paper was a bit of a surprise, since I had been offered collaboration by NCDC’s director in the fall. In a typed letter dated 9/22/09, Tom Karl wrote to me:
“We at NOAA/NCDC seek a way forward to cooperate with you, and are interested in joint scientific inquiry. When more or better information is available, we will reanalyze and compare and contrast the results.”
“If working together cooperatively is of interest to you, please let us know.”
I discussed it with Dr. Pielke Sr. and the rest of the team, which took some time since not all were available due to travel and other obligations. We decided to reply to NCDC and accept the offer of collaboration.
On November 10th, 2009, I sent a reply letter via Federal Express to Mr. Karl, advising him that we would like to collaborate, and offered to include NCDC in our paper. In that letter I also reiterated my concerns about the use of the preliminary surfacestations data (43% surveyed) that they had, and spelled out very specific reasons why I didn’t think the results would be representative or useful.
We all waited, but there was no reply from NCDC to our acceptance of the collaboration offer Mr. Karl had made in his last letter. Not even a “thank you, but no.”
Then we discovered that Dr. Menne’s group had submitted a paper to JGR Atmospheres using my preliminary data, and that it was in press. This was a shock to me, since I was told it is normal procedure for the person who gathered the primary data on which a paper is based to have some input in the journal’s review process.
NCDC uses data from one of the largest volunteer organizations in the world, the NOAA Cooperative Observer Network. Yet NCDC director Karl, by not bothering to reply to our letter about an offer he initiated, and the journal, by not giving me any opportunity to take part in the review process, extend what Dr. Roger Pielke Senior calls “professional discourtesy” to my volunteers and my team’s work. See his weblog on the subject:
Professional Discourtesy By The National Climate Data Center On The Menne Et Al 2010 paper
I will point out that Dr. Menne provided thanks to me and the surfacestations volunteers in the Menne et al 2010 paper, and, I hear through word of mouth, also in a recent verbal presentation. For that I thank him. He has been gracious in his communications with me, but I think he also has to answer to the organization for which he works, and that limited his ability to meet some of my requests, like a simple letter of invitation.
Political issues aside, the appearance of the Menne et al 2010 paper stops neither the surfacestations project nor the work I’m doing with the Pielke research group to produce a peer reviewed paper of our own. It does illustrate, though, that some people have been in a rush to get results. Texas State Climatologist John Nielsen-Gammon suggested way back at 33% of the network surveyed that we had a statistically large enough sample to produce an analysis. I begged to differ then, at 43%, and yes, even at 70%, when I wrote my booklet “Is the US Surface Temperature Record Reliable?”, which contained no temperature analysis, only a census of stations by rating.
The problem is known as the “low hanging fruit” problem. You see, this project was done on an ad hoc basis, with no specific roadmap for which stations to acquire. This was necessitated by the social networking (blogging) Dr. Pielke and I employed early in the project to get volunteers. What we ended up getting was a lumpy and poorly spatially distributed dataset, because early volunteers would get the stations closest to them, often near or within cities.
The urban stations were well represented in the early dataset, but the rural ones, where we believed the best siting existed, were poorly represented. So naturally, any sort of early study, even with a “significant sample size”, would be biased towards urban stations. We also had a distribution problem within CONUS, with much of the Great Plains and upper Midwest not well represented.
This is why I’ve been continuing to collect what some might consider an unusually large sample, now at 87%. We’ve learned that there are so few well sited stations that the ones meeting the CRN1/CRN2 criteria (or NOAA’s 100 foot rule for COOPs) are just 10% of the whole network. See our current census:

When you have such a small percentage of well sited stations, it is obviously important to get a large sample size, which is exactly what I’ve done. Preliminary temperature analysis done by the Pielke group on the data at 87% surveyed looks quite a bit different now than it did at 43%.
It has been said by NCDC, in Menne et al “On the reliability of the U.S. surface temperature record” (in press) and in the June 2009 “Talking Points” memo related to “Is the U.S. Surface Temperature Record Reliable?”, that station siting errors do not matter. However, I believe the way NCDC conducted the analysis gives a false impression because of the homogenization process used. As many readers know, the FILNET algorithm blends a lot of the data together to infill missing data. This means temperature data from both well sited and poorly sited stations gets combined in the infilling. The theory is that it all averages out, but when you see that 90% of the USHCN network doesn’t meet even the old NOAA 100 foot rule for COOPs, you realize this may not be the case.
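To make the infilling idea concrete, here is a minimal sketch (in Python) of how a missing value could be estimated from nearby stations by inverse-distance weighting. This is only an illustration of the general concept, not NCDC’s actual FILNET code, and the distances and values are made up.

    # Toy illustration of neighbor-based infilling (NOT the actual FILNET algorithm):
    # estimate a missing monthly value at one station from the values reported
    # by its neighbors, weighted by inverse distance.
    def infill_from_neighbors(neighbors):
        """neighbors: list of (distance_km, value_degC) tuples for nearby stations."""
        num = den = 0.0
        for dist_km, value in neighbors:
            w = 1.0 / max(dist_km, 1.0)   # closer stations count for more
            num += w * value
            den += w
        return num / den

    # hypothetical neighbors at 20, 45 and 90 km reporting 0.8, 1.1 and 0.3 degC anomalies
    print(infill_from_neighbors([(20, 0.8), (45, 1.1), (90, 0.3)]))

Whatever the exact weighting scheme, the point is the same: the estimate at the gap is a blend of whatever neighbors happen to be nearby, well sited or not.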
Here’s a way to visualize the homogenization/FILNET process: think of it like measuring water pollution. Below is a simple visual table of CRN station quality ratings and what they might look like as water pollution turbidity levels, rated 1 to 5 from best to worst:

In homogenization, the data is weighted against the nearby neighbors within a radius. So a station that starts out as a “1”, data-wise, might end up getting polluted with the data of nearby stations and take on a new value, say a weighted “2.5”. Even single stations can affect many other stations in the GISS and NOAA data homogenization methods carried out on US surface temperature data here and here.
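As a back-of-the-envelope illustration of that “2.5” (the numbers are invented, and real homogenization uses distance weights rather than a plain average), even an equal-weight blend of one clean “1” station with four dirtier neighbors lands in the mid-2s:

    # Toy illustration of the turbidity analogy (invented numbers, not NCDC's method):
    # blend a station's value with its neighbors' values.
    def blended_value(own_value, neighbor_values):
        values = [own_value] + list(neighbor_values)
        return sum(values) / len(values)

    # a well sited "1" surrounded by stations rated 3, 3, 4 and 2
    print(blended_value(1, [3, 3, 4, 2]))   # -> 2.6, close to the "2.5" described above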

In the map above, applying a homogenization smoothing that weights nearby stations by distance, what would you imagine the (turbidity) values of the stations marked with question marks would be? And how close would the values for the east coast station in question and the west coast station in question be to each other? Each would end up closer to a smoothed average value based on its neighboring stations.
Essentially, in my opinion, NCDC is comparing homogenized data to homogenized data, and thus there would not likely be any large difference between “good” and “bad” stations in that data. All the differences have been smoothed out by homogenization (pollution) from neighboring stations!
The best way to compare the effect of siting between groups of stations is to use the “raw” data, before it has passed through the multitude of adjustments that NCDC performs. However, NCDC is apparently using homogenized data. So instead of comparing apples to oranges (poorly sited vs. well sited stations), they essentially just compare apples to apples (Granny Smith vs. Golden Delicious), between which there is little visual difference beyond a slight color change.
We saw this demonstrated in the ghost-authored Talking Points Memo issued by NCDC in June 2009, in this graph:

Referencing the above graph, Steve McIntyre suggested in his essay on the subject:
The red graphic for the “full data set” had, using the preferred terminology of climate science, a “remarkable similarity” to the NOAA 48 data set that I’d previously compared to the corresponding GISS data set here (which showed a strong trend of NOAA relative to GISS). Here’s a replot of that data – there are some key telltales evidencing that this has a common provenance to the red series in the Talking Points graphic.

When I looked at SHAP and FILNET adjustments a couple of years ago, one of my principal objections to these methods was that they adjusted “good” stations. After FILNET adjustment, stations looked a lot more similar than they did before. I’ll bet that the new USHCN adjustments have a similar effect and that the Talking Points memo compares adjusted versions of “good” stations to the overall average.
There are references in the new Menne et al 2010 paper to the new USHCN2 algorithm, and we’ve been told how it is supposed to be better. While it does catch undocumented station moves that USHCN1 did not, it still adjusts data at USHCN stations in odd ways, such as at this station in rural Wisconsin, and that is the crux of the problem.

Or this one in Lincoln, IL at the local NWS office where they took great effort to have it well sited.


Thanks to Mike McMillan for the graphs comparing USHCN1 and USHCN2 data
Notice the clear tendency in the graphs comparing USHCN1 to USHCN2: the early record is cooled, while current levels are left near recently reported values or increased. The net result is either reduced cooling or enhanced warming not found in the raw data.
As for the Menne et al 2010 paper itself, I’m rather disturbed by their use of preliminary data at 43%, especially since I warned them that the dataset they had lifted from my website (placed there for volunteers to track what had been surveyed, never intended for analysis) had not been quality controlled at the time. Plus, there are really not enough good stations with enough spatial distribution at that sample size. They used it anyway, and amazingly, conducted their own secondary survey of those stations, comparing it to my non-quality-controlled data, implying that my 43% data wasn’t up to par. Well, of course it wasn’t! I told them about it and why it wasn’t. We had to resurvey and re-rate a number of stations from early in the project.
This came about only because it took many volunteers some time to learn how to properly identify the stations. Even some small towns have 2-3 COOP stations nearby, and only one of them is “USHCN”. There’s no flag in the NCDC metadatabase that says “USHCN”; in fact, many volunteers were not even aware of their own station’s status. Nobody ever bothered to tell them. You’d think that if their stations were part of a special subset, somebody at NOAA/NCDC would notify the COOP volunteer so they would apply a higher level of diligence.
If doing an independent stations survey was important enough for NCDC to do to compare to my 43% data now for their paper, why didn’t they just do it in the first place?
I have one final note of interest on the station data, specifically the issue of MMTS thermometers and their tendency to be sited closer to buildings due to cabling issues.
Menne et al 2010 mentioned a “counterintuitive” cooling trend in some portions of the data. Interestingly enough, former California State Climatologist James Goodridge did an independent analysis (I wasn’t involved in the data crunching; it was a sole effort on his part) of COOP stations in California that had gone through modernization, switching from Stevenson Screens with mercury LIG thermometers to MMTS electronic thermometers. He sifted through about 500 COOPs in California and chose stations that had at least 60 years of uninterrupted data, because, as we know, a station move can cause all sorts of issues. He used the “raw” data from these stations as opposed to adjusted data.
He writes:
Hi Anthony,
I found 58 temperature station in California with data for 1949 to 2008 and where the thermometers had been changed to MMTS and the earlier parts were liquid in glass. The average for the earlier part was 59.17°F and the MMTS fraction averaged 60.07°F.
Jim
A 0.9°F (0.5°C) warm offset due to modernization is significant, yet NCDC insists that the MMTS units test about 0.05°C cooler, and I believe they add that adjustment into the final data. Our experience shows the exact opposite adjustment should be made, and with a greater magnitude.
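For the arithmetic, using Jim’s two averages: 60.07 °F − 59.17 °F = 0.90 °F, and 0.90 °F × 5/9 ≈ 0.50 °C.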
I hope to have this California study published here on WUWT with Jim soon.
I realize all of this isn’t a complete rebuttal to Menne et al 2010, but I want to save the more detailed rebuttal for a possible comment submission to the Journal of Geophysical Research.
When our paper with the most current data is completed (and hopefully accepted in a journal), we’ll let peer reviewed science do the comparison on data and methods, and we’ll see how it works out. Could I be wrong? I’m prepared for that possibility. But everything I’ve seen so far tells me I’m on the right track.
We currently have 87% of the network surveyed (1067 stations out of 1221), and it is quality controlled and checked. I feel that we have enough of the better and urban stations to solve the “low hanging fruit” problem of the earlier portion of the project. Data at 87% looks a lot different than data at 43%.
The paper I’m writing with Dr. Pielke and others will make use of this better data, and we also use a different procedure for analysis than what NCDC used.
Facts have an amazing way of cutting through political BS. I look at the UAH lower troposphere data today and see significant warming over the year 2000. Warming. I do not care about your statistical follies. I look at the ground truth, what can be measured and recorded. All I see is warming. What do you suggest? Nothing is changing, or it is all natural change? The first is just denial. The second rather begs the question, “well, shouldn’t we do something to limit the impact anyway?” So much Arrogance, Ignorance, and Greed in this blog… A.I.G. Where have I seen that recently? I wonder if it could be related? The question is not “watts up with that” (sic) but WHYs up with that. Why this obfuscation and half truth? Why this obtuse denial of real science and the scientific method? This debate is actually no longer about science. Science on America’s Right is dead. Knifed through the heart. All that is left is a dog fight for ideology and political power… ask Chris Horner from the CEI, he is the one who wrote it. Read everything in this blog in that context. Most of what you read is written with the full knowledge of how disingenuous it is. Every fight has a loser. The irony is that if y’all “win”, we will all lose. Sad.
David Alan Evans (19:36:34) :
ShaneOfMelbourne.
I’ve seen you elsewhere on Australian sites.
Sorry Dave, you are mistaken. I have made a few posts here, but read quite a bit, looking at both sides of the debate. I have never made a post on another climate change site, so you must have me confused with someone else.
Hi Anthony, clearly the Menne paper has a significant methodological flaw that you have highlighted. I assume that with Roger Pielke you are just looking at the data from USHCN 1 and 2 stations compared with all data homogenised. This would be an interesting comparison and as close to definitive as you will get. At least the Menne paper provides you with methods to compare the data.
Is this data currently available? (I presume not, from your post.) It would be nice if we could see it after you have published. At least with this publication prior, it should be relatively easy to get your paper accepted, as the site selection and analysis will clearly be more informative with a larger dataset.
I look forward to the results.
Pat Frank (18:06:27) wrote:
“It’s inconceivable that Drs. Menne and Karl did not consciously know they were violating a very basic ethical principle of science.”
[to which Anthony replied:]
“I made the same arguments with them that you cite, they dismissed them”
Yet another instance (as if more were needed) in which those with a vested interest in promoting the AGW “message” demonstrate that their standard operating procedures do not include adherence to any ethical principles.
It’s almost as if they’ve all been contaminated by some kind of virus and they are refusing the only treatment available: a truth serum.
Incidentally, I gridded the US 20th century USHCN1 data (using 5°x5° grids).
If you average all the station trends, ungridded and equally weighted, you get +0.141C/century warming for raw data, +0.314 for TOBS and +0.588 for FILNET.
The gridded data (i.e., the average of all grids as opposed to the average of all stations), shows +0.268C/century for raw, +0.430 for TOBS, and +0.699 for FILNET.
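For readers unfamiliar with the difference between the two kinds of average quoted above, here is a minimal Python sketch of the idea. The 5° cell size matches the gridding mentioned, but the station list and trends are hypothetical, and this is not the actual USHCN1 computation: stations are first averaged within each grid cell, then the cell averages are averaged, so a dense cluster of stations cannot dominate the national number the way it can in a plain station average.

    from collections import defaultdict

    # Toy sketch of gridded vs. ungridded averaging of station trends
    # (hypothetical stations; not the actual USHCN1 computation above).
    def grid_average(stations, cell_deg=5.0):
        cells = defaultdict(list)
        for lat, lon, trend in stations:
            key = (int(lat // cell_deg), int(lon // cell_deg))
            cells[key].append(trend)
        cell_means = [sum(v) / len(v) for v in cells.values()]
        return sum(cell_means) / len(cell_means)

    # two stations in one cell plus one lone station elsewhere (degC/century trends)
    stations = [(40.1, -104.2, 0.6), (40.3, -104.0, 0.7), (33.9, -118.4, 0.2)]
    ungridded = sum(t for _, _, t in stations) / len(stations)
    print(round(ungridded, 3), round(grid_average(stations), 3))   # 0.5 vs 0.425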
evanmjones (21:02:57) :
As a scientist far more interested in truth than ideology, I welcome and encourage comparative analyses at all times. Menne, et al. analyzed publicly available data and were quite effusive in their acknowledgments. There’s no conspiracy or hoax there.
[REPLY – Oh, I’m not saying there is any hoax or conspiracy involved. I think Menne was doing the best he could with what he had. But he overlooked a vital, critical part of the analysis which turns his conclusion completely on its head. When the paper comes out, I will discuss this in detail. ~ Evan]
Tim (16:38:05) wrote:
“The Competitive Enterprise Institute today charged that a senior official of the U.S. Environment Protection Agency actively suppressed a scientific analysis of climate change […] ”
And you can read all about this at the website of the (very courageous, IMHO) person whose work was suppressed:
http://www.carlineconomics.com
When I first saw the Menne paper I was floored that so many people could buy into it. It was too good to be true. The notion that a sample of less than 6% could match the population so closely was, in my opinion, ludicrous. My statistics may not be the best, but I calculated the margin of error for such a sample size to be approximately 11%.
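For what it’s worth, one rough rule of thumb gives a number in that neighborhood: the 95% margin of error for a simple random sample is roughly 1/√n, so if the usable sample were on the order of n ≈ 80 stations, 1/√80 ≈ 0.11, or about 11%. That value of n is an assumption for illustration, not a figure taken from the comment or the paper.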
It is easy to imagine several reasons why once a day thermometer records would not agree with subsequent thermocouple or thermistor devices recording many times a day. Logically, if there was not to be an expected difference, it would be hard to justify the expense of the change.
Now, it’s hard to justify the expense of spending more public funds on fiddling what is a flawed record. Would it not be better for authorities to do a better job with station-by-station metadata sheets instead of inventing New Math? That is the subject of a song by math lecturer Tom Lehrer partly reproduced here
evanmjones (21:02:57) :
[REPLY – Oh, I’m not saying there is any hoax or conspiracy involved. I think Menne was doing the best he could with what he had. But he overlooked a vital, critical part of the analysis which turns his conclusion completely on its head. When the paper comes out, I will discuss this in detail. ~ Evan]
That first sentence puts you in the minority on this blog, I’m afraid. Will you be submitting your paper for publication?
[REPLY – Perhaps it does. But I am not willing to impugn Dr. Menne’s motives. We all know how scientists fall in love with their theories. (A scientist is got to dream, boy. It comes with the territory.) That’s one of the reasons peer and independent review is necessary. Without going into any detail (not being at liberty to do so), yes, what I am alluding to will be submitted for publication. ~ Evan]
Thankyou Wattsy well done.
The thing that upsets me is that these wags at NOAA, GISS etc. treat people like Anthony and S. McIntyre and others as somehow… I dunno, inferior? Not worthy? With contempt?
The smug, arrogant so-and-so’s have it coming to ’em, and doesn’t this really give it to ’em.
Even with the MIGHT of NASA behind it, Goliath suffers a gaping wound at the hands of (David) Watts.
Well done again “David” the Thermometre Terminator
[REPLY – Well, okay, maybe they’re not frauds or conspirators. But they sure as shooting ARE intolerable snobs! ~ Evan]
Will you all excuse me if I am pedantic for a moment. I think that it would bring clarity to the discussion if, when we discuss UHI effects, we use the term “delta UHI” effects, meaning change in the UHI effect over time.
As observed above, anybody with a thermometer in their car can observe UHI effects when driving from a rural location into a city. However, from the viewpoint of the temperature record, that, of itself, isn’t all that important.
What IS important is the change (i.e. ‘delta’) in UHI effect over time. In many cases, a temperature station might have started off in a truly rural location (I, for one, do not accept that a town of 10,000 people is ‘rural’ for this purpose!), say in 1900, but then a major city might have grown around it over the subsequent period. That growth of human population would have caused the UHI effect to increase over time, thus exaggerating the warming compared with a truly rural station.
Some urban locations might actually, for some reason, lose population. And it is conceivable that the delta UHI effect at that location could be negative over time.
More precision on this matter would be useful.
Evan has alluded to this at points, but I have to say that the times I thought baby jesus tear ducts were welling up was when I was in 20-miles-from-the-nearest-McDonalds, South Dakota. . . five miles on dirt roads to get there. . . 160 acre farm surrounded to the horizon by other 160 acre farms. . .and there’s the MMTS nestled up 10 feet from the farmhouse. And 100 yr old S. Dakota farmhouses are not exactly known for their excellence in insulation.
[REPLY – Note that geo has done quite a number of site surveys. ~ Evan]
Phil M (21:35:04) “That first sentence puts you in the minority on this blog, I’m afraid.” I remember all the talk of “big oil” from the pro-AGW side long before your statement. That is a whopper when it comes to conspiracy theories. As Barnum and Bailey said “Come see the egress”. Submit that for publication.
[REPLY – Well, I think the peak-oil crowd is also simply in error. They just seem to forget the honorable and ancient dictum, “peek and ye shall find”. A very basic, but woefully common “Club-of-Rome” style error. ~ Evan]
PhilM (19:37:08) , it’s not that Anthony’s data was public. I made it clear in my post at (18:06:27) that the issue was about priority of use. Anthony and his volunteers collected those data. First use, by any worthy scientific ethic, belonged solely to them. Dr. Menne and Dr. Karl absconded with that priority ownership in a clear violation of professional ethics.
Since posting, I’ve looked at Anthony’s 2009 Heartland report, “Is the US Surface Temperature Record Reliable?” and at Menne, et al. 2010 “On the reliability of the U.S. Surface Temperature Record,” as linked above. It’s very clear that Dr. Menne did not rely on the data in Anthony’s 2009 report to inform his analysis, which would have been completely acceptable.
Instead, he and his coauthors document that they used the full data set then at Anthony’s surfacestations.org, although they’re very coy about explicitly stating that they just went right ahead and downloaded it directly. Maybe the admission looked too stark in print.
They then used the information obtained from surfacestations.org to collate and categorize the stations and to direct their analysis. This is highly unethical, in that it absconds with the priority of analysis and publication that belonged to Anthony and his co-workers. Dr. Menne, et al., knew from conversations with Anthony that he intended his own analysis of his data. Knowing this, they went ahead and preempted his right.
REPLY: Just wondering Pat, are you a member of the AGU? – Anthony
“ShaneOfMelbourne (18:14:59) :
eager to get both sides of the debate, I started to watch Monckton, but have had to stop after a couple of minutes. The part about Haitians living on and now, no longer being able to afford Mud Pies, was too much for me.”
Anyone thinking Haitians are wealthy enough to not be eating mud pies should really do some reading.
http://www.telegraph.co.uk/news/worldnews/1577057/Haitis-rising-food-prices-drive-poor-to-eat-mud.html
Yes they do!
mondo: Yes, quite. As a matter of fact, I looked at urban vs. rural USHCN1 trends [sic] (as compared with their background grids), and what NOAA considers to be urban sites clock in at a solid 0.5C/century greater trend than suburban/rural sites of equivalent microsite quality. As for what NOAA classifies as urban, suburban, or rural is a matter of some dispute.
An interesting side-note is that, on average the micrositing of urban US sites is slightly superior to suburban/rural. This is quite unexpected. It also strengthens the argument that meso/macrosite is every bit as important a factor as microsite when it comes to climate stations.
Sorry for the repeated response, but my blood flash boiled.
With 2% of the stations being CRN1, no responsible analyst would suggest moving forward before a complete census was taken.
Oh, forget it! We had to lump the CRN1s and 2s together.
Besides, all but 3 (count ’em, THREE) CRN1 stations are airports. And we had to separate out airports for a whole slew of reasons.
Once you yank out all the CRN1&2 stations that are airport sites, there are only 59 left! (And one of those is at a Wastewater Plant.)
So, when the dust clears, only 6% of USHCN stations are worth the paper their B-91 forms are printed on!
As the person who did the survey of the Lincoln, IL site – it’s nice to see it cited in a post 🙂 Yes, the site is the way it should be – no buildings close by, no roads close by, no tarmac or parking lots close by. It was a refreshing change from most of the sites I surveyed in Illinois.
Pat Frank (21:52:20) :
Whatever the nuance, two things are certain: 1) Anthony put the data on the internet 2) Someone downloaded it and used it. It wasn’t hacked or stolen.
His reluctance to release data seems at odds with the accusations leveled at AGW climate scientists. Particularly since it is, in essence, an inventory of public property. It’s not clear to me why all data should be shared – except now. If they want to use incomplete, non-QA’d data, let them. Then point it out later when an article is submitted for publication. I disagree with the commonly used tactic of casting aspersions on well educated, highly trained scientists who make a career, not a hobby, of collecting and analyzing climate data.
(Excuse the Aussie lingo in this post… it is how our current Prime Minister speaks at times to impress the country blokes!) It is obvious that there will be adverse warming effects from poorly sited screens, such as the many affected by tarmac, air conditioning vents, etc.
And from siting moving from country to city. If this were not the case we may as well throw science out the window… it is obvious to a child… fair suck of the sauce bottle, mate!… And how can one compare global temps when they vary from 6,000 stations to 2,000 stations as one moves through the years… fair shake of the sauce bottle again, mate… and they are all homogenised like modern milk… fair squeeze of the sauce bottle, mate… And the sites that are deleted are mainly those with the least variation or a negative temperature tendency… fair whack of the sauce bottle, mate… Wake up, mate, it is a big scam, or at the least biased to the extreme in favour of the result one wants, to try and prove a point!
tfp (19:02:11) :
Anthony, Your withholding of the data is just as bad as CRU.
The only difference is you cannot be subject to FOIA!
Free the Data! Please.”
There is a point that NOBODY gets about this. Anthony HAS FREED HIS DATA.
The DATA is the photos. The METHOD is the rating system. The RESULT
is what you get when you apply the METHOD to the DATA.
What Anthony first released were preliminary RESULTS: the results of applying a method to a set of photos. Menne picked up that result. That result had not been peer reviewed. In fact, Menne (and others like me) was informed that the result had to be QCed.
So please, understand what the RATING is: the rating (1-5) is the RESULT of applying a METHOD (the rating system) to the DATA (photos).
So it’s actually worse than Anthony explains. Menne could have used more than 43%; he could have just taken the time to look at the photos and apply the method. But why do that work when you can just snag a random file off the internet and hit run? But hey, it’s climate science.
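To make the DATA/METHOD/RESULT distinction concrete, here is a minimal, hypothetical sketch in Python of what “applying a method to the data” could look like. The function name and the distance thresholds below are only approximations in the spirit of the distance-based siting classes the project uses; the real rating method weighs more factors than a single distance.

    # Hypothetical sketch of a siting-rating METHOD applied to DATA to produce a RESULT.
    # The single input is the distance (meters) from the sensor to the nearest
    # artificial heat source, read off the survey photos; real ratings use more criteria.
    def rate_station(dist_to_heat_source_m):
        if dist_to_heat_source_m >= 100:
            return 1
        elif dist_to_heat_source_m >= 30:
            return 2
        elif dist_to_heat_source_m >= 10:
            return 3
        elif dist_to_heat_source_m > 0:
            return 4
        return 5   # sensor on or immediately next to an artificial surface

    # DATA (distances) -> RESULT (ratings)
    print([rate_station(d) for d in (150, 25, 8, 0)])   # -> [1, 3, 4, 5]

The point is simply that anyone holding the photos (the DATA) and the rating rules (the METHOD) can regenerate the RESULT themselves.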
It is my contention that the global temp trend is governed only by the stations with the least warming, or the greatest cooling. The coolest station in a grid should be fixed and all others homogenized toward that station. If the globe was warming, the coldest stations would be warming. If a station is cooling, it doesn’t matter what the rest of the heat islands are doing, the globe is still cooling. The very idea that you would use heat island thermometers to measure global temp is ludicrous. It is like measuring the temp trend in your house by setting the thermometer next to the heater while it is on… Duuu. If the family room at the other end of the house is still cooling, the house is still cooling.
Stephen