
by Anthony Watts
There has been a lot of buzz about the Menne et al 2010 paper “On the reliability of the U.S. Surface Temperature Record”, which is NCDC’s response to the surfacestations.org project. One paid blogger even erroneously trumpeted the “death of UHI”, which is humorous, because the project was a study of station siting issues, not UHI. Anybody who commutes from country to city in a car with a dashboard thermometer can tell you about UHI.
There are also claims of this paper being a “death blow” to the surfacestations project. I’m sure in some circles they believe that to be true. However, it is very important to point out that the Menne et al 2010 paper was based on an early version of the surfacestations.org data, at 43% of the network surveyed. The dataset that Dr. Menne used was not quality controlled, contained errors in both station identification and rating, and was never intended for analysis. I had posted it so that volunteers could keep track of which stations had already been surveyed and avoid duplicating effort. When I discovered people were doing ad hoc analysis with it, I stopped updating it.
Our current dataset at 87% of the USHCN surveyed has been quality controlled.
There’s quite a backstory to all this.
In the summer, Dr. Menne had invited me to co-author with him, and our team reciprocated with an offer for him to join us as well; we had an agreement in principle for participation. But when I asked for a formal letter of invitation, they refused, which seemed very odd to me. The only things they would provide were a receipt for my new data (at 80%) and an offer to “look into” archiving my station photographs with their existing database. They made it pretty clear that I’d have no significant role other than that of data provider. We also invited Dr. Menne to participate in our paper, but he declined.
The appearance of the Menne et al 2010 paper was a bit of a surprise, since I had been offered collaboration by NCDC’s director in the fall. In a typed letter dated 9/22/09, Tom Karl wrote to me:
“We at NOAA/NCDC seek a way forward to cooperate with you, and are interested in joint scientific inquiry. When more or better information is available, we will reanalyze and compare and contrast the results.”
“If working together cooperatively is of interest to you, please let us know.”
I discussed it with Dr. Pielke Sr. and the rest of the team, which took some time since not all were available due to travel and other obligations. It was decided to reply to NCDC on a collaboration offer.
On November 10th, 2009, I sent a reply letter via Federal Express to Mr. Karl, advising him that we would like to collaborate, and offered to include NCDC in our paper. In that letter I also reiterated my concerns about use of the preliminary surfacestations data (43% surveyed) that they had, and spelled out very specific reasons why I didn’t think the results would be representative or useful.
We all waited, but there was no reply from NCDC to our acceptance of the collaboration offer Mr. Karl had made in his last letter. Not even a “thank you, but no”.
Then we discovered that Dr. Menne’s group had submitted a paper to JGR Atmospheres using my preliminary data, and that it was in press. This was a shock to me, since I had been told it is normal procedure for the person who gathered the primary data on which a paper is based to have some input in the journal’s review process.
NCDC uses data from one of the largest volunteer organizations in the world, the NOAA Cooperative Observer Network. Yet NCDC director Karl, by not bothering to reply to our letter accepting an offer he himself initiated, and the journal, by giving me no opportunity for input during review, extended what Dr. Roger Pielke Sr. calls “professional discourtesy” to my volunteers and my team’s work. See his weblog on the subject:
Professional Discourtesy By The National Climate Data Center On The Menne Et Al 2010 paper
I will point out that Dr. Menne thanked me and the surfacestations volunteers in the Menne et al 2010 paper, and I hear through word of mouth that he did so in a recent oral presentation as well. For that I thank him. He has been gracious in his communications with me, but I think he is also answerable to the organization he works for, and that limited his ability to meet some of my requests, like a simple letter of invitation.
Political issues aside, the appearance of the Menne et al 2010 paper stops neither the surfacestations project nor the work I’m doing with the Pielke research group to produce a peer reviewed paper of our own. It does illustrate, though, that some people have been in a rush to get results. Texas State Climatologist John Nielsen-Gammon suggested way back at 33% of the network surveyed that we had a statistically large enough sample to produce an analysis. I begged to differ then, at 43%, and yes, even at 70%, when I wrote my booklet “Is the US Surface Temperature Record Reliable?”, which contained no temperature analysis, only a census of stations by rating.
The problem is known as the “low hanging fruit” problem. You see, this project was done on an ad hoc basis, with no specific roadmap for which stations to acquire. That was a consequence of the social networking (blogging) Dr. Pielke and I employed early in the project to recruit volunteers. What we ended up with was a lumpy, poorly spatially distributed dataset, because early volunteers would get the stations closest to them, often near or within cities.
The urban stations were well represented in the early dataset, but the rural ones, where we believed the best siting existed, were poorly represented. So naturally, any sort of study early on even with a “significant sample size” would be biased towards urban stations. We also had a distribution problem within CONUS, with much of the great plains and upper midwest not being well represented.
This is why I’ve been continuing to collect what some might consider an unusually large sample size, now at 87%. We’ve learned that there are so few well sited stations, the ones that meet the CRN1/CRN2 criteria (or NOAA’s 100 foot rule for COOPS) are just 10% of the whole network. See our current census:

When you have such a small percentage of well sited stations, it is obviously important to get a large sample size, which is exactly what I’ve done. Preliminary temperature analysis done by the Pielke group on the data at 87% surveyed looks quite a bit different now than it did at 43%.
It has been said by NCDC in Menne et al “On the reliability of the U.S. surface temperature record” (in press) and in the June 2009 “Talking Points: related to “Is the U.S. Surface Temperature Record Reliable?” that station siting errors do not matter. However, I believe the way NCDC conducted the analysis gives a false impression because of the homogenization process used. As many readers know, the FILNET algorithm blends a lot of the data together to infill missing data. This means temperature data from both well sited and poorly sited stations gets combined to infill missing data. The theory is that it all averages out, but when you see that 90% of the USHCN network doesn’t meet even the old NOAA 100 foot rule for COOPS, you realize this may not be the case.
Here’s a way to visualize the homogenization/FILNET process. Think of it like measuring water pollution. Here’s a simple visual table of CRN station quality ratings and what they might look like as water pollution turbidity levels, rated as 1 to 5 from best to worst turbidity:





In homogenization, the data is weighted against nearby neighbors within a radius. A station whose data starts out as a “1” might end up polluted with the data of nearby stations and take on a new value, say a weighted “2.5”. Even single stations can affect many other stations in the GISS and NOAA data homogenization methods carried out on US surface temperature data here and here.
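To make the pollution analogy concrete, here is a toy inverse-distance weighting sketch in Python. The station ratings, distances, and weighting scheme below are all invented for illustration only; NCDC’s actual homogenization and FILNET algorithms are considerably more involved than this.

```python
# Toy illustration: inverse-distance weighting can pull a well sited
# station's value toward its poorly sited neighbors.
# All numbers here are invented for illustration only.

def idw_blend(target_value, neighbors):
    """Blend a target station with (value, distance_km) neighbors using
    inverse-distance weights. The target itself gets weight 1.0."""
    weights = [1.0]
    values = [target_value]
    for value, dist_km in neighbors:
        weights.append(1.0 / dist_km)
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# A hypothetical CRN1 station ("1" on the turbidity scale) surrounded
# by three poorly sited neighbors rated 3, 4, and 5.
blended = idw_blend(1.0, [(3.0, 10.0), (4.0, 20.0), (5.0, 25.0)])
print(round(blended, 2))  # -> 1.43
```

Even though each neighbor carries a smaller weight than the target station itself, the blended value drifts well away from the original “1”, which is the essence of the turbidity analogy.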

In the map above, if you apply homogenization smoothing, weighting nearby stations by distance from the stations marked with question marks, what would you imagine their (turbidity) values would be? And how close would the value for the east coast station in question be to that of the west coast station in question? Each would be pulled toward a smoothed central average based on its neighboring stations.
Essentially, in my opinion, NCDC is comparing homogenized data to homogenized data, and thus there would not likely be any large difference between “good” and “bad” stations in that data. All the differences have been smoothed out by homogenization (pollution) from neighboring stations!
The best way to compare the effect of siting between groups of stations is to use the “raw” data, before it has passed through the multitude of adjustments that NCDC performs. However, NCDC is apparently using homogenized data. So instead of comparing apples and oranges (poorly sited vs. well sited stations), they are essentially comparing apples to apples (Granny Smith vs. Golden Delicious), between which there is little visual difference beyond a slight change in color.
We saw this demonstrated in the ghost-authored Talking Points memo issued by NCDC in June 2009, in this graph:

Referencing the above graph, Steve McIntyre suggested in his essay on the subject:
The red graphic for the “full data set” had, using the preferred terminology of climate science, a “remarkable similarity” to the NOAA 48 data set that I’d previously compared to the corresponding GISS data set here (which showed a strong trend of NOAA relative to GISS). Here’s a replot of that data – there are some key telltales evidencing that this has a common provenance to the red series in the Talking Points graphic.

When I looked at SHAP and FILNET adjustments a couple of years ago, one of my principal objections to these methods was that they adjusted “good” stations. After FILNET adjustment, stations looked a lot more similar than they did before. I’ll bet that the new USHCN adjustments have a similar effect and that the Talking Points memo compares adjusted versions of “good” stations to the overall average.
There are references in the new Menne et al 2010 paper to the new USHCN2 algorithm, and we’ve been told how it is supposed to be better. While it does catch undocumented station moves that USHCN1 did not, it still adjusts data at USHCN stations in odd ways, such as at this station in rural Wisconsin, and that is the crux of the problem.

Or this one in Lincoln, IL, at the local NWS office, where they took great effort to have it well sited.


Thanks to Mike McMillan for the graphs comparing USHCN1 and USHCN2 data
Notice the clear tendency in the graphs comparing USHCN1 to USHCN2: the early record is cooled, while current levels are left near recently reported values or increased. The net result is either reduced cooling or enhanced warming not found in the raw data.
As for the Menne et al 2010 paper itself, I’m rather disturbed by their use of preliminary data at 43%, especially since I warned them that the dataset they had lifted from my website (placed there for volunteers to track what had been surveyed, never intended for analysis) had not been quality controlled at the time. Besides, there are really not enough good stations with enough spatial distribution at that sample size. They used it anyway and, amazingly, conducted their own secondary survey of those stations, comparing it to my non-quality-controlled data, implying that my 43% data wasn’t up to par. Well, of course it wasn’t! I told them about it, and why. We had to resurvey and re-rate a number of stations from early in the project.
This came about only because it took many volunteers some time to learn how to properly identify the stations. Even some small towns have 2–3 COOP stations nearby, and only one of them is “USHCN”. There’s no flag in the NCDC metadatabase that says “USHCN”; in fact, many volunteers were not even aware of their own station’s status. Nobody ever bothered to tell them. You’d think that if their stations were part of a special subset, somebody at NOAA/NCDC would have notified the COOP volunteers so they could exercise a higher level of diligence.
If doing an independent stations survey was important enough for NCDC to do to compare to my 43% data now for their paper, why didn’t they just do it in the first place?
I have one final note of interest on the station data, specifically the issue of MMTS thermometers and their tendency to be sited closer to buildings due to cabling issues.
Menne et al 2010 mentioned a “counterintuitive” cooling trend in some portions of the data. Interestingly enough, former California State Climatologist James Goodridge did an independent analysis (I wasn’t involved in the data crunching; it was a sole effort on his part) of COOP stations in California that had gone through modernization, switching from Stevenson Screens with mercury LIG thermometers to MMTS electronic thermometers. He sifted through about 500 COOPs in California and chose stations that had at least 60 years of uninterrupted data, because, as we know, a station move can cause all sorts of issues. He used the “raw” data from these stations as opposed to adjusted data.
He writes:
Hi Anthony,
I found 58 temperature stations in California with data for 1949 to 2008 and where the thermometers had been changed to MMTS and the earlier parts were liquid in glass. The average for the earlier part was 59.17°F and the MMTS fraction averaged 60.07°F.
Jim
A 0.9°F (0.5°C) warm offset due to modernization is significant, yet NCDC insists that the MMTS units test out at about 0.05°C cooler, and I believe they add that adjustment into the final data. Our experience shows the exact opposite adjustment should be made, and with a greater magnitude.
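For readers checking the arithmetic, the offset follows directly from the two averages in Jim’s note (the variable names below are just for illustration):

```python
# Difference between the MMTS-era and liquid-in-glass-era averages
# reported above, converted from Fahrenheit degrees to Celsius degrees.
lig_avg_f = 59.17   # liquid-in-glass era average, degrees F
mmts_avg_f = 60.07  # MMTS era average, degrees F

offset_f = mmts_avg_f - lig_avg_f   # 0.90 F
# A temperature *difference* converts with the 5/9 factor alone
# (no 32-degree offset, which applies only to absolute readings).
offset_c = offset_f * 5.0 / 9.0     # 0.50 C
print(f"{offset_f:.2f} F = {offset_c:.2f} C")
```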
I hope to have this California study published here on WUWT with Jim soon.
I realize all of this isn’t a complete rebuttal to Menne et al 2010, but I want to save that option for more detail for the possibility of placing a comment in The Journal of Geophysical Research.
When our paper with the most current data is completed (and hopefully accepted in a journal), we’ll let peer reviewed science do the comparison on data and methods, and we’ll see how it works out. Could I be wrong? I’m prepared for that possibility. But everything I’ve seen so far tells me I’m on the right track.
We currently have 87% of the network surveyed (1067 stations out of 1221), and it is quality controlled and checked. I feel that we have enough of the better and urban stations to solve the “low hanging fruit” problem of the earlier portion of the project. Data at 87% looks a lot different than data at 43%.
The paper I’m writing with Dr. Pielke and others will make use of this better data, and we also use a different procedure for analysis than what NCDC used.
Phil M (22:27:21) :
“His reluctance to release data seems at odds with accusations leveled at AGW climate scientists. Particularly since it is, in essence, an inventory of public property.”
So, Phil, did you help out at all with the surface stations project? Did you visit any of the sites? Did Dr. Menne? Did Tom Karl? Did anyone at the NCDC? Why are they all of a sudden now interested in Anthony’s “inventory of public property”?
You can still contribute, Phil M. There are many sites that are yet to be surveyed…
But, I can understand if you’re too busy…I mean, it’s just an “inventory of public property” after all…
Some portion of the “Delta UHI” effect, as you call it, is related to the increasing intensity of energy usage in urban areas. Heating buildings, running lights, all sorts of HVAC equipment, industry, and so on: all of it eventually degrades to heat. I have done some rough calculations, and in urban areas the effect is a significant fraction of the calculated delta “greenhouse effect” of increasing CO2. Thus, even with a stable or falling population there is potential for a positive Delta UHI. More precision about UHI will never be possible until people stop looking at it statistically and begin making site-specific measurements.
I found a page at NOAA that spoke of a study about the effect of placing cooperative stations on rooftops. I am unaware that the study was ever finished or that the results are available, but they did begin such a study.
Probably they will eventually learn of the need for a slight temperature adjustment…upward.;)
There is a lot of talk on this thread about Menne et al having scooped surfacestations.org, and that Menne et al appropriated data from surfacestations, or that they did not because it is public data, and so forth.
Menne et al did acknowledge surfacestations.org and were at least respectful. Certainly they should have been more cognizant of the value that a lot of volunteers and Anthony added to the data set. However, we all know that professional scientists are rarely aware of the contributions of others — look at the controversy around Salk and the vaccine for instance. They are often ungenerous people. We wish they were otherwise.
However, the Menne et al study looks pretty minimal to me, looks like it has errors of its own, and I’m not convinced it settles anything. Among other things, using a data set just to prove its own internal consistency, unless done with care, looks very circular.
As a scientist, don’t release it before it is all gathered and quality checked.
CAGW is dead. Do you want a partial autopsy report from the ambulance on the way to the coroner?
The ones writing a report with 43% of the stations gathered are violating rules for testing for randomness of samples.
Frank K. (05:27:59) :
“So, Phil, did you help out at all with the surface stations project? Did you visit any of the sites? Did Dr. Menne? Did Tom Karl? Did anyone at the NCDC? Why are they all of a sudden now interested in Anthony’s “inventory of public property”?
Why would those first five questions matter? Did Anthony help with any of the projects so regularly criticized on this site? or collect any of the data? Did you?
They are “all of the sudden interested” because Anthony and others have been leveling accusations of incompetence and fraud for quite some time. I’ve already stated that the treatment Anthony was given regarding authorship is exceedingly rare, in my experience, but not surprising given the context.
I don’t personally know Menne, but I’ll speculate that had Anthony extended a larger amount of professional courtesy to this point, his request to not use the data would have been heeded. As we all know, scientists are very sensitive to things like reputation, and no one wants to be known as a data thief. It’s quite bad form. But, right or wrong, the data were publicly available and posted on the internet. And in the past, that has been the only criterion for all things posted on this site (e.g. the CRU emails).
Nigel S (05:25:15) :
“I agree, the most likely answer is that the whole thing is based on the thermometer on Hansen’s desk.”
That sounds about as scientific as a dashboard thermometer, doesn’t it?
Thanks for the great work and soon-to-be-published paper! I was wondering, since Menne rushed and published his first, will it have to be a rebuttal paper (with the attending complications), or would it be separate? I would think the latter would be the obvious scenario, since it’s your data, but as many people have already said, this is climate science.
Keep up the good work!
Anthony,
I am glad to hear that you are still showing vital signs. According to a musical contact of mine, someone at RC has ordered a choral copy of the World War 1 song;
Oh! We don’t want to lose you, but we think you ought to go,
so perhaps they are planning on singing you to death; probably their best shot.
There is also a rumour that Robespierrehumbert has been sharpening Madame Guillotine (open to interpretation?), but as you can only legally be executed during the month of Thermidor, which was abolished in 1806 and reabolished with the aid of the IPCC recently, you should be pretty safe for the moment.
I therefore look forward to some more ‘Vendémiaire’ Premier Cru Watts efforts in the future.
I came across this post by Anthony the other day.
The ‘definitive’ way to establish if the official US temp records are biased is to do a quantitative analysis.
I was disappointed this endeavour was cut off two years ago, when steven mosher and John V and others ran analyses at climateaudit on data from good stations as it emerged. The transparency was refreshing.
It is a theme uncontested here that plotting the raw data is what is needed to verify or invalidate the temp record. However, this apprehension may be about to change. In a post above;
I sincerely hope that a temp series using the raw data from good stations is plotted in the upcoming paper. This is what has been promised, and everyone here would agree that this was the expectation we’ve been labouring under.
At the same time, a plot of data that is “dug out” (if raw temps ‘won’t do’) should be plotted with a clearly explained rationale.
It will be ironic if the raw data is ‘adjusted’ in the upcoming paper. Evan, is that what we can expect?
I’d like to commend once more, and as I’ve done elsewhere, the contribution to climatology that has been made by the surfacestations project.
I did read over a few comments at Skeptical Science and, instead of anything skeptical about the Menne paper, found mostly pure worship. The number of outright ridiculous statements was mind boggling.
The first thing that popped into my head was that the Menne paper is actually another blow to the surface station record’s credibility. It is completely accepted in the science community that UHI is real. If these poorly sited stations show no warming bias, that means there MUST be a cooling bias somewhere in the process. In other words, since two wrongs don’t make a right, it furthers the claim of BAD SCIENCE.
@Phil M (22:34:18)
As I pointed out further upstream, the value of the project in the long term is in giving more confidence to the eventual scientific consensus. SS is the complete antithesis of Briffa’s magic tree in Russia. Nor do I think Anthony’s paper will be the last word either. . .but eventually there will be consensus, and we’ll have greater confidence in the actual bias these sitings issues cause (or don’t cause), and that will be made possible by having all this raw data.
I only mention the farmhouse (really, the inherent limitations of the MMTS system) to give a sense of the regret when you’re coming up on a site that you know ought to be dead perfect for an uncontaminated collection site… and then it’s not.
Doing so-called “preliminary” analysis on data produced by a census attempt is a cardinal sin in Statistics: It creates the possibility that biases in a subset will shape subsequent analysis.
Actually submitting such an analysis for publication is either ignorance (I run into Ph.D.s who do not understand basic Statistics on an almost daily basis, so I cannot rule it out) or a malicious attempt to preempt the effect of a proper analysis.
In statistical analysis, the number of observations does not matter as much as the process by which those observations were collected. Random sampling allows you to use the standard tools whereas an incomplete census does not.
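That sampling point can be illustrated with a small simulation. Suppose, purely hypothetically, that urban stations read about 2 units warm and were surveyed first, as with the “low hanging fruit” problem described in the post; the mean of the first 43% then carries a bias that a random 43% sample does not. All numbers here are invented for illustration.

```python
import random

random.seed(42)

# Hypothetical network: 300 urban stations biased +2.0 units warm,
# 700 rural stations centered on 0. Numbers are invented.
urban = [2.0 + random.gauss(0, 0.5) for _ in range(300)]
rural = [0.0 + random.gauss(0, 0.5) for _ in range(700)]
network = urban + rural

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean(network)

# "Low hanging fruit" survey order: urban stations get surveyed first.
first_43_percent = network[:430]        # all 300 urban + 130 rural
biased_mean = mean(first_43_percent)

# A genuinely random 43% sample of the same network.
random_43_percent = random.sample(network, 430)
random_mean = mean(random_43_percent)

print(biased_mean - true_mean)   # large: subset over-represents urban
print(random_mean - true_mean)   # small: ordinary sampling error only
```

The first 43% over-represents urban stations and its mean sits well above the network mean, while the random sample's error is just the usual sampling noise. This is why collection order, not sample size alone, determines whether the standard tools apply.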
UHI?? As one who doesn’t always remember mnemonics, I had to go to Google to learn that UHI stands for “Urban Heat Island”. Please, Mr. Watts, the first time you use a mnemonic or abbreviation in a posting, please spell it out; some of us are a bit slow.
Lol, no prob!
I would go so far as to say even averaging a grid is bogus. Unless the grid cells are only a mile across, you can’t average an area where temperature variations can be as much as 10°F just 5 miles apart and end up with anything meaningful.
Reading this Menne paper, one question comes up immediately.
What has all this to do with science?
– it starts in the cheapest possible way, with attacks like Heartland/ tobacco/ oil.
What a nice move after Pachauri …
– it admits, yes, most of our stations are completely rubbish, but the results of the remaining are just fine. Don’t you use the bad ones for your analysis? Or do you? How did this “selfsnip” pass peer-review … (well, we know already)
– using incomplete data unauthorized, without even asking the owners … what unscientific behaviour.
As somebody said it already: something starting this way is not worth reading. Low-level propaganda.
steven mosher (22:27:40) :
tfp (19:02:11) :
“Anthony, your withholding of the data is just as bad as CRU.
The only difference is you cannot be subject to FOIA!
Free the Data! Please.”
There is a point that NOBODY gets about this. Anthony HAS FREED HIS DATA.
One rule for NASA one for Watts
When NASA release data without the processing code all hell breaks loose in the blogosphere.
But it’s fine if Watts publishes the raw photos without the processing to give rating.
The ratings were once published online but then updating was stopped. I assume NASA used this freely available data?
Is the data owned by Watts? It has been gathered by volunteers. Is paperwork in place assigning rights to Watts?
REPLY: Yes, the user agreement every volunteer agrees to when they sign up assigns me the rights. – Anthony
Inscription:
They always said I’d go to that hot place when I died. Well, here I am, and it aint so hot.
Hansen has a better thermometer on his desk because he has a PHD.
Just remember, M.D.s rely on rectal thermometer readings taken by orderlies and nurses’ aides, not even RNs. They also don’t vary location. They don’t take a reading for you from your roommate 7 feet away. They don’t move the thermometer between “down there” and your mouth.
Next time one of the elites seeks healthcare, they must demand the Doctor insert the thermometer and wait for a reading. Let me know what the doc’s reaction is.
Phil M (06:10:42) :
So I would assume the answer to my questions are “no”? OK, that’s what I wanted to know…Thanks.
“They are all of the sudden interested because Anthony and other have been leveling accusations of incompetence and fraud for quite some time.”
(cf. the CRU e-mails)
Phil M (22:27:21) said:
His reluctance to release data seems at odds with accusations leveled at AGW climate scientists.
In my opinion you are being disingenuous and dishonest.
The issue is climate researchers refusing to release data and methods after they have published.
Phil M (22:27:21) said:
In my opinion you are being disingenuous and dishonest.
The issue is climate researchers refusing to release data and methods after they have published.
You’re either being purposefully obtuse, or are simply not too bright.
What you describe is exactly what is happening here. The difference is that Anthony warned Dr. Menne ahead of time that what he was doing wasn’t a good idea. Anthony was under no obligation, either ethically or scientifically, to place data online before the analysis and publication were complete. I simply can’t believe you’re even mentioning this. Do you also excoriate Mann, Jones, and all the other hockey players for their failure to release data and methods YEARS after publication? If not, you, sir, are a hypocrite.
Doesn’t matter, Evan. “People” like Phil M will still be purposefully obtuse.