An uncorrected assumption in BEST's station quality paper

I noted with a chuckle today this statement over at the California Academy of Sciences’ “Climate Change Blog”:

I think that we all need to be careful of not falling into the unqualified and inexpert morass characterized by vessels like Anthony Watts.  – Peter D. Roopnarine

Seeing that compliment, and since we are having so much fun this week reviewing papers online and watching street lamps melt due to posited global warming, this seemed like a good time to bring this up. I’ve been sitting on this little gem for a year now, and it is finally time to point it out since nobody seems to have caught it.

I expected that after the peer review BEST has been through (and failed), this would have been fixed. Nope. I thought after the media blitzes it would have been fixed. Nope. I thought that after they submitted it to The Third Santa Fe Conference on Global and Regional Climate Change, somebody would point it out and fix it. Nope. I thought that after I pointed it out in the Watts et al 2012 draft paper, surely one of the BEST co-authors would fix it. Still nope.

The assumption error I spotted last year still exists in the May 20th edition of the BEST paper Earth Atmospheric Land Surface Temperature and Station Quality in the Contiguous United States by Richard A. Muller, Jonathan Wurtele, Robert Rohde, Robert Jacobsen, Saul Perlmutter, Arthur Rosenfeld, Judith Curry, Donald Groom, and Charlotte Wickham, 2012, Berkeley Earth Surface Temperature Project (online here, PDF).

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor”(CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Did you catch it? It is the simplest of assumption errors, yet it is obvious, and it renders the paper fatally flawed in my opinion. Answer below.

Note the NOAA CRN station classification system, derived from Leroy 1999 and described in the Climate Reference Network (CRN) Site Information Handbook, 2002, which is online here (PDF).

This CRN classification system was used in the Fall et al 2011 paper and the Menne et al 2010 paper as the basis for these studies. Section 2.2.1 of the NOAA CRN handbook says this:

2.2.1 Classification for Temperature/Humidity

  • Class 1 – Flat and horizontal ground surrounded by a clear surface with a slope below 1/3 (<19°). Grass/low vegetation ground cover <10 centimeters high. Sensors located at least 100 meters from artificial heating or reflecting surfaces, such as buildings, concrete surfaces, and parking lots. Far from large bodies of water, except if it is representative of the area, and then located at least 100 meters away. No shading when the sun elevation >3 degrees.
  • Class 2 – Same as Class 1 with the following differences. Surrounding vegetation <25 centimeters. Artificial heating sources within 30 meters. No shading for a sun elevation >5°.
  • Class 3 (error 1°C) – Same as Class 2, except no artificial heating sources within 10 meters.
  • Class 4 (error ≥ 2°C) – Artificial heating sources <10 meters.
  • Class 5 (error ≥ 5°C) – Temperature sensor located next to/above an artificial heating source, such as a building, roof top, parking lot, or concrete surface.

Note that Class 1 and 2 stations have no measurement errors associated with them, but Classes 3, 4, and 5 do.
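For reference, here is a minimal sketch of that classification as a lookup table (Python; the structure and function name are mine, the error values come from the handbook excerpt above):

```python
# Nominal temperature error (deg C) by CRN siting class, per the 2002 CRN
# Site Information Handbook excerpt above (derived from Leroy 1999).
# None means the handbook assigns no measurement error to that class.
CRN_CLASS_ERROR_C = {
    1: None,  # Class 1 - no associated error
    2: None,  # Class 2 - no associated error
    3: 1.0,   # Class 3 - error 1 C
    4: 2.0,   # Class 4 - error >= 2 C
    5: 5.0,   # Class 5 - error >= 5 C
}

def is_acceptable(crn_class: int) -> bool:
    """A class counts as acceptably sited only if the handbook assigns it no error."""
    return CRN_CLASS_ERROR_C[crn_class] is None
```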

From actual peer-reviewed science: Menne, M. J., C. N. Williams Jr., and M. A. Palecki, 2010: On the reliability of the U.S. surface temperature record, J. Geophys. Res., 115, D11108, doi:10.1029/2009JD013094. Online here (PDF).

It says in Menne et al 2010, section 2, “Methods”:

…to evaluate the potential impact of exposure on station siting, we formed two subsets from the five possible USCRN exposure types assigned to the USHCN stations by surfacestations.org, and reclassified the sites into the broader categories of “good” (USCRN ratings of 1 or 2) or “poor” exposure (USCRN ratings of 3, 4 or 5).

In Fall et al 2011, the paper of which I am a co-author, we say:

The best and poorest sites consist of 80 stations classified as either CRN 1 or CRN 2 and 61 as CRN 5 (8% and 6% of all surveyed stations, respectively).

and

Figure 2. Distribution of good exposure (Climate Reference Network (CRN) rating = 1 and 2) and bad exposure (CRN = 5) sites. The ratings are based on classifications by Watts [2009] using the CRN site selection rating shown in Table 1. The stations are displayed with respect to the nine climate regions defined by NCDC.

Clearly, per Leroy 1999 and the 2002 NOAA CRN Handbook, both Menne et al 2010 and Fall et al 2011 treat Class 1 and 2 stations as well sited, a.k.a. “good” sites, and Classes 3, 4, and 5 as poorly sited, or “poor”.

In Watts et al 2012, we say on line 289:

The distribution of the best and poorest sites is displayed in Figure 1. Because Leroy (2010) considers both Class 1 and Class 2 sites to be acceptably representative for temperature measurement, with no associated measurement bias, these were combined into the single “compliant” group, with all others, Classes 3, 4, and 5, as the “non-compliant” group.

Let’s compare again to Muller et al 2012; the document properties dialog confirms it is the May 20th edition.

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor”(CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Note the color key of the graph.

On line 108 they say this, apparently just making up their own site quality grouping and ignoring the siting class acceptability established in the previous peer-reviewed literature:

We find that using what we term as OK stations (rankings 1, 2 and 3) does not yield a statistically meaningful difference in trend from using the poor stations (rankings 4 and 5).

They binned it wrong. BEST mixed an unacceptable station class, Class 3, which has a 1°C error (per Leroy 1999, the CRN Handbook 2002, Menne et al 2010, Fall et al 2011, and of course Watts et al 2012), into the acceptable classes of stations, Classes 1 and 2, calling the combined 1+2+3 group “OK”.
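To make the disagreement concrete, here is a small sketch comparing the two binnings; the per-class trends and station counts are invented for illustration only, and only the groupings come from the papers:

```python
# Hypothetical per-class trend estimates (deg C/decade) and station counts,
# invented purely to illustrate how the choice of binning changes the comparison.
hypothetical_trend = {1: 0.30, 2: 0.29, 3: 0.22, 4: 0.34, 5: 0.36}
station_count = {1: 30, 2: 50, 3: 200, 4: 500, 5: 60}

def group_trend(classes):
    """Station-count-weighted mean trend for a set of CRN classes."""
    total = sum(station_count[c] for c in classes)
    return sum(hypothetical_trend[c] * station_count[c] for c in classes) / total

# Binning per Leroy 1999 / CRN Handbook 2002 / Menne et al 2010 / Fall et al 2011:
good, poor = group_trend({1, 2}), group_trend({3, 4, 5})

# Binning used in the BEST paper ("OK" vs "poor"):
ok, best_poor = group_trend({1, 2, 3}), group_trend({4, 5})

print(f"1+2 vs 3+4+5: {good:.2f} vs {poor:.2f}")
print(f"1+2+3 vs 4+5: {ok:.2f} vs {best_poor:.2f}")
```

Because Class 3 has the lowest trend of the five groups (by BEST’s own Figure 2), moving it from one bin to the other pulls down whichever group it joins, which is exactly why the choice of bin matters.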

They mention their reasoning starting on line 163:

The Berkeley Earth methodology for temperature reconstruction method is used to study the combined groups OK (1+2+3) and poor (4+5). It might be argued that group 3 should not have been used in the OK group; this was not done, for example, in the analysis of Fall et al. (2011). However, we note from the histogram analysis shown in Figure 2 that group 3 actually has the lowest rate of temperature rise of any of the 5 groups. When added to the “poor” group to make the group that consists of categories 3+4+5, it lowers the estimated rate of temperature rise, and thus it would result in an even lower level of potential station quality heat bias.

Maybe, but when Leroy 1999, the CRN Handbook 2002, Leroy 2010, the WMO’s endorsement of the Leroy 2010 standard, Fall et al 2011, and now Watts et al 2012 all say that Classes 1 and 2 are acceptable, and Classes 3, 4, and 5 are not, can you really just make up your own ideas of what is and is not acceptable station siting? Maybe they were trying to be kind to me, I don’t know, but the correct way of binning is to use Classes 1 and 2 as acceptable, and Classes 3, 4, and 5 as unacceptable. The results should always be based on that, especially when siting standards have been established and endorsed by the World Meteorological Organization. To make up your own definition of acceptable station groups is capricious and arbitrary.

Of course, none of this really matters much, because the data BEST had (the same data used in Fall et al 2011) was binned improperly anyway, since the surface area of the heat sinks and sources was not considered; combined with the binning assumption, that renders the Muller/BEST paper pointless.

I wonder if Dr. Judith Curry will ask for her name to be taken off of this paper too?

This science lesson, from an “unqualified and inexpert morass”, is brought to you by the number 3.

132 Comments
davidmhoffer
August 5, 2012 9:09 am

amoeba;
PS. LazyTeenager is completely right, by the way.
>>>>>>>>>>>>>>>>>>>
Ooooh, such big words for a single-cell creature. FWIW, Lazy’s stock in trade is obfuscation, misdirection, and rationalization of bad behaviour. If he indeed got something right and simply said so straightforwardly, it is a first in my experience. That said, the notion that simply documenting what you did somehow compensates for what you did being wrong doesn’t sit well with me, and combining error-ridden data with high quality data is simply wrong.
As for the trends, if unadjusted pristine stations show a higher trend than unadjusted poor quality stations, then there is something terribly wrong with the data that has not yet been understood. The vast majority of effects experienced by poorly sited stations are increased temperatures, and these become more pronounced over time as cities grow and previously pristine stations become poor ones. For the trends of the poor quality stations to be below that of the pristine ones is completely illogical, and demands further investigation. If it isn’t the post-versus-pre-adjustment issue as I surmised, then it is another issue. As Anthony points out at various times, the factors affecting the data when one examines the stations on a station-by-station basis are numerous and highly unpredictable. Yes, sometimes a counter-intuitive result turns out to be the correct result. But in this case I find that hard to believe, and there is a massive amount of metadata regarding ALL the stations that we simply do not have.

amoeba
August 5, 2012 9:53 am

Well, you keep mixing things together. Let’s assume (for a second) that BEST estimate all the trends correctly. They consider binnings 1+2+3 vs 4+5 and 1+2 vs 3+4+5, present both (!) results, but focus on the first because it’s more favourable to the hypothesis that bad stations show more warming. Still, they reject this hypothesis. Anthony writes a lengthy post saying that it was wrong to bin 1+2+3 together. This is an absurd critique. I can only repeat that, sorry.
What you are saying now is that you do not believe (or at least find highly suspicious) that the BEST trend for the bad stations is not larger than for the good/ok stations. But don’t you see that this is another issue? Anthony did not write about it. In my first comment I did not write about it. It’s just a different discussion.
All right, now if you still ask me, how come the BEST trend for the bad stations is so low, I have no idea. I suspect that it might be because their outlier deweighting algorithm basically kicks out those stations that show anomalously high warming, in an automatic regime. But I am not sure. Maybe they are just wrong and screwed everything up. I do not know and that’s why I did not even want to discuss that. I was only talking about binning.

jim2
August 5, 2012 11:51 am

So, why did the classification 3 stations have such low trend? Are the siting criteria valid? Maybe not in light of Leroy 2010?

George E. Smith;
August 5, 2012 11:36 pm

“””””…..LazyTeenager says:
August 5, 2012 at 4:29 am
Pamela Gray says
The average of crappy data is meaningless.
———-
This is a bogus over generalization. The exact way in which data is crappy is very important.
I can look at graphs of data/ signal which look just like noise. Given the right processing techniques I can pull out signal that is as clear as day……”””””
Well since we are talking about the various “weather sites” then it stands to reason that we are talking about a potential “sampled data system”.
I say “potential”, because, if it’s a sampled data system, then it has to comply with the Nyquist sampling theorem, or it isn’t gathering any data at all; just readings, which are not representative of the true continuous variable(s) that are being observed and sampled there. Absent a valid sampling regimen, there isn’t any signal to be pulled out of the noise; not even by Lazy Teenager’s genius techniques.
And don’t give me that “we don’t need the signal” just the “average”/trend/whatever.
Well, you only have to undersample by a factor of two and poof goes your average, so you are left with nothing. Gee, I guess a twice-per-day sampling rate (max/min) doesn’t meet the minimum Nyquist criterion except for a pure sinusoidal signal with no harmonic content; so even temporally, the “data” is no good, and when you look at the spatial sampling rate (1200 km anyone?), well, calling it data is a joke.
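A minimal sketch of that undersampling point, with a purely synthetic diurnal signal (the amplitudes and phase are invented for illustration): a curve containing a 2-cycles-per-day harmonic, sampled only twice a day, gives a “daily mean” biased by the aliased harmonic.

```python
import numpy as np

# Synthetic "daily temperature": a 1 cycle/day fundamental plus a
# 2 cycles/day harmonic (all amplitudes/phases invented for illustration).
def temp(t_hours):
    return (15.0
            + 5.0 * np.sin(2 * np.pi * t_hours / 24.0)
            + 2.0 * np.sin(2 * np.pi * t_hours / 12.0 + 0.7))

t_fine = np.linspace(0.0, 24.0, 24 * 60, endpoint=False)  # minute resolution
true_mean = temp(t_fine).mean()

# Two samples per day: enough for the fundamental, but the 2 cycles/day
# harmonic sits at the sampling Nyquist frequency and aliases into a bias.
t_two = np.array([6.0, 18.0])
undersampled_mean = temp(t_two).mean()

print(f"true daily mean:        {true_mean:.2f}")
print(f"two-samples-a-day mean: {undersampled_mean:.2f}")
```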

August 8, 2012 10:33 am

Eli Rabett says:
August 4, 2012 at 1:50 pm
“Tall Dave appears to prefer uncalibrated measurements. The semantic part about what NOAA does is that the “adjustments” are really inter-calibrations. Ask John Christy about the problems you can get into without inter-calibrations when you have different instruments or measurement devices that drift. ”
This is utter nonsense. If you’re comparing station quality, why in God’s name would you inter-calibrate them first? The inter-calibrations hide the thing you’re looking for!
It’s like saying “Okay, I have a theory that some of these bowls of oatmeal have more raisins than the others, but before I count the number of raisins in each bowl I’m going to mix them all together.”
This is either extremely bad science, or a deliberate attempt to hide the siting problems. Since Hansen did the same thing years ago, the latter seems much more likely.

August 8, 2012 10:44 am

“My understanding is that the trends reported by BEST are post “adjustments”. The whole point of Anthony et al is that the adjustments applied to cat 1&2 stations were larger than the ones applied to 3, 4, & 5 stations. If my understanding is correct, then the points raised by those two commenters are moot.”
Exactly!
It’s really hard not to see this as dishonesty on Muller’s part. Either that, or he had no understanding of how the adjustments were done, which is even worse.
See, for instance, here: http://wattsupwiththat.com/2009/04/18/what-happens-when-you-divide-antarctica-into-two-distinct-climate-zones/
