An uncorrected assumption in BEST's station quality paper

I noted with a chuckle today this statement over at the California Academy of Sciences “Climate Change Blog”:

I think that we all need to be careful of not falling into the unqualified and inexpert morass characterized by vessels like Anthony Watts.  – Peter D. Roopnarine

Seeing that compliment, and since we are having so much fun this week reviewing papers online and watching street lamps melt due to posited global warming, this seemed like a good time to bring this up. I’ve been sitting on this little gem for a year now, and it is finally time to point it out since nobody seems to have caught it.

I expected that after the peer review BEST has been through (and failed), this would have been fixed. Nope. I thought after the media blitzes it would have been fixed. Nope. I thought that after they submitted it to The Third Santa Fe Conference on Global and Regional Climate Change somebody would point it out and fix it. Nope. I thought that after I pointed it out in the Watts et al. 2012 draft paper, surely one of the BEST co-authors would fix it. Still nope.

The assumption error I spotted last year still exists in the May 20th edition of the BEST paper Earth Atmospheric Land Surface Temperature and Station Quality in the Contiguous United States by Richard A. Muller, Jonathan Wurtele, Robert Rohde, Robert Jacobsen, Saul Perlmutter, Arthur Rosenfeld, Judith Curry, Donald Groom, and Charlotte Wickham, 2012, Berkeley Earth Surface Temperature Project (online here, PDF).

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor” (CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Did you catch it? It is the simplest of assumption errors, yet once seen it is obvious, and in my opinion it renders the paper fatally flawed. Answer below.

Note the NOAA CRN station classification system, derived from Leroy 1999 and described in the Climate Reference Network (CRN) Site Information Handbook, 2002, which is online here (PDF).

This CRN classification system was used in the Fall et al 2011 paper and the Menne et al 2010 paper as the basis for these studies. Section 2.2.1 of the NOAA CRN handbook says this:

2.2.1 Classification for Temperature/Humidity

  • Class 1 – Flat and horizontal ground surrounded by a clear surface with a slope below 1/3 (<19°). Grass/low vegetation ground cover <10 centimeters high. Sensors located at least 100 meters from artificial heating or reflecting surfaces, such as buildings, concrete surfaces, and parking lots. Far from large bodies of water, except if it is representative of the area, and then located at least 100 meters away. No shading when the sun elevation >3 degrees.
  • Class 2 – Same as Class 1 with the following differences. Surrounding Vegetation <25 centimeters. Artificial heating sources within 30m. No shading for a sun elevation >5°.
  • Class 3 (error 1°C) – Same as Class 2, except no artificial heating sources within 10 meters.
  • Class 4 (error ≥ 2°C) – Artificial heating sources <10 meters.
  • Class 5 (error ≥ 5°C) – Temperature sensor located next to/above an artificial heating source, such as a building, roof top, parking lot, or concrete surface.

Note that Class 1 and 2 stations have no errors associated with them, but Classes 3, 4, and 5 do.
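For concreteness, here is a minimal Python sketch of the class-to-error mapping just quoted; the error values come straight from section 2.2.1 above, while the code itself is illustrative only and not from any of the papers discussed:

```python
# Nominal per-class temperature errors from section 2.2.1 of the 2002 NOAA
# CRN Site Information Handbook. None means the handbook assigns no error;
# the Class 4 and 5 values are the quoted lower bounds (>= 2 C, >= 5 C).
CRN_ERROR_C = {
    1: None,  # Class 1 - no associated error
    2: None,  # Class 2 - no associated error
    3: 1.0,   # Class 3 - error 1 C
    4: 2.0,   # Class 4 - error >= 2 C
    5: 5.0,   # Class 5 - error >= 5 C
}

def is_acceptable(crn_class: int) -> bool:
    """Acceptable siting per Leroy 1999 / CRN Handbook 2002: Classes 1 and 2 only."""
    return CRN_ERROR_C[crn_class] is None
```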

From actual peer-reviewed science: Menne, M. J., C. N. Williams Jr., and M. A. Palecki, 2010: On the reliability of the U.S. surface temperature record, J. Geophys. Res., 115, D11108, doi:10.1029/2009JD013094 (online here, PDF).

Menne et al. 2010 says in section 2, “Methods”:

…to evaluate the potential impact of exposure on station siting, we formed two subsets from the five possible USCRN exposure types assigned to the USHCN stations by surfacestations.org, and reclassified the sites into the broader categories of “good” (USCRN ratings of 1 or 2) or “poor” exposure (USCRN ratings of 3, 4 or 5).

In Fall et al. 2011, the paper of which I am a co-author, we say:

The best and poorest sites consist of 80 stations classified as either CRN 1 or CRN 2 and 61 as CRN 5 (8% and 6% of all surveyed stations, respectively).

and

Figure 2. Distribution of good exposure (Climate Reference Network (CRN) rating = 1 and 2) and bad exposure (CRN = 5) sites. The ratings are based on classifications by Watts [2009] using the CRN site selection rating shown in Table 1. The stations are displayed with respect to the nine climate regions defined by NCDC.

Clearly, per Leroy 1999 and the 2002 NOAA CRN Handbook, both Menne et al. 2010 and Fall et al. 2011 treat Class 1 and 2 stations as well sited, aka “good” sites, and Classes 3, 4, and 5 as poorly sited, or “poor”.

In Watts et al. 2012, we say on line 289:

The distribution of the best and poorest sites is displayed in Figure 1. Because Leroy (2010) considers both Class 1 and Class 2 sites to be acceptably representative for temperature measurement, with no associated measurement bias, these were combined into the single “compliant” group, with all others, Classes 3, 4, and 5, as the “non-compliant” group.

Let’s compare again to Muller et al. 2012, but first let’s establish the date of the document for certain: the PDF’s document properties dialog confirms the May 20th edition.

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor” (CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Note the color key of the graph.

On line 108 they say this, apparently just making up their own site-quality grouping and ignoring the siting-class acceptability established in the previous peer-reviewed literature:

We find that using what we term as OK stations (rankings 1, 2 and 3) does not yield a statistically meaningful difference in trend from using the poor stations (rankings 4 and 5).

They binned it wrong. BEST mixed an unacceptable station class, Class 3, which carries a 1°C error (per Leroy 1999, the CRN Handbook 2002, Menne et al. 2010, Fall et al. 2011, and of course Watts et al. 2012), in with the acceptable classes of stations, Classes 1 and 2, calling the combined Class 1+2+3 group “OK”.
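To make the two binning conventions explicit, here is a minimal Python sketch; the five-station list is hypothetical, and only the group boundaries come from the papers cited:

```python
# Grouping per Menne et al. 2010 / Fall et al. 2011 / Watts et al. 2012:
GOOD = {1, 2}    # acceptable siting; Classes 3, 4, 5 are unacceptable
# Grouping per Muller et al. 2012 (BEST):
OK = {1, 2, 3}   # Class 3, with its 1 C error, moved into the acceptable bin

def bin_stations(stations, acceptable):
    """Split (station_id, crn_class) pairs into acceptable/unacceptable groups."""
    good = [sid for sid, c in stations if c in acceptable]
    poor = [sid for sid, c in stations if c not in acceptable]
    return good, poor

# Hypothetical stations, one per class, to show how Class 3 switches bins:
stations = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]
print(bin_stations(stations, GOOD))  # (['A', 'B'], ['C', 'D', 'E'])
print(bin_stations(stations, OK))    # (['A', 'B', 'C'], ['D', 'E'])
```

Every trend comparison downstream inherits whichever boundary is chosen here, which is the whole point of the objection.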

They mention their reasoning starting on line 163:

The Berkeley Earth methodology for temperature reconstruction is used to study the combined groups OK (1+2+3) and poor (4+5). It might be argued that group 3 should not have been used in the OK group; this was not done, for example, in the analysis of Fall et al. (2011). However, we note from the histogram analysis shown in Figure 2 that group 3 actually has the lowest rate of temperature rise of any of the 5 groups. When added to the “poor” group to make the group that consists of categories 3+4+5, it lowers the estimated rate of temperature rise, and thus it would result in an even lower level of potential station quality heat bias.

Maybe, but when Leroy 1999, the CRN Handbook 2002, Leroy 2010, the WMO’s endorsement of Leroy 2010 as a standard, Fall et al. 2011, and now Watts et al. 2012 all say that Classes 1 and 2 are acceptable and Classes 3, 4, and 5 are not, can you really just make up your own ideas of what is and is not acceptable station siting? Maybe they were trying to be kind to me, I don’t know, but the correct way of binning is to use Classes 1 and 2 as acceptable, and Classes 3, 4, and 5 as unacceptable. The results should always be based on that, especially when the siting standards have been established and endorsed by the World Meteorological Organization. To make up your own definition of acceptable station groups is arbitrary and capricious.

Of course none of this really matters much, because the data BEST had (the same data as Fall et al. 2011) was binned improperly anyway: the surface area of nearby heat sinks and sources was not considered. Combined with the binning assumption, that renders the Muller/BEST paper pointless.

I wonder if Dr. Judith Curry will ask for her name to be taken off this paper too?

This science lesson, from an “unqualified and inexpert morass”, is brought to you by the number 3.

132 Comments
August 3, 2012 11:55 am

gopal panicker says:
August 3, 2012 at 10:46 am
a gathering of climate ‘scientists’…a fever…an idiocy…a corruption…a herd…a troop…etc

Clearly it has to be a ‘fever’ of them.

/Mr Lynn

August 3, 2012 12:15 pm

I don’t think this comparison is very relevant. If every station had 110 continuous years of history, then you could look at this and say, “Aha, maybe the quality doesn’t matter much.”
But we don’t; the average is (IIRC) something like 20 years, and all the temperatures get averaged together to arrive at the global average. Bad stations should have the same trend as good stations, because they started off warm and stayed warm.
This chart does a great job of demonstrating that bad stations are reliably bad, but I don’t think that can be construed as support for the average trend, in which good stations also probably become a smaller portion of the signal over time.

Paul
August 3, 2012 12:43 pm

polistra says:
August 3, 2012 at 8:49 am
Vessel? What a strange epithet! Do they imagine Anthony to be a luxurious cruise ship? A four-masted schooner? A pirate clipper?

My assumption was the word they thought they were using was vassal,

A vassal or feudatory is a person who has entered into a mutual obligation to a lord or monarch in the context of the feudal system in medieval Europe. The obligations often included military support and mutual protection, in exchange for certain privileges, usually including the grant of land held as a fiefdom. – Wikipedia, “Vassal”

and meaning it as an insult, implying that Anthony owes servitude to “Evil Big Oil” or perhaps the Heartland Institute.
I’ve always found it curious how the more vocal of the CAGW alarmists can’t seem to conceive that there isn’t someone of higher authority behind the curtain, pulling levers and issuing marching orders to his thralls. I suppose if you can’t think critically for yourself, and let some authority issue you your talking points, you assume everyone is like that.

Gneiss
August 3, 2012 1:39 pm

I don’t see an “uncorrected assumption” in the BEST analysis; they just made a different decision about how to bin the five station categories. Whether the binning decision either way affects trends in the anomalies over time could be a real question, but one this post does not try to answer. Many other studies suggest it does not.
Regarding Leif’s question of why not graph each of the five categories separately instead of binning at all, I think the answer is that the error bars get much larger due to the smaller group sizes.

gacooke
August 3, 2012 1:52 pm

No, they meant to call him a vessel. He brings them a bitter draught.

Editor
August 3, 2012 1:58 pm

One of the problems with figuring out which way the temperature is heading is that we are looking for a small signal that is mostly swamped by noise.
The point of the figure above is to show how stations of any quality agree well with each other, and by eyeball, they seem to. (I’d like to see it stretched horizontally about 10X, though.)
What does this say about the noise?
1) The noise signal is regional.
For each sample in the figure, a warm anomaly at one station shows up at another. That would imply there is little noise in a station’s measurements; it all comes from weather systems.
2) The noise is lost due to poor assessment of station quality.
This is the central feature of Anthony’s paper.
3) The noise signal is being spread between stations through the homogenization procedure.
It might be interesting to take pre- and post-homogenization data and see how each displays and analyzes.

KnR
August 3, 2012 2:01 pm

‘…can you really just make up your own ideas of what is and is not acceptable station siting?’
Frankly, if it gives you what you want, I think climate science has shown us time and again that you can do anything.

RockyRoad
August 3, 2012 2:16 pm

Surely a group of Climate Scientists is a “Consensus”.

Or in general just call them “climsci”… “clim” for half climate (the warming but not the cooling part) and “sci” for half scientist (no data, just methodology).
And the “climsci” pursue their chalice of perpetual funding, as would any glory-seeking, hungry cult.

RockyRoad
August 3, 2012 2:21 pm

Why is everything in italics?
[Reply: Fixed, thanks. It’s a WordPress glitch that happens occasionally. ~dbs, mod.]

zefal
August 3, 2012 2:25 pm

They have just discovered this “error” on their own and will be correcting it of their own accord.

Dave
August 3, 2012 2:52 pm

As a candidate nearing completion of my PhD and working on a climate-related (but not global warming) research project, I have to say that what Anthony has shown us here is illustrative of a true MasterCard moment… Priceless!!!
Well done Anthony, I’m sure that Steve McIntyre is proud of your astute observation.

Gneiss
August 3, 2012 3:39 pm

Edward Bancroft writes,
“Isn’t there something else misleading about that graph? Namely that it is showing the temperature differences from the norm for both plots, which are bound to be the same, as even a bad UHI site record which consistently reads high will faithfully follow the temperature changes.”
No, what you’ve stated is exactly the reason scientists use anomalies to track temperature change across multiple stations. In this context it would be misleading if they did not.

George E. Smith;
August 3, 2012 4:02 pm

If the good stations (in red) and the no-good stations (in blue) basically tell the same story (since to my crummy eyes I can’t see any difference (or red)), then why not adopt the ANOMALY protocol and plot a single graph that is just red minus blue? Obviously that will plot as a boring straight horizontal line if the good is just as bad as the bad, and vice versa.
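The suggested red-minus-blue plot is easy to sketch; the two anomaly series below are synthetic stand-ins (we don’t have the plotted BEST series in hand), so only the idea of the plot, not the data, is real:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two plotted anomaly series (hypothetical data).
years = np.arange(1950, 2011)
rng = np.random.default_rng(0)
red = 0.005 * (years - 1950) + rng.normal(0.0, 0.2, years.size)  # "good" group
blue = red + rng.normal(0.0, 0.05, years.size)                   # "poor" group

# The plot suggested above: if the groups really tell the same story,
# the difference series hugs a flat horizontal line at zero.
plt.plot(years, red - blue)
plt.axhline(0.0, linestyle="--")
plt.ylabel("red minus blue (°C)")
plt.show()
```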

Chris D.
August 3, 2012 4:22 pm

Mosher is uncharacteristically silent on the matter.

Sou
August 3, 2012 6:36 pm

Agree with those who can’t see the problem. Not everyone has to do things the exact same way. The Muller team looked at the difference both ways in any case, and found no statistically significant difference in the trends.
The Muller paper states that when they binned them as “good” and “poor” (as opposed to their “OK” and “poor”), the poor-quality stations showed a slightly lower warming trend (though not a statistically significant difference). That’s the same as Menne found, but the opposite of what Anthony has been trying to get the data to show. (Fall et al. stated they found no difference in the trend of mean temperature – the ups balanced the downs.) From the Muller paper:

The difference between the “poor” (4+5) sites and the “OK” (1+2+3) sites is 0.09 ± 0.07°C per century. We also tried other groupings; the difference between the (3+4+5) grouping and the “good” (1+2) sites is -0.04 ± 0.10°C per century, i.e. the other sites are warming at a slower rate than are the good sites, although the effect is not larger than the statistical uncertainty. There is no evidence that the poor sites show a greater warming trend than do the OK sites.
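As a quick arithmetic check on the figures quoted above, assuming the ± values are one-sigma uncertainties and taking “significant” to mean exceeding two sigma (the excerpt does not state the paper’s criterion), both differences do come out insignificant:

```python
# Differences as quoted from the BEST excerpt, in C per century.
# Assumption: the quoted +/- values are one-sigma uncertainties.
comparisons = [
    ("poor (4+5) minus OK (1+2+3)", 0.09, 0.07),
    ("(3+4+5) minus good (1+2)", -0.04, 0.10),
]
for label, estimate, sigma in comparisons:
    verdict = "significant" if abs(estimate) > 2 * sigma else "not significant"
    print(f"{label}: {estimate:+.2f} +/- {sigma:.2f} C/century -> {verdict}")
# Both comparisons print "not significant".
```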

August 3, 2012 8:10 pm

Sou, I’d be curious about a geographic breakdown before I believe the BEST team about anything.
Consider Arkansas. http://berkeleyearth.lbl.gov/regions/arkansas
1) They graph 12 month and 10 year moving averages, yet the data has 5 year averages as well. They didn’t graph the 5 year averages.
2) They claim a 0.48°C/century trend from 1910. NOAA says -0.06°F/decade from 1910.
Which is a huge difference.
You can check NOAA here: http://www.ncdc.noaa.gov/oa/climate/research/cag3/ar.html
3) From 1895, NOAA says: 1895–2011 trend = -0.03°F/decade. Again, a negative trend.
4) Annual 1920–2011 trend = -0.09°F/decade. Again, a negative trend.
I can easily find negative trends from Arkansas at NOAA, but BEST couldn’t find any. You have to be dishonest or incompetent to not find any negative trends for Arkansas.
I don’t think I believe BEST about much.
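One unit note on the comparison above: BEST’s figure is quoted in °C per century and NOAA’s in °F per decade, so a quick conversion (using only the numbers as quoted in this comment) makes the gap concrete:

```python
def f_per_decade_to_c_per_century(trend_f_per_decade: float) -> float:
    """Convert a temperature trend from deg F/decade to deg C/century."""
    return trend_f_per_decade * 10.0 * 5.0 / 9.0

noaa_arkansas = f_per_decade_to_c_per_century(-0.06)  # about -0.33 C/century
best_arkansas = 0.48                                  # C/century, as quoted
print(best_arkansas - noaa_arkansas)  # ~0.81 C/century apart, opposite signs
```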

Ed_B
August 3, 2012 8:14 pm

IMO Muller is not up to a grade-7 science student’s level of analysis. I am shocked at what passes as “science” at Berkeley. No way, no how, should A. Watts have needed to point out such an obvious binning error on Muller’s part. I can only conclude that the 3s were mixed in with the 1s and 2s to produce the “desired” result, i.e. “global warming”. EPIC SCIENCE FAILURE!!

August 3, 2012 8:28 pm

As an addendum to my Arkansas post: there are six states which have a negative trend from 1895 as of the end of 2011: Alabama, Georgia, Mississippi, Arkansas, South Carolina, and Tennessee.
http://sunshinehours.wordpress.com/2012/04/29/is-the-usa-warming-the-noaa-data-saysit-depends-part-1/
BEST didn’t find any negative trends.
http://berkeleyearth.lbl.gov/regions/alabama
http://berkeleyearth.lbl.gov/regions/georgia-(state)
http://berkeleyearth.lbl.gov/regions/mississippi
http://berkeleyearth.lbl.gov/regions/arkansas
http://berkeleyearth.lbl.gov/regions/south-carolina
http://berkeleyearth.lbl.gov/regions/tennessee
“It is shown that the SE United States is one of few regions on this planet that shows a cooling trend over the twentieth century (Trenberth et al. 2007). Portmann et al. (2009) find that this cooling trend is strongest in the late spring–early summer period.”
http://coaps.fsu.edu/pub/eric/papers_html/Misra_et_al_11.pdf
How could BEST not find a cooling trend when a lot of other people (including Trenberth!) can?

Jonathan Smith
August 3, 2012 8:56 pm

If only the current crop of AGW climate scientists could be spread among all of the scientific disciplines. Their ability to grasp such a phenomenally complex system as the Earth’s climate, and make highly accurate predictions into the far future, surely means that they can figure out anything. Disperse them properly and, by my reckoning, the whole of science should be sewn up within 5 years (to at least 6 decimal places).

Owen in Ga
August 3, 2012 8:58 pm

Error adds in quadrature, so if Classes 1 and 2 have 0 error, Class 3 has a 1°C error, Class 4 has a 2°C error, and Class 5 has a 5°C error (best cases, dropping the “or greater than” part), the error bars are:
Class 1+2 = sqrt(0^2 + 0^2) = 0
Class 1+2+3 = sqrt(0^2 + 0^2 + 1^2) = 1°C, an infinite relative increase (divide by zero)
Class 3+4+5 = sqrt(1^2 + 2^2 + 5^2) ≈ 5.5°C
Class 4+5 = sqrt(2^2 + 5^2) ≈ 5.4°C
So they take an accepted grouping with 0 and 5.5°C error bars and substitute one with 1°C and 5.4°C bars, but since they didn’t plot the bars you don’t see the difference.
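In code, that quadrature combination looks like this (a sketch of the commenter’s arithmetic using the handbook’s best-case per-class errors; it is not anything from the BEST analysis):

```python
import math

# Best-case per-class errors from the CRN Handbook (Classes 1-2: 0 C,
# Class 3: 1 C, Class 4: 2 C, Class 5: 5 C), dropping the ">=" part.
CLASS_ERROR_C = {1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0, 5: 5.0}

def combined_error(classes):
    """Combine independent per-class errors in quadrature."""
    return math.sqrt(sum(CLASS_ERROR_C[c] ** 2 for c in classes))

print(combined_error([1, 2]))     # 0.0
print(combined_error([1, 2, 3]))  # 1.0
print(combined_error([3, 4, 5]))  # ~5.48
print(combined_error([4, 5]))     # ~5.39
```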

Brian H
August 3, 2012 9:22 pm

Somebody says:
August 3, 2012 at 7:16 am
Just in case you missed it: http://phys.org/news/2012-08-climate-refuted-shifts-high-profile.html

Nice catch! [To summarize, plant slope migration was controlled by normal post-fire patterns.]
But, of course, the normal CYA statement is included:

“I want to be clear that I’m not saying climate change isn’t happening or having effects,” Schwilk said. “I study it all the time.”

Dave N
August 3, 2012 9:38 pm

It bears repeating: it matters not what someone’s qualifications and/or experience are, nor how much money they have, nor who funds them; none of that changes the veracity of their statements.
“Unqualified and inexpert” is an ad hominem; when it comes to veracity, stick to what a person is saying, not who they are.

Gail Combs
August 3, 2012 11:01 pm

EternalOptimist says:
August 3, 2012 at 7:28 am
If what they say starting on line 163 is accurate, and type 3 has the lowest rate of temperature rise of all five groups, and both bins produce an identical anomaly, then removing 3 from the ‘ok’ bin would mean that the ok bin would go up and the poor bin would go down.
And Muller would thereby prove that sites on UHI would return lower temperatures than those that are perfectly situated
only in the topsy turvy world of Muller
_________________________________________
Or perhaps it indicates the data is not “raw” but already “adjusted”, and is showing the good (sites 1 & 2) data was adjusted even higher than necessary to match the other sites.
(At this point I would not trust any data.)

rogerknights
August 3, 2012 11:04 pm

jorgekafkazar says:
August 3, 2012 at 11:44 am
Shevva says: “What would you call a group of Climate Scientists though, a Chorus of Climate Scientists?”
Without intending to demean all slimate…oops, that was truly a Freudian slip. Let me start over: not all climate scientists would be included in this group, but based on majority behaviors, I lean towards a pander of climate scientists.

A claque of climatologers? (“claque” = pals, or paid strangers, who applaud a performer.)

Evan Thomas
August 3, 2012 11:08 pm

Why not get a young US Met. Sci. student to set up an experiment locating, say, 100 automatic climate recorders in obviously poor sites and 100 in good sites in a theoretically climatically similar region? In Oz we have a similar problem, with the BOM refusing to admit their stats may be flawed. And of course, across the ditch in NZ, their weathermen have been taken to court over their stats. Cheers from chilly Sydney.