Berkeley Earth Surface Temperature Surfacestations paper

An uncorrected assumption in BEST's station quality paper

I noted with a chuckle today, this statement over at the California Academy of Sciences “Climate Change Blog”:

I think that we all need to be careful of not falling into the unqualified and inexpert morass characterized by vessels like Anthony Watts. – Peter D. Roopnarine

Seeing that compliment, and since we are having so much fun this week reviewing papers online and watching street lamps melt due to posited global warming, this seemed like a good time to bring this up. I’ve been sitting on this little gem for a year now, and it is finally time to point it out since nobody seems to have caught it.

I expected that after the peer review BEST has been through (and failed), this would have been fixed. Nope. I thought after the media blitzes it would have been fixed. Nope. I thought that after they submitted it to The Third Santa Fe Conference on Global and Regional Climate Change somebody would point it out and fix it. Nope. I thought after I pointed it out in the Watts et al 2012 draft paper, surely one of the BEST co-authors would fix it. Still nope.

The assumption error I spotted last year still exists in the May 20th edition of the BEST paper Earth Atmospheric Land Surface Temperature and Station Quality in the Contiguous United States by Richard A. Muller, Jonathan Wurtele, Robert Rohde, Robert Jacobsen, Saul Perlmutter, Arthur Rosenfeld, Judith Curry, Donald Groom, Charlotte Wickham: 2012, Berkeley Earth Surface Temperature Project (online here PDF).

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor”(CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Did you catch it? It is the simplest of assumption errors possible, yet it is obvious, and renders the paper fatally flawed in my opinion. Answer below.

Note the NOAA CRN station classification system, derived from Leroy 1999, described in the Climate Reference Network (CRN) Site Information Handbook, 2002, which is online here.(PDF)

This CRN classification system was used in the Fall et al 2011 paper and the Menne et al 2010 paper as the basis for these studies. Section 2.2.1 of the NOAA CRN handbook says this:

2.2.1 Classification for Temperature/Humidity

Class 1 – Flat and horizontal ground surrounded by a clear surface with a slope below 1/3 (<19º). Grass/low vegetation ground cover <10 centimeters high. Sensors located at least 100 meters from artificial heating or reflecting surfaces, such as buildings, concrete surfaces, and parking lots. Far from large bodies of water, except if it is representative of the area, and then located at least 100 meters away. No shading when the sun elevation >3 degrees.
Class 2 – Same as Class 1 with the following differences. Surrounding Vegetation <25 centimeters. Artificial heating sources within 30m. No shading for a sun elevation >5º.
Class 3 (error 1ºC) – Same as Class 2, except no artificial heating sources within 10 meters.
Class 4 (error ≥ 2ºC) – Artificial heating sources <10 meters.
Class 5 (error ≥ 5ºC) – Temperature sensor located next to/above an artificial heating source, such a building, roof top, parking lot, or concrete surface.

Note that Class 1 and 2 stations have no errors associated with them, but Class 3,4,5 do.
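To make that note concrete, here is a minimal Python sketch that tabulates the nominal errors exactly as they appear in the handbook excerpt above. The names `CRN_NOMINAL_ERROR_C` and `has_listed_error` are made up for illustration and do not come from NOAA or any of the papers; `None` simply marks a class for which the excerpt lists no error.

```python
# Nominal siting error (deg C) per CRN class, copied from the handbook excerpt.
# The values for Classes 4 and 5 are lower bounds ("error >= 2", "error >= 5").
# None means the excerpt lists no error for that class.
CRN_NOMINAL_ERROR_C = {1: None, 2: None, 3: 1.0, 4: 2.0, 5: 5.0}


def has_listed_error(crn_class: int) -> bool:
    """Return True if the handbook excerpt attaches an error to this class."""
    if crn_class not in CRN_NOMINAL_ERROR_C:
        raise ValueError(f"unknown CRN class: {crn_class}")
    return CRN_NOMINAL_ERROR_C[crn_class] is not None


for crn_class in sorted(CRN_NOMINAL_ERROR_C):
    print(crn_class, has_listed_error(crn_class))
```

In this table the dividing line falls between Class 2 and Class 3, not between Class 3 and Class 4.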

From actual peer reviewed science: Menne, M. J., C. N. Williams Jr., and M. A. Palecki, 2010: On the reliability of the U.S. surface temperature record, J. Geophys. Res., 115, D11108, doi:10.1029/2009JD013094 Online here PDF

It says in Menne et al 2010, section 2, “Methods”:

…to evaluate the potential impact of exposure on station siting, we formed two subsets from the five possible USCRN exposure types assigned to the USHCN stations by surfacestations.org, and reclassified the sites into the broader categories of “good” (USCRN ratings of 1 or 2) or “poor” exposure (USCRN ratings of 3, 4 or 5).

In Fall et al, 2011, the paper of which I am a co-author, we say:

The best and poorest sites consist of 80 stations classified as either CRN 1 or CRN 2 and 61 as CRN 5 (8% and 6% of all surveyed stations, respectively).

and

Figure 2. Distribution of good exposure (Climate Reference Network (CRN) rating = 1 and 2) and bad exposure (CRN = 5) sites. The ratings are based on classifications by Watts [2009] using the CRN site selection rating shown in Table 1. The stations are displayed with respect to the nine climate regions defined by NCDC.

Clearly, per Leroy 1999 and the 2002 NOAA CRN Handbook, both Menne et al 2010 and Fall et al 2011 treat Class 1 and 2 stations as well sited aka “good” sites, and Class 3,4,5 as poorly sited or “poor”.

In Watts et al 2012, we say on line 289:

The distribution of the best and poorest sites is displayed in Figure 1. Because Leroy (2010) considers both Class1 and Class 2 sites to be acceptably representative for temperature measurement, with no associated measurement bias, these were combined into the single “compliant” group with all others, Class, 3, 4, and 5 as the “non-compliant” group.

Let’s compare again to Muller et al 2012, but first, let’s establish the date of the document for certain, from document properties dialog:

From line 32 of the abstract:

A histogram study of the temperature trends in groupings of stations in the NOAA categories shows no statistically significant disparity between stations ranked “OK” (CRN 1, 2, 3) and stations ranked as “Poor”(CRN 4, 5).

From the analysis:

FIG. 4. Temperature estimates for the contiguous United States, based on the classification of station quality of Fall et al. (2011) of the USHCN temperature stations, using the Berkeley Earth temperature reconstruction method described in Rohde et al. (2011). The stations ranked CRN 1, 2 or 3 are plotted in red and the poor stations (ranked 4 or 5) are plotted in blue.

Note the color key of the graph.

On line 108 they say this, apparently just making up their own site quality grouping and ignoring the siting class acceptability established in the previous peer-reviewed literature:

We find that using what we term as OK stations (rankings 1, 2 and 3) does not yield a statistically meaningful difference in trend from using the poor stations (rankings 4 and 5).

They binned it wrong. BEST mixed an unacceptable station class set, Class 3, with a 1°C error (per Leroy 1999, CRN Handbook 2002, Menne et al 2010, Fall et al 2011, and of course Watts et al 2012) into the acceptable classes of stations, Classes 1&2, calling the Class 123 group “OK”.
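The difference between the two groupings can be sketched in a few lines of Python. This is only an illustration: the list `station_classes` is invented for the example and is not the surfacestations.org ratings, and the names `BEST_BINS`, `COMPLIANCE_BINS`, and `bin_counts` are hypothetical. The only thing taken from the papers is which classes go into which group.

```python
from collections import Counter

# Grouping used in the BEST station quality paper: Class 3 counted as "OK".
BEST_BINS = {1: "OK", 2: "OK", 3: "OK", 4: "Poor", 5: "Poor"}

# Grouping used in Watts et al 2012: only Classes 1 and 2 are "compliant".
COMPLIANCE_BINS = {
    1: "compliant",
    2: "compliant",
    3: "non-compliant",
    4: "non-compliant",
    5: "non-compliant",
}


def bin_counts(classes, scheme):
    """Count how many stations fall into each group under a binning scheme."""
    return Counter(scheme[c] for c in classes)


# Hypothetical CRN ratings for a dozen stations (made up for illustration).
station_classes = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]

print(bin_counts(station_classes, BEST_BINS))
print(bin_counts(station_classes, COMPLIANCE_BINS))
```

With the same ratings, every Class 3 station changes sides depending on the scheme, so any trend comparison between the two groups depends on where Class 3 is placed.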

They mention their reasoning starting on line 163:

The Berkeley Earth methodology for temperature reconstruction method is used to study the combined groups OK (1+2+3) and poor (4+5). It might be argued that group 3 should not have been used in the OK group; this was not done, for example, in the analysis of Fall et al. (2011). However, we note from the histogram analysis shown in Figure 2 that group 3 actually has the lowest rate of temperature rise of any of the 5 groups. When added to the in “poor” group to make the group that consists of categories 3+4+5, it lowers the estimated rate of temperature rise, and thus it would result in an even lower level of potential station quality heat bias.

Maybe, but when Leroy 1999, CRN Handbook 2002, Leroy 2010, the WMO standard endorsement of Leroy 2010, Fall et al 2011, and now Watts et al 2012 all say that Classes 1 and 2 are acceptable, and Classes 3, 4, and 5 are not, can you really just make up your own ideas of what is and is not acceptable station siting? Maybe they were trying to be kind to me; I don’t know. But the correct way of binning is to use Classes 1 and 2 as acceptable, and Classes 3, 4, and 5 as unacceptable. The results should always be based on that, especially when siting standards have been established and endorsed by the World Meteorological Organization. To make up your own definition of acceptable station groups is capricious and arbitrary.

Of course none of this really matters much, because the data that BEST had (the same data from Fall et al 2011) was binned improperly anyway due to surface area of the heat sinks and sources not being considered, which, when combined with the binning assumption, renders the Muller/BEST paper pointless.

I wonder if Dr. Judith Curry will ask for her name to be taken off of this paper too?

This science lesson, from an “unqualified and inexpert morass”, is brought to you by the number 3.

132 Comments

August 3, 2012 6:34 am

The difference between the “poor” (4+5) sites and the “OK” (1+2+3) sites is 0.09 ± 0.07 °C per century. We also tried other groupings; the difference between the (3+4+5) grouping and the “good” (1+2) sites is -0.04 ± 0.10 °C per century, i.e. the other sites are warming at a slower rate than are the good sites, although the effect is not larger than the statistical uncertainty. There is no evidence that the poor sites show a greater warming trend than do the OK sites.

Bob Mount

August 3, 2012 6:36 am

I hope that Dr. Judith Curry does disassociate herself from the “BEST” paper. Have you put this suggestion to her, Anthony?

gator69

August 3, 2012 6:41 am

Excellent catch Mr Watts! Sneaky little bastards get their junk caught in the wringer again!!!

Bloke down the pub

August 3, 2012 6:41 am

Well I suppose the warmists have always thought that OK was good enough for Government work.

tallbloke

August 3, 2012 6:43 am

Congrats on the 2012 pre-print release Anthony.
Muller says:
“group 3 actually has the lowest rate of temperature rise of any of the 5 groups.”
Is this still true under the Leroy 2011 re-analysis?
[REPLY: No. But then Leroy (2010) is assigning stations to different groups than if Leroy (1999) is applied. Read the paper again. -REP]

Leif Svalgaard

August 3, 2012 6:45 am

Amid the flood of Figures, stats, claims, etc I may have missed the simplest direct demonstration of all, namely a curve for each class, rather than hiding behind bins.
REPLY: Excellent suggestion – Anthony

Billy Blofeld

August 3, 2012 6:46 am

Superbly written.
Compared to Anthony I’m genuinely one of the “inexpert and unqualified morass” – yet I trust Anthony more because he writes in such a gentlemanly way. After all, people who throw mud usually have something to hide.
Even Anthony’s final pay-off line is a restrained but suitably humorous and assertive rebuttal: “This science lesson, from an “unqualified and inexpert morass”, is brought to you by the number 3.”
Trust? Who do I trust in this debate? Not the supposed “experts” that’s for sure.

tadchem

August 3, 2012 6:48 am

An even more fundamental error, one which can only be accounted for through mendacity, is that they are plotting a histogram of *anomalies* (transient deviations), while the siting errors resulting in the five site categories would be expected to create *biases* (systematic deviations).
Apples and oranges…

Bloke down the pub

August 3, 2012 6:48 am

If there is so little difference between the ok and the poor sites, why bother having sites at all. Why not just make the figures up? oh err mmm

Coach Springer

August 3, 2012 6:51 am

Questions from a non-scientist/non-statistician:
1. I see their discussion proving they knowingly binned it the way they did, but their discussion is why it supposedly didn’t “hurt” to exclude it from the poor category. But what was the purpose of including the unacceptable in the acceptable category contrary to established, standard practice in the first place?
2. What is Curry’s reason for not already taking her name off this paper too if “Of course none of this really matters much, because the data that BEST had (the same data from Fall et al 2011), was binned improperly anyway due to surface area of the heat sinks and sources not being considered…”?

cbltoo

August 3, 2012 6:56 am

As a layman, I don’t see a significant difference in the graphs – other laymen – like politicians – also will not see one. Can the scale of the graphs be adjusted to show the precise differences?

Bill Yarber

August 3, 2012 6:59 am

Anthony.
It would be interesting to see the histogram comparing the 1,2 and 3,4,5 groupings. If statistically different then the whole paper should be withdrawn. Can you show that? If not, the climate science cabal will just ignore your point. Hope you find something significant.
Also, it is deplorable that only 8% of the stations were deemed Good (CRN 1 & 2). Deplorable!
Bill

David Fogg

August 3, 2012 7:03 am

Somebody help me out. The very poor stations (4, 5) don’t show a different trend to the good, ok and slightly poor stations?? If the error of a class 4 is >=2deg, and the error of a class 5 is >=5deg, how is this possible?? Shouldn’t the comparison plot show that the poor stations have a temperature anomaly much higher than the good stations?

Bill Yarber

August 3, 2012 7:03 am

Another thought: The histogram should use only the raw data from those two groups. You have already pointed out that the “adjustments” impact good stations more to bring them in line with the bad station increases (think I got that the right way). In any case, only the raw, unadjusted data should be used to compare these two groupings.
Bill

gregole

August 3, 2012 7:07 am

BEST, upon close examination, appears as nothing more than confirmation bias writ large sponsored under the aegis of a well-oiled PR campaign. Shameful to claim it as a scientific endeavor. Nice catch Anthony and thank you – if BEST was legitimate that is the message you would be receiving from them.
tadchem says:
August 3, 2012 at 6:48 am
An even more fundamental error, one which can only be accounted for through mendacity, is that they are plotting a histogram of *anomalies* (transient deviations), while the siting errors resulting in the five site categories would be expected to create *biases* (systematic deviations).
+1

Kaboom

August 3, 2012 7:10 am

Trust no one is the proper setting when checking for airtightness of your space suit and the assertions of experts.

Jeremy

August 3, 2012 7:10 am

I see in their histogram study for those temperature trends, they also seem to have used the MEAN instead of the MEDIAN. Wouldn’t this create influence from the ends of the distribution that they would want to avoid? In the plots they have the words “cut outliers”, but wouldn’t that just bring up the need to justify where you make your cuts?

Somebody

August 3, 2012 7:16 am

Just in case you missed it: http://phys.org/news/2012-08-climate-refuted-shifts-high-profile.html

Poriwoggu

August 3, 2012 7:21 am

I don’t see why all five classes aren’t plotted individually. Is there a reason this wasn’t done? Is it possible to get the data and plot all 5 classes?

Bill Illis

August 3, 2012 7:24 am

Berkeley has 43% higher trend for both the United States (back to 1895) and for the United Kingdom (back to 1753).
This is disturbing.
http://s16.postimage.org/9awf95qx1/Berkeley_US_vs_USHCN_1895.png
http://s18.postimage.org/k6t9xf4mh/Berkeley_UK_vs_Had_CET_1753.png

Gibby

August 3, 2012 7:26 am

Just my two cents, but it seems to me that in order to handle the disparity statement correctly they needed to test groups 3, 4, and 5 individually against a grouping of 1 and 2. Then you would truly be able to establish if the divergence is statistically significant and therefore say which adjustments/classifications are an issue and need to be reassessed.

EternalOptimist

August 3, 2012 7:28 am

If what they say starting on line 163 is accurate, and type 3 has the lowest rate of temperature rise of all five groups, and both bins produce an identical anomaly, then removing 3 from the ‘ok’ bin would mean that the ok bin would go up and the poor bin would go down.
And Muller would thereby prove that sites on UHI would return lower temperatures than those that are perfectly situated
only in the topsy turvy world of Muller

Steve Oregon

August 3, 2012 7:28 am

Here is a really dumb question.
Why can’t the climate science community acknowledge any significance in the error of their ways?
If nothing that challenges them is allowed to be significant then what’s the point of their pretending to be scientific?
Isn’t significance important?

Jim Cripwell

August 3, 2012 7:28 am

On her blog, Climate Etc., Dr Curry on the thread
Observation-based (?) attribution
stated
“curryja | July 31, 2012 at 12:36 pm | Reply
Kip, Muller emailed this to me (he wrote it), I said it was ok to post. I am making my own statements about this, but I thought it was not unreasonable for them to want to post a joint statement since we disagree. They still seem to want me on the team in spite of public disagreements. And I like having an inside track on what is going on with the project.”
Maybe she still feels this way.

Wagathon

August 3, 2012 7:36 am

When in Rome you have to do what the Romans do. And, at this point in the AGW Hoax, it brings no virtue or righteousness to science but who cares: global warming stopped being about science a long time ago. It’s all about politics and being a skeptic is the last service an honest man can do for science. At least the politics have been corrected even as academia has lost all pretense to scholarship.

And others are there who go along heavily and creakingly, like carts taking stones downhill: they talk much of dignity and virtue–their drag they call virtue!
And others are there who are like eight-day clocks when wound up; they tick, and want people to call ticking–virtue.
Verily, in those have I mine amusement: wherever I find such clocks I shall wind them up with my mockery, and they shall even whirr thereby!
And others are proud of their modicum of righteousness, and for the sake of it do violence to all things: so that the world is drowned in their unrighteousness.
Ah! how ineptly cometh the word “virtue” out of their mouth! And when they say: “I am just,” it always soundeth like: “I am just–revenged!”
With their virtues they want to scratch out the eyes of their enemies; and they elevate themselves only that they may lower others.
~Nietzsche (Zarathustra)

Bryan Mulder

August 3, 2012 7:40 am

Would someone re-draw the graph with the binning adjusted for the number three temperature data, as you described? How about a five color plot, one color for each of the five categories, as Leif just suggested? (or is the raw data not available?)

Theo Goodwin

August 3, 2012 7:51 am

Smashing work, Anthony. They cannot hide the pea from you.

AnonyMoose

August 3, 2012 7:51 am

“However, we note from the histogram analysis shown in Figure 2 that group 3 actually has the lowest rate of temperature rise of any of the 5 groups.”
So they looked for the temperature rise which they were expecting, and tried to arrange their data based upon what they were expecting or what they wanted. In this case, a better match between the bins supports their hoping that the data shows a temperature rise, so they claim their arrangement is better because it produces the expected result and they can claim that the data is good.
Their noting that they are aware of the binning done by the other studies just makes it worse.

lowercasefred

August 3, 2012 7:54 am

They don’t care, they have the press carrying water for them, they don’t have to care.
“IDIOT, n. A member of a large and powerful tribe whose influence in human affairs has always been dominant and controlling. The Idiot’s activity is not confined to any special field of thought or action, but “pervades and regulates the whole.” He has the last word in everything; his decision is unappealable. He sets the fashions and opinion of taste, dictates the limitations of speech and circumscribes conduct with a dead-line.” – Ambrose Bierce

glenncz

August 3, 2012 7:57 am

Something happened to these US temps since 1999.
Here is Hansen, the keeper of US temps and father of global warming, in 1999, trying to explain why US temperatures were lower in 1999 than in the 1930’s. By almost .5C
http://www.giss.nasa.gov/research/briefs/hansen_07/
Then go to the link below and chart the data as it currently stands. Plug in Annual and from 1895 to 1999.
http://www.ncdc.noaa.gov/oa/climate/research/cag3/na.html
Now in the new, improved version of US temps, 1999 is 2-3F higher than 1895, and higher than the 1930’s. So somehow since Hansen published that paper in 1999, temps got adjusted about 2-3F UP!
Now plug 1999-2011 into that NCDC site, and you’ll find 2011 about 1F lower than 1999. So if you join the 1999 Hansen chart with the 1999-2011 NCDC chart, you end up with 2011 being about as warm as 1895 and a full 1C cooler than the 1930’s. According to the Hansen 1999 chart spliced with NCDC data, the US is now colder than it has been for much of the past century.

Peeved

August 3, 2012 8:04 am

Are these differences between “adjusted” sets or between raw data?

gacooke

August 3, 2012 8:05 am

cut outliers??? You would expect some sort of discussion of “cut outliers” in the text. Did I miss it?

John K. Sutherland.

August 3, 2012 8:14 am

lowercasefred, quoted from ‘the Devil’s Dictionary’ of Ambrose Bierce. The dictionary is very clever and is available free of charge from the Gutenberg site; gutenberg.org.

Pamela Gray

August 3, 2012 8:18 am

For those of you who think the paper has not made a statistical mistake remixing bins. This has nothing to do with whether or not the bins are mixed this way and that and then track each other’s average anomaly, as BEST seems to want to say. It has to do with error bars being significantly different from each other. Bins 1 and 2 have small error bars. Bins 3, 4, and 5 have larger error bars. Comparing the average anomalies without error bars is deceptive. BEST wants to mix good data with crappy data (without telling us) and say there is nothing wrong, move along. Reminds me of the “hide the decline” trick.
The average of crappy data is meaningless. For a really good lesson in crappy data compared to tight data, plot just the error bars.

greg holmes

August 3, 2012 8:20 am

1900 = 0.5 , 2010 = 0, looks like a lot of fuss over no result, why the hell are we spending billions on this crap?

Dan in Nevada

August 3, 2012 8:21 am

Bill Yarber says:
August 3, 2012 at 7:03 am
Bill, I almost said the same thing until I saw your post. I think you nailed the real issue. If I understand correctly, the “adjusted” data used by BEST homogenizes the readings from all stations. In fact that’s a word you see a lot and it literally means blending everything together, good and bad. So why would there be any surprise when the trends from all categories plot right on top of one another?
Worse, and I want to see more on this, Watts 2012 expressly states that the homogenization largely consists of making the “good” stations look like the “poor” stations. That’s a rather bold claim and I don’t think they would make it if they couldn’t back it up.

TomL

August 3, 2012 8:21 am

The red curve and the blue curve look to me to be too close to identical in every detail to actually represent independent measurements made at different sites. Are we sure they were not “homogenized” to force them to match before the comparison was made?

ferdberple

August 3, 2012 8:22 am

Coach Springer says:
August 3, 2012 at 6:51 am
Questions from a non-scientist/non-statistician:
1. I see their discussion proving they knowingly binned it the way they did, but their discussion is why it supposedly didn’t “hurt” to exclude it from the poor category. But what was the purpose of including the unacceptable in the acceptable category contrary to established, standard practice in the first place?
=================
Exactly – why didn’t they at least show it both ways?
They must have tried including group 3 with 4 & 5, because they clearly discussed it. So, why didn’t they show it? The obvious answer is that it didn’t support what they were trying to show.
This appears to be a case of selection bias. BEST has almost certainly tried grouping 3 with 4 and 5, and didn’t like what they saw, so they “selected” (cherry picked) a different grouping to give them the results they wanted.
Isn’t “cherry picking” the results by creating an arbitrary, non-standard methodology a form of scientific fraud? Isn’t hiding the results of the standard methodology also a form of scientific fraud?

Paul Fischbeck

August 3, 2012 8:23 am

In the plot of anomalies, what are the anomalies based on? What baseline temperature are they using?
Are the 1+2+3 stations compared to some long-run average temperature of 1+2+3 stations or some average temperature of all stations? Likewise with the 4+5 stations.
What corrections have been made to the station temperatures before they are averaged and used to create the anomalies graph?

Disko Troop

August 3, 2012 8:24 am

I believe the point of this post is that it is just plain WRONG to include Class 3 in the good grouping. I don’t see why Muller needs to do it. It may or may not significantly alter the result. That is entirely irrelevant in my opinion. If it is WRONG now, and if the results stand, all future papers based on these results are WRONG. Currently insignificant looking errors will be compounded. The further down the line you get the more the errors will be. WRONG is WRONG.
Take away their computers until they have re-learned the basics

Resourceguy

August 3, 2012 8:33 am

Keep your eye on the pea as we shuffle the cups. Just don’t call it science.

lowercasefred

August 3, 2012 8:34 am

@John K Sutherland 8:14 a.m.
Not only is Bierce clever, but he is one of the most accurate observers of mankind that has ever been.

Gerry Dorrian

August 3, 2012 8:35 am

If you’re a vessel, Anthony, you’re one full of learning, wisdom and real-world savvy. Your opponents, on the other hand, exhibit the most famous attribute of empty vessels.

Rud Istvan

August 3, 2012 8:37 am

The bigger problem is that the BEST binning definitions themselves are not correct. Menne and Fall papers also showed little quality impact because of that. Only when stations are properly binned using the new WMO standard does the magnitude of the problem become clear. Worse, the homogenization procedures exaggerate rather than dampen the problem. And I personally doubt that TOBS adjustments will change the result much, since that variable is orthogonal to micro site quality (so should randomly affect all bins about equally). This is why AW’s paper is so important concerning land records.

Zeke Hausfather

August 3, 2012 8:47 am

Bill Illis,
When doing a US comparison, make sure to use CONUS rather than total US temperatures, since USHCN is CONUS only.
http://berkeleyearth.lbl.gov/regions/contiguous-united-states

pwl

August 3, 2012 8:47 am

[;-)]

pochas

August 3, 2012 8:48 am

The use of ad hominem, innuendo and invective is characteristic of those with a hidden agenda (follow the money). Judith does not belong in that camp, and she should promptly get up and leave.

polistra

August 3, 2012 8:49 am

Vessel? What a strange epithet! Do they imagine Anthony to be a luxurious cruise ship? A four-masted schooner? A pirate clipper? Or, if they’re seeing him in a ‘morass’, maybe he’s one of those Louisiana swamp airboats.

dbstealey

August 3, 2012 8:53 am

BEST graph, “adjusted” vs raw data:

Fred

August 3, 2012 8:55 am

The new spelling for BEST is WORST.

Just an engineer

August 3, 2012 8:58 am

Perhaps it is time to rename it to “Berkeley Urban Surface Temperature”?

Pamela Gray

August 3, 2012 9:11 am

Error bars tell you whether or not the average is meaningful. If you were to run every combination of “data” there is in a large error barred data set, you would get a number of different averages that only minimally resembled each other, and some not at all. The combined average of a large error bar data set is not representative of the raw data. If you were to run every combination of “data” there is in a small error barred data set, you would get averages that resemble one another. The combined average is more representative of the raw data. A spurious result is that sometimes, the average of a tight data set matches the average of a crappy set. But that is a false positive. That the averages match in the BEST paper is a false positive. Period.

RockyRoad

August 3, 2012 9:18 am

…an “unqualified and inexpert morass” forces the comeuppance of a variety of so-called “expert climate scientists”.
Fun to watch.

tckev

August 3, 2012 9:26 am

“Oh what a tangled web we weave, When first we practise to deceive!”
Sir Walter Scott.

Stephen Richards

August 3, 2012 9:29 am

They binned it wrong. BEST mixed an unacceptable station class set, Class 3, with a 1°C error (per Leroy 1999, CRN Handbook 2002, Menne et al 2010, Fall et al 2011, and of course Watts et al 2012) into the acceptable classes of stations, Classes 1&2, calling the Class 123 group “OK”.
It’s subprime climate change. Put the rubbish with the A class and call it AAA.

Steve C

August 3, 2012 9:41 am

You have sharp eyes, Mr. Watts, for a “vessel”. (D’you s’pose they subconsciously meant “vassal”? – It’s a very strange locution.)
Too often, climate data mashing brings to mind the old truth:
1 barrel sewage + 1 teaspoon wine = 1 barrel sewage
1 barrel wine + 1 teaspoon sewage = 1 barrel sewage

sunshinehours1

August 3, 2012 9:46 am

BEST data shows western USA temperatures have fallen off a cliff.
http://sunshinehours.wordpress.com/2012/08/01/best-usa-tmax-fell-off-a-cliff-on-west-coast/
http://sunshinehours.wordpress.com/2012/08/02/best-usa-5-years-averages-fall-off-a-cliff-continued/
Maybe Mosher or Zeke can explain why CO2 AGW ignores large parts of the USA.

GP Hanner

August 3, 2012 9:51 am

“I think that we all need to be careful of not falling into the unqualified and inexpert morass characterized by vessels like Anthony Watts.”
I think Freeman Dyson already has demonstrated that a Ph.D. is not necessary in order to do research in physics.

Edward Bancroft

August 3, 2012 9:59 am

Isn’t there something else misleading about that graph? Namely that it is showing the temperature differences from the norm for both plots, which are bound to be the same, as even a bad UHI site record which consistently reads high will faithfully follow the temperature changes.
Further, the UHI affected sites historically start out unaffected and slowly build up the error, into modern times. I suspect that this is also being masked in that graph, because it is a slowly moving effect compared to the smaller periods used for the temperature anomaly plots. Showing the absolute temperature plots might give us a different slant.

Poems of Our Climate

August 3, 2012 10:02 am

How to Make a Good Curry
“….. but I thought it was not unreasonable for them to want to post a joint statement since we disagree.”
That sounds to me like she’s not willing to lose her status in the political-climate movement.
“They still seem to want me on the team in spite of public disagreements. And I like having an inside track on what is going on with the project.” Good enough…
But, what exactly is keeping her “inside track” here? I think she is afraid to come out and state the obvious, because she knows that the political-climate movement could throw her under the bus in no time.
A better idea; when your old friends turn out to be unreliable, get some new ones.

dbstealey

August 3, 2012 10:09 am

Judith Curry says:
“…and I like having an inside track on what is going on with the project.”
Says it all. They’ve got her under control.

Shevva

August 3, 2012 10:09 am

What a bunch of Muppets.
What would you call a group of Climate Scientists though, a Chorus of Climate Scientists?

Neil Jordan

August 3, 2012 10:12 am

The following article relates to “Texting and language skills”
http://languagelog.ldc.upenn.edu/nll/?p=4099
but the statistical manipulations described are germane to the BEST binning situation. The quote suitable for framing is:
“There’s a special place in purgatory reserved for scientists who make bold claims based on tiny effects of uncertain origin; and an extra-long sentence is imposed on those who also keep their data secret, publishing only hard-to-interpret summaries of statistical modeling. The flames that purify their scientific souls will rise from the lake of lava that eternally consumes the journalists who further exaggerate their dubious claims. Those fires, alas, await Drew P. Cingel and S. Shyam Sundar, the authors of “Texting, techspeak, and tweens: The relationship between text messaging and English grammar skills”, New Media & Society 5/11/2012. . .”
There follows a list of offending journalists, culminating with:
“And, of course, in a specially-hot lava puddle all his own, the guy who wrote the press release from Penn State: Matt Swayne, “No LOL matter: Tween texting may lead to poor grammar skills”, 7/26/2012.”

michael hart

August 3, 2012 10:16 am

The “histogram” shown is not a histogram, that’s my first problem with it.
Where I come from, a histogram displays frequencies.
I must be getting old.

Frenchie77

August 3, 2012 10:26 am

Just wondering, but doesn’t a histogram normally have on the vertical axis a simple instance/occurrence count?
This ‘histogram’ isn’t really a histogram, it is a standard 2-variable plot of anomalies versus year.
Or did someone re-define a histogram differently? I know, I know – These things sometimes happen in universities and don’t necessarily flow out to the real world where numbers have meaning and effect.

Mac the Knife

August 3, 2012 10:34 am

Edward Bancroft says:
August 3, 2012 at 9:59 am
Astute observations, Edward.
MtK

Steinar Midtskogen

August 3, 2012 10:35 am

If people can bin arbitrarily, they will sooner or later find a binning system that gives them the result they’re looking for, cooling or warming. But even if everybody can agree on a binning system and where to draw the line between OK and poor in that system, it may still be a pretty arbitrary line. The physics don’t care whether there is an agreement or not.
The category 1-5 system described doesn’t seem to me as a very precise way to estimate error. For instance, it seems to disqualify entire mountainous areas (and often remote from human activity) apart from the summits and possibly isolated spots that really are atypical of the area. Also, it doesn’t help if there’s a km to the nearest road or building if it makes the observer skip readings.
Since we’re talking about temperatures requiring accuracy to the tenth of a degree to tell one trend from another, it seems pretty irrelevant to me whether to have two bins 1-2 and 3-4-5 or 1-2-3 and 4-5. First I would prefer all stations to be automatic, to minimise human error. Then, it seems that both OK and poor (with respect to significant temperature errors) stations could end up in category 3 and possibly 4 as well, so to eliminate that uncertainty, I would eliminate those categories and compare only 1-2 and 5, all automatic. If that leaves too few stations to give a “robust” result, then the result may simply be “we don’t know”.

gopal panicker

August 3, 2012 10:46 am

a gathering of climate ‘scientists’…a fever…an idiocy…a corruption…a herd…a troop…etc

Resourceguy

August 3, 2012 10:55 am

Data quality is in fact the very last thing that participants in the grand paper mill of researchers seeking promotion want to talk about. That is true across many disciplines. Follow the money and egos but don’t look back.

DonS

August 3, 2012 11:16 am

Judith Curry and the BEST team have me off to refresh my memory on the causes and effects of the Stockholm Syndrome.

Mac the Knife

August 3, 2012 11:27 am

Shevva says:
August 3, 2012 at 10:09 am
“What would you call a group of Climate Scientists….
Shevva,
Let’s call them a ‘Blind Man’s Bluff’: Each climatologist is blindfolded with ‘find the warming’ grant money. As they all grope around blindly, searching for anything to grab onto to justify their next grant and chance to play, they ‘peer review’ each other by calling out “You’re getting warmer, warmer…. Ohhh, You’re on Fire!”
Mtk

yamaka

August 3, 2012 11:38 am

Surely a group of Climate Scientists is a “Consensus” 😉

sunshinehours1

August 3, 2012 11:43 am

A real histogram by CRN rating would be good.
As well as a histogram for each CRN rating for each NOAA region would be good.
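For reference, a histogram in the usual sense counts how many stations fall into each trend interval. A minimal sketch, assuming made-up per-station trends (the values, the bin width, and the function name `trend_histogram` are illustrative and not taken from BEST or NOAA):

```python
from collections import Counter
import math

def trend_histogram(trends, bin_width=0.5):
    """Count how many trends fall into each bin of width bin_width.

    Returns a dict mapping the lower edge of each bin to a count.
    """
    counts = Counter()
    for t in trends:
        # floor() assigns each value to the bin whose lower edge is <= value
        lower_edge = math.floor(t / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical station trends in degC per century, keyed by CRN rating
hypothetical_trends = {
    1: [0.2, 0.6, 0.7],
    3: [0.1, 0.4, 0.9, 1.1],
    5: [0.6, 1.2, 1.3],
}

for rating, trends in hypothetical_trends.items():
    print(rating, trend_histogram(trends))
```

The vertical axis of such a plot would be the station count per bin, one panel per CRN rating (or per rating and region).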

jorgekafkazar

August 3, 2012 11:44 am

Shevva says: “What would you call a group of Climate Scientists though, a Chorus of Climate Scientists?”
Without intending to demean all slimate…oops, that was truly a Freudian slip. Let me start over: not all climate scientists would be included in this group, but based on majority behaviors, I lean towards a pander of climate scientists.

CoonAZ

August 3, 2012 11:52 am

Something else that could create confusion in the BEST report:
In Figure 4 RED = OK, BLUE = poor.
In Figure 1 RED = poor, BLUE = OK.

Mr Lynn

August 3, 2012 11:55 am

gopal panicker says:
August 3, 2012 at 10:46 am
a gathering of climate ‘scientists’…a fever…an idiocy…a corruption…a herd…a troop…etc

Clearly it has to be a ‘fever’ of them:

/Mr Lynn

TallDave (@TallDave7)

August 3, 2012 12:15 pm

I don’t think this comparison is very relevant. If every station had 110 continuous years of history, then you could look at this and say “Aha, maybe the quality doesn’t matter much.”
But we don’t, the average is (iirc) something like 20 years, and all the temperatures get averaged together to arrive at the global average. Bad stations should have the same trend as good stations, because they started off warm and stayed warm.
This chart does a great job of demonstrating that bad stations are reliably bad, but I don’t think that can be construed as support for the average trend, in which good stations also probably become a smaller portion of the signal over time.

Paul

August 3, 2012 12:43 pm

polistra says:
August 3, 2012 at 8:49 am
Vessel? What a strange epithet! Do they imagine Anthony to be a luxurious cruise ship? A four-masted schooner? A pirate clipper?
My assumption was the word they thought they were using was vassal,

A vassal or feudatory is a person who has entered into a mutual obligation to a lord or monarch in the context of the feudal system in medieval Europe. The obligations often included military support and mutual protection, in exchange for certain privileges, usually including the grant of land held as a fiefdom.

and meaning it as an insult and implying that Anthony owes servitude to “Evil Big Oil” or perhaps the Heartland Inst.
I’ve always found it curious how the more vocal of the CAGW alarmists can’t seem to conceive that there isn’t someone, of a higher authority behind the curtains pulling levers and issuing marching orders to his thralls. I suppose if you can’t think critically for yourself, and let some authority issue you your talking points, you assume everyone is like that.

Gneiss

August 3, 2012 1:39 pm

I don’t see an “uncorrected assumption” in the BEST analysis, they just made a different decision about how to bin the 5 station categories. Whether the binning decision either way affects trends in the anomalies over time could be a real question, but one this post does not try to answer. Many other studies suggest it does not.
Regarding Leif’s question of why not graph each of the five categories separately instead of binning at all, I think the answer is that the error bars get much larger due to smaller group size.

gacooke

August 3, 2012 1:52 pm

No, they meant to call him a vessel. He brings them a bitter draught.

Ric Werme

Editor

August 3, 2012 1:58 pm

One of the problems with figuring out which way the temperature is heading is that we are looking for a small signal that is mostly swamped by noise.
The point of the figure above is to show how stations of any quality agree well with each other, and by eyeball, they seem to. (I’d like to see it stretched horizontally about 10X, though.)
What does this say about the noise?
1) The noise signal is regional.
For each sample in the figure, a warm anomaly at one station shows up at another. That would imply there is little noise in a station’s measurements, it all comes from weather systems.
2) The noise is lost due to poor assessment of station quality.
This is the central feature of Anthony’s paper.
3) The noise signal is being spread between stations through the homogenization procedure.
It might be interesting to take pre and post homogenized data and see how that displays and analyzes.

KnR

August 3, 2012 2:01 pm

‘, can you really just make up your own ideas of what is and is not acceptable station siting?’
frankly if it gives you what you want I think climate science has shown us time and again you can do anything.

RockyRoad

August 3, 2012 2:16 pm

Surely a group of Climate Scientists is a “Consensus”

Or in general just call them “climsci”… “clim” for half climate (the warming but not the cooling part) and “sci” for half scientist (no data, just methodology).
And the “climsci” pursue their chalice of perpetual funding, as would any glory-seeking, hungry cult.

RockyRoad

August 3, 2012 2:21 pm

Why is everything in italics?
[Reply: Fixed, thanks. It’s a WordPress glitch that happens occasionally. ~dbs, mod.]

zefal

August 3, 2012 2:25 pm

They have just discovered this “error” on their own and will be correcting it of their own accord.

Dave

August 3, 2012 2:52 pm

As a candidate nearing completion of my PhD and working on a climate-related (but not global warming) research project, I have to say that what Anthony has shown us here is illustrative of a true MasterCard moment… Priceless!!!
Well done Anthony, I’m sure that Steve McIntyre is proud of your astute observation.

Gneiss

August 3, 2012 3:39 pm

Edward Bancroft writes,
“Isn’t there something else misleading about that graph? Namely that it is showing the temperature differences from the norm for both plots, which are bound to be the same, as even a bad UHI site record which consistently reads high will faithfully follow the temperature changes.”
No, what you’ve stated is exactly the reason scientists use anomalies to track temperature change across multiple stations. In this context it would be misleading if they did not.
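Both points in this exchange can be illustrated with a small sketch. It assumes invented numbers: a "true" series, a station with a constant +2 degC bias, and a station whose warm bias grows over time. A constant bias vanishes in the anomalies, while a growing bias does not.

```python
def anomalies(series, baseline_len):
    """Subtract the mean of the first baseline_len values from every value."""
    baseline = sum(series[:baseline_len]) / baseline_len
    return [x - baseline for x in series]

# Hypothetical yearly means in degC (illustrative only)
true_temps = [10.0, 10.5, 9.5, 10.0, 10.5, 11.0]

# Station that always reads 2 degC too warm
constant_bias = [t + 2.0 for t in true_temps]

# Station whose warm bias grows by 0.5 degC per year
growing_bias = [t + 0.5 * i for i, t in enumerate(true_temps)]

print(anomalies(true_temps, 3))
print(anomalies(constant_bias, 3))   # identical to the true anomalies
print(anomalies(growing_bias, 3))    # drifts upward relative to the true series
```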

George E. Smith;

August 3, 2012 4:02 pm

If the good stations (in red) and the no good stations (in blue) basically tell the same story ( since to my crummy eyes I can’t see any difference (or red)), then why not adopt the ANOMALY protocol, and plot a single graph that is just red minus blue. Obviously that will plot as a boring straight horizontal line, if good is just as bad as the bad, and verse vicea.

Chris D.

August 3, 2012 4:22 pm

Mosher is uncharacteristically silent on the matter.

Sou

August 3, 2012 6:36 pm

Agree with those who can’t see the problem. Not everyone has to do things the exact same way. The Muller team looked at the difference for both in any case, and found no statistically significant difference in the trends.
The Muller paper states that when they binned them as “good” and “poor” (as opposed to their “OK” and “poor” ) then the poor quality stations showed a slightly lower warming trend (though not a statistically significant difference). That’s the same as Menne found but the opposite to what Anthony has been trying to get the data to show. (Fall et al stated they found no difference in the trend of mean temperature – the ups balanced the downs.) From the Muller paper:

The difference between the “poor” (4+5) sites and the “OK” (1+2+3) sites is 0.09 ± 0.07°C per century. We also tried other groupings; the difference between the (3+4+5) grouping and the “good” (1+2) sites is -0.04 ± 0.10°C per century, i.e. the other sites are warming at a slower rate than are the good sites, although the effect is not larger than the statistical uncertainty. There is no evidence that the poor sites show a greater warming trend than do the OK sites.
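
One way to read the two quoted figures is to divide each difference by its stated uncertainty. A minimal sketch, assuming the ± values are one-standard-deviation uncertainties (the quoted passage does not say which convention is used):

```python
def sigma_ratio(difference, uncertainty):
    """Return how many stated uncertainties the difference lies from zero."""
    return abs(difference) / uncertainty

# Values quoted from the Muller paper, in degC per century
groupings = {
    "(4+5) vs (1+2+3)": (0.09, 0.07),
    "(3+4+5) vs (1+2)": (-0.04, 0.10),
}

for name, (diff, unc) in groupings.items():
    print(f"{name}: {sigma_ratio(diff, unc):.1f} x the stated uncertainty")
```

Under that assumption the first difference is about 1.3 times its uncertainty and the second about 0.4 times, both below the factor of 2 conventionally required for significance.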

sunshinehours1

August 3, 2012 8:10 pm

Sou, I’d be curious about a geographic breakdown before I believe the BEST team about anything.
Consider Arkansas. http://berkeleyearth.lbl.gov/regions/arkansas
1) They graph 12 month and 10 year moving averages, yet the data has 5 year averages as well. They didn’t graph the 5 year averages.
2) They claim a 0.48C / century trend from 1910. NOAA says -0.06 degF / Decade from 1910.
Which is a huge difference.
You can check NOAA here: http://www.ncdc.noaa.gov/oa/climate/research/cag3/ar.html
3) From 1895, NOAA says: 1895 – 2011 Trend = -0.03 degF / Decade. Again, a negative trend.
4) Annual 1920 – 2011 Trend = -0.09 degF / Decade. Again, a negative trend.
I can easily find negative trends from Arkansas at NOAA, but BEST couldn’t find any. You have to be dishonest or incompetent to not find any negative trends for Arkansas.
I don’t think I believe BEST about much.
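To compare the two figures in point 2 they first need to be in the same units. A minimal sketch of the conversion, using only the numbers quoted above (the function name is illustrative):

```python
def degf_per_decade_to_degc_per_century(trend_f_decade):
    """Convert a trend in degF/decade to degC/century.

    A temperature difference of 1 degF equals 5/9 degC (the 32-degree
    offset does not apply to differences), and a century is 10 decades.
    """
    return trend_f_decade * (5.0 / 9.0) * 10.0

noaa_trend = degf_per_decade_to_degc_per_century(-0.06)
best_trend = 0.48  # degC/century, as quoted above

print(f"NOAA: {noaa_trend:.2f} degC/century")
print(f"BEST: {best_trend:.2f} degC/century")
print(f"Gap:  {best_trend - noaa_trend:.2f} degC/century")
```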

Ed_B

August 3, 2012 8:14 pm

IMO Muller is not up to a grade 7 science student level of analysis. I am shocked at what passes as “science” at Berkeley U. No way no how should A Watts need to point out such an obvious binning error on Muller’s part. I can only conclude that the 3s were mixed in with the 1s and 2s to produce the “desired” result, ie, “global warming”. EPIC SCIENCE FAILURE!!

sunshinehours1

August 3, 2012 8:28 pm

As an addendum to my Arkansas post. There are six states which have a negative trend from 1895 as of the end of 2011. Alabama, Georgia, Mississippi, Arkansas, South Carolina and Tennessee.
http://sunshinehours.wordpress.com/2012/04/29/is-the-usa-warming-the-noaa-data-saysit-depends-part-1/
BEST didn’t find any negative trends.
http://berkeleyearth.lbl.gov/regions/alabama
http://berkeleyearth.lbl.gov/regions/georgia-(state)
http://berkeleyearth.lbl.gov/regions/mississippi
http://berkeleyearth.lbl.gov/regions/arkansas
http://berkeleyearth.lbl.gov/regions/south-carolina
http://berkeleyearth.lbl.gov/regions/tennessee
“It is shown that the SE United States is one of few regions on this planet that shows a cooling trend over the twentieth century (Trenberth et al. 2007). Portmann et al. (2009) find that this cooling trend is strongest in the late spring–early summer period.”
http://coaps.fsu.edu/pub/eric/papers_html/Misra_et_al_11.pdf
How could BEST not find a cooling trend when a lot of other people (including Trenberth!) can?

Jonathan Smith

August 3, 2012 8:56 pm

If only the current crop of AGW climate scientists could be spread amongst all of the scientific disciplines. Their ability to grasp such a phenomenally complex system as the Earth’s climate, and make highly accurate predictions into the far future, surely means that they can figure out anything. Disperse them properly and by my reckoning the whole of science should be sewn up within 5 yrs (to at least 6 decimal places).

Owen in Ga

August 3, 2012 8:58 pm

Error is added in quadrature so if class1 and class2 had 0 error, class3 had 1C error, class4 had 3C error and class 5 had 5C (best cases dropping the or greater than part)
Error bars for
class 1+2 = sqrt(0^2+0^2) = 0
class 1+2+3 = sqrt(0^2+0^2+1^2)=1C an infinite difference (divide by 0)
class 3+4+5 = sqrt(1^2+3^2+5^2)=5.9C
class 4+5 = sqrt(3^2+5^2)=5.8C
So they take an accepted graph with 0 and 5.9C error bars and substitute a graph with 1C and 5.8C bars, but since they didn’t plot the bars you don’t see the difference.
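The arithmetic above can be reproduced with a short sketch. It follows the comment's own simplification (one error value per class, combined as a root-sum-of-squares), and the per-class values are the ones the comment assumes, not measured quantities:

```python
import math

# Per-class errors in degC, as assumed in the comment above
class_error = {1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0, 5: 5.0}

def quadrature_sum(classes):
    """Combine the errors of the given classes as sqrt(sum of squares)."""
    return math.sqrt(sum(class_error[c] ** 2 for c in classes))

for group in [(1, 2), (1, 2, 3), (3, 4, 5), (4, 5)]:
    label = "+".join(str(c) for c in group)
    print(f"class {label}: {quadrature_sum(group):.1f} C")
```

This only reproduces the comment's arithmetic; the uncertainty of an average over many stations would be computed differently.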

Brian H

August 3, 2012 9:22 pm

Somebody says:
August 3, 2012 at 7:16 am
Just in case you missed it: http://phys.org/news/2012-08-climate-refuted-shifts-high-profile.html

Nice catch! [To summarize, plant slope migration was controlled by normal post-fire patterns.]
But, of course, the normal CYA statement is included:

“I want to be clear that I’m not saying climate change isn’t happening or having effects,” Schwilk said. “I study it all the time.”

Dave N

August 3, 2012 9:38 pm

It bears repeating: it matters not what someone’s qualifications and/or experience are, nor does it matter how much money they have, nor who they are funded by; nothing can change the veracity of their statements.
“Unqualified and inexpert” is ad hominem; when it comes to veracity, stick to what a person is saying, not who they are.

Gail Combs

August 3, 2012 11:01 pm

EternalOptimist says:
August 3, 2012 at 7:28 am
If what they say starting on line 163 is accurate, and type 3 has the lowest rate of temperature rise of all five groups, and both bins produce an identical anomaly, then removing 3 from the ‘ok’ bin would mean that the ok bin would go up and the poor bin would go down.
And Muller would thereby prove that sites on UHI would return lower temperatures than those that are perfectly situated
only in the topsy turvy world of Muller
_________________________________________
Or perhaps it indicates the data is not “Raw” but already “Adjusted” and is showing the good (sites 1 & 2) data was adjusted even higher than necessary to match the other sites.
(At this point I would not trust any data.)

rogerknights

August 3, 2012 11:04 pm

jorgekafkazar says:
August 3, 2012 at 11:44 am
Shevva says: “What would you call a group of Climate Scientists though, a Chorus of Climate Scientists?”
Without intending to demean all slimate…oops, that was truly a Freudian slip. Let me start over: not all climate scientists would be included in this group, but based on majority behaviors, I lean towards a pander of climate scientists.

A claque of climatologers? (“claque” = pals, or paid strangers, who applaud a performer.)

Evan Thomas

August 3, 2012 11:08 pm

Why not get a young US Met. Sci. student to set up an experiment locating, say 100 automatic climate recorders in obviously poor sites and 100 in good sites in a theoretically climatically similar region? In OZ we have a similar problem with BOM refusing to admit their stats may be flawed. And of course across the ditch in NZ their weather-men have been taken to court over their stats. Cheers from chilly Sydney.

Lucy Skywalker

August 4, 2012 1:56 am

Yes I did spot it, in your paper, line 293. It jumped right out at me, oh that’s the detail that clinches why Muller’s results are c**p, so I made a note of it.

Mark

August 4, 2012 3:05 am

Pamela Gray says:
For those of you who think the paper has not made a statistical mistake remixing bins. This has nothing to do with whether or not the bins are mixed this way and that and then track each other’s average anomaly, as BEST seems to want to say. It has to do with error bars being significantly different from each other. Bins 1 and 2 have small error bars. Bins 3, 4, and 5 have larger error bars. Comparing the average anomalies without error bars is deceptive. BEST wants to mix good data with crappy data (without telling us) and say there is nothing wrong, move along.
This appears to be common in climate “science”. I’m unaware of anywhere else it would be considered acceptable to knowingly mix bad data with good. (Or to not attempt a revision if this had happened unknowingly.)

amoeba

August 4, 2012 4:42 am

This post is clearly mistaken and dramatically misses the point, and the VERY FIRST comment points this out (as a later comment by “Sou” does as well in a bit more detail). Still, there are dozens of other people in the comments saying things like “wow, amazing, excellent catch”. If it is not an incident of collective madness, it must be a flashmob.

tallbloke

August 4, 2012 5:01 am

tallbloke says:
August 3, 2012 at 6:43 am
Congrats on the 2012 pre-print release Anthony.
Muller says:
“group 3 actually has the lowest rate of temperature rise of any of the 5 groups.”
Is this still true under the Leroy 2011 re-analysis?
[REPLY: No. But then Leroy (2010) is assigning stations to different groups than if Leroy (1999) is applied. Read the paper again. -REP]
Thanks Robert. The relevant section I’ve found is at lines 204-212. I’ll keep reading.
It may have been flagged up already but I spotted a typo at line 387:
‘May’ airports should be ‘Many airports’

Rob MW

August 4, 2012 6:17 am

As I have said before Anthony over at CA and here; if real_climate_scientists consider that their climate science are ‘Apples’, and when one uses these very same ‘Apples’ in an independent paper to compare ‘Apples with Apples’ it turns out that apparently real_climate_scientists don’t like eating them-there ‘Apples’.
Unfortunately, I could not get an answer either from you or Steve when I pointed this out.

davidmhoffer

August 4, 2012 8:17 am

amoeba says:
August 4, 2012 at 4:42 am
This post is clearly mistaken and dramatically misses the point, and the VERY FIRST comment points this out (as a later comment by “Sou” does as well in a bit more detail). Still, there are dozens of other people in the comments saying things like “wow, amazing, excellent catch”. If it is not an incident of collective madness, it must be a flashmob.
>>>>>>>>>>>>>>>>>>>>>
My understanding is that the trends reported by BEST are post “adjustments”. The whole point of Anthony et al is that the adjustments applied to cat 1&2 stations were larger than the ones applied to 3,4, & 5 stations. If my understanding is correct, then the points raised by those two commenters are moot.

Steve O

August 4, 2012 8:55 am

Were there no stations that deserved a change in classification over the time period?

stpaulchuck

August 4, 2012 8:55 am

why plot anomalies? why not use type 1 as the trend baseline and then plot 2 through 5 trends against that?

stpaulchuck

August 4, 2012 8:57 am

Oh yeah… forgot to add – AND NO “ADJUSTMENTS”!! Just raw temperature data, thank you.

Bill Tuttle

August 4, 2012 9:55 am

polistra says:
August 3, 2012 at 8:49 am
Vessel? What a strange epithet!
Not a strange epithet — an archaic religious one. Parsing the phrase in context, the morass is the Swamp of Skepticism containing all us evil non-believers and Anthony is an ambulatory container full of that heresy. Google “vessel of iniquity” and watch the Genesis references pop up.
When your opponents start using religious imagery to disparage you, you know you’re dealing with cultists.

Keith Sketchley

August 4, 2012 10:29 am

Paul says: August 3, 2012 at 12:43 pm
“I’ve always found it curious how the more vocal of the CAGW alarmists can’t seem to conceive that there isn’t someone, of a higher authority behind the curtains pulling levers and issuing marching orders to his thralls. I suppose if you can’t think critically for yourself, and let some authority issue you your talking points, you assume everyone is like that.”
Indeed.
If you look at the underlying beliefs of most of them, they are subjectivists not capable of rational thought, and they believe in economic control (perhaps the defining theme of Marxism). Since they are such true believers in their own theories they cannot accept that anyone else has a reasoned alternative view. Apparently they cannot say “you are wrong because…”. And in general they like to throw words around.
Beware too that they will try to blow smoke, as I pointed out to Stephen McIntyre when David Karoly was accusing him of something recently, after McIntyre had corrected him on something.
I’ve seen it so many times, from dangerous drivers through people noisy in apartment stairwells late at night and gardeners who claim I talked nasty when I firmly told them parking in a fire lane was a bad idea to jerks who throw their cigarette butts on the ground where there is risk of grass fires instead of getting a secure receptacle – they try to put the monkey on your back.
Anthony, you’re going to have even more of such directed at you this month, due to your new technical paper.

TallDave (@TallDave7)

August 4, 2012 10:42 am

Yes, I was also wondering — are these raw or adjusted temps? Because it sure looks like they must be adjusted based on the overall trend, and if they’re adjusted then the whole thing is completely useless.

Greg Cavanagh

August 4, 2012 1:22 pm

I’m lost as to how one measures an anomaly with regard to temperature:
noun, plural a•nom•a•lies.
1. a deviation from the common rule, type, arrangement, or form.
2. someone or something anomalous: With his quiet nature, he was an anomaly in his exuberant family.
3. an odd, peculiar, or strange condition, situation, quality, etc.
4. an incongruity or inconsistency.
The whole idea of looking at two temperature measures per day and averaging them across the globe seems a ridiculous method of proving anything.
In my engineering mind, I would look at single long representative stations around the globe. If anything was amiss it would show up there. Granted it would only be a known at that point in space. Adding all (in fact they only add a sub-set of the stations) stations together, and trying to guess what temperature variations have happened to an area over time, then averaging the result to a single figure to represent the earth, seems to me to be a ludicrous exercise in confusing oneself of any meaningful value.

dbstealey

August 4, 2012 1:32 pm

Greg Cavanagh,
In the present context an anomaly is simply a deviation from the average. Zero baseline charts are used for anomalies.
When using a zero baseline chart, it is possible to show accelerating temperatures. But that is an artifact of the chart; it is not real. A long term trend chart is the proper type of chart to use when looking at whether temperatures are accelerating.

Eli Rabett

August 4, 2012 1:50 pm

Tall Dave appears to prefer uncalibrated measurements. The semantic part about what NOAA does is that the “adjustments” are really inter-calibrations. Ask John Christy about the problems you can get into without inter-calibrations when you have different instruments or measurement devices that drift.

Kev-in-Uk

August 4, 2012 2:47 pm

Greg Cavanagh says:
August 4, 2012 at 1:22 pm
That’s my line of thinking too – there is certainly no logical derivation of a ‘global’ temperature anomaly from the available information – maybe, with a few million identical stations set at precise and equal closely spaced points and heights, and measuring continuously, etc, – we might get a decent idea – but it would still only be a snapshot at that given height/level.
The whole global temp anomaly thing is a scare tactic IMO and is certainly not scientifically valid as a ‘measurement’. At best it could be an indicator – but when they chop and change the data all the time, what is it actually indicating?

MinB

August 4, 2012 3:23 pm

Shouldn’t class 2 be artificial heating sources between 30-100m? and Class 3 be between 10-30m?
Class 1 – Sensors located at least 100 meters from artificial heating
Class 2 – Artificial heating sources within 30m
Class 3 (error 1ºC) – no artificial heating sources within 10 meters.
Class 4 (error ≥ 2ºC) – Artificial heating sources <10 meters.
Class 5 (error ≥ 5ºC) – Temperature sensor located next to/above an artificial heating source

P. Solar

August 4, 2012 6:28 pm

The whole idea of changing the grouping of sites based on what the results are is clearly flawed science whatever way it swings the “findings”. You cannot use the results of a study to justify regrouping your inputs.
I’m surprised that a “future genius” like Muller would be trying to publish a paper using that kind of method.
Perhaps this is one of the reasons his papers got rejected.

LazyTeenager

August 4, 2012 11:50 pm

This science lesson, from an “unqualified and inexpert morass”, is brought to you by the number 3.
———
So they appear to have changed the binning scheme to potentially exaggerate the discrepancy between good and poor stations and thereby provide support for Anthony’s claim that station quality is important.
But they failed.
So now Anthony is complaining that their trick intended to support him was wrong.
Personally I think it don’t matter whether they used the same binning scheme as previous papers. This is not cast in stone and as long as its documented it’s fine.

davidmhoffer

August 5, 2012 1:17 am

LazyTeenager;
This is not cast in stone and as long as its documented it’s fine.
>>>>>>>>>>>>>>>>>>>>>>
Yes! It doesn’t matter how wrong you do things, itz OK as long as you document them. Integrity of data and process don’t matter as long as you document everything. A professor once told me that when you don’t know what you are doing, do it in excruciating detail. I guess Lazy subscribes to the same philosophy.

amoeba

August 5, 2012 1:48 am

Replying to *davidmhoffer* (August 4, 2012 at 8:17 am):
—–
amoeba says: …
My understanding is that the trends reported by BEST are post “adjustments”. The whole point of Anthony et al is that the adjustments applied to cat 1&2 stations were larger than the ones applied to 3,4, & 5 stations. If my understanding is correct, then the points raised by those two commenters are moot.
—–
I beg to disagree. Anthony wrote a long post and not a single time did he mention “adjustments”. The whole post (and quite a detailed and lengthy one) is ONLY about groupings. The wrong grouping (1+2+3 instead of 1+2) is presented as a, quote, little gem, unquote, that everybody else failed to notice. And this critique is absurd, as correctly pointed out by previous commentators.

LazyTeenager

August 5, 2012 4:29 am

Pamela Gray says
The average of crappy data is meaningless.
———-
This is a bogus over generalization. The exact way in which data is crappy is very important.
I can look at graphs of data/ signal which look just like noise. Given the right processing techniques I can pull out signal that is as clear as day.
Everyone here uses technology on a daily basis that depends on being able to process crappy data and derive from it good data. This includes all of the electronic devices you use.

LazyTeenager

August 5, 2012 4:40 am

davidmhoffer on August 5, 2012 at 1:17 am
LazyTeenager;
This is not cast in stone and as long as its documented it’s fine.
>>>>>>>>>>>>>>>>>>>>>>
Yes! It doesn’t matter how wrong you do things, itz OK as long as you document them. Integrity of data and process don’t matter as long as you document everything.
———-
Well the boundaries of the binning process are somewhat arbitrary as is the classification scheme. So it’s NOT wrong.
If they document it and someone else wants to pick nits then the critic can do it some other way and prove that their new way is better.
But the point is if the binning convention is changed to make the results more comparable to past papers it will still not be wrong, just different. The final outcome of this will be even less difference between the temperature trends of good stations and poor stations.

davidmhoffer

August 5, 2012 7:03 am

amoeba;
I beg to disagree. Anthony wrote a long post and not a single time did he mention “adjustments”. The whole post (and quite a detailed and lengthy one) is ONLY about groupings.
>>>>>>>>>>>>>>>>>>
It was BEST who claimed that the results produced the trends they did, and BEST calculated their trends from adjusted data.

amoeba

August 5, 2012 8:17 am

I am sorry, davidmhoffer, but you are confused. I would recommend you go and read this paper, then maybe you will see it yourself. What you are saying is, first, beside the point (as I tried to explain before), and, second, plain wrong: BEST is not even using adjusted data! They use “scalpel” and outlier deweighting, but they do not use adjusted data as it was usually done before them. But again: in the context of this post it is irrelevant! Anthony’s post is not about adjusting. It is about binning. Period. I will not argue anymore, can only recommend you to actually read BEST papers.
PS. LazyTeenager is completely right, by the way.

davidmhoffer

August 5, 2012 9:09 am

amoeba;
PS. LazyTeenager is completely right, by the way.
>>>>>>>>>>>>>>>>>>>
Ooooh, such big words for a single cell creature. FWIW, Lazy’s stock in trade is obfuscation and misdirection and rationalization of bad behaviour. If he indeed got something right and simply said so straightforwardly, it is a first in my experience. That said, the notion that simply documenting what you did somehow compensates for what you did being wrong doesn’t sit well with me, and combining error-ridden data with high quality data is simply wrong.
As for the trends, if unadjusted pristine stations show a higher trend than unadjusted poor quality stations, then there is something terribly wrong with the data that has not yet been understood. The vast majority of effects experienced by poorly sited stations are increased temperatures, and these become more pronounced over time as cities grow and previously pristine stations become poor ones. For the trends of the poor quality stations to be below those of the pristine ones is completely illogical, and demands further investigation. If it isn’t the post versus pre adjustment issue as I surmised, then it is another issue. As Anthony points out at various times, the factors affecting the data when one examines the stations on a station by station basis are numerous and highly unpredictable. Yes, sometimes a counterintuitive result turns out to be the correct result. But in this case I find that hard to believe, and there is a massive amount of metadata regarding ALL the stations that we simply do not have.

amoeba

August 5, 2012 9:53 am

Well, you keep mixing things together. Let’s assume (for a second) that BEST estimate all the trends correctly. They consider binnings 1+2+3 vs 4+5 and 1+2 vs 3+4+5, present both (!) results, but focus on the first because it’s more favourable to the hypothesis that bad stations show more warming. Still, they reject this hypothesis. Anthony writes a lengthy post saying that it was wrong to bin 1+2+3 together. This is an absurd critique. I can only repeat that, sorry.
What you are saying now is that you do not believe (or at least find highly suspicious) that the BEST trend for the bad stations is not larger than for the good/ok stations. But don’t you see that this is another issue? Anthony did not write about it. In my first comment I did not write about it. It’s just a different discussion.
All right, now if you still ask me how come the BEST trend for the bad stations is so low, I have no idea. I suspect that it might be because their outlier deweighting algorithm basically kicks out, automatically, those stations that show anomalously high warming. But I am not sure. Maybe they are just wrong and screwed everything up. I do not know and that’s why I did not even want to discuss that. I was only talking about binning.
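
To make the two binnings concrete, here is a minimal sketch with invented numbers. The per-station trends below are hypothetical placeholders, not BEST or Fall et al. values; they are chosen only so that class 3 has the lowest trend, which is the situation being argued about. The names `trends` and `group_mean` are likewise assumptions for this example.

```python
# Hypothetical per-station trends (deg C per decade), keyed by CRN class.
# These values are made up for illustration; they are NOT real station data.
trends = {
    1: [0.20, 0.22],
    2: [0.21, 0.19],
    3: [0.10, 0.12, 0.11],
    4: [0.20, 0.21, 0.22],
    5: [0.23, 0.21],
}

def group_mean(classes):
    """Mean trend over all stations in the given CRN classes."""
    values = [t for c in classes for t in trends[c]]
    return sum(values) / len(values)

# Binning A: classes 1, 2, 3 counted as "OK", classes 4, 5 as "Poor".
diff_a = group_mean([1, 2, 3]) - group_mean([4, 5])

# Binning B: classes 1, 2 counted as "OK", classes 3, 4, 5 as "Poor".
diff_b = group_mean([1, 2]) - group_mean([3, 4, 5])

print("OK minus Poor, binning 1+2+3 vs 4+5:", round(diff_a, 3))
print("OK minus Poor, binning 1+2 vs 3+4+5:", round(diff_b, 3))
```

With these invented values, putting class 3 in the "OK" group pulls that group's mean down, so the "Poor" group looks warmer; moving class 3 to the "Poor" group reverses the sign. The sketch says nothing about which binning is right, only that a low class 3 trend makes the comparison sensitive to the choice.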

jim2

August 5, 2012 11:51 am

So, why did the classification 3 stations have such a low trend? Are the siting criteria valid? Maybe not in light of Leroy 2010?

George E. Smith;

August 5, 2012 11:36 pm

“””””…..LazyTeenager says:
August 5, 2012 at 4:29 am
Pamela Gray says
The average of crappy data is meaningless.
———-
This is a bogus overgeneralization. The exact way in which data is crappy is very important.
I can look at graphs of data/ signal which look just like noise. Given the right processing techniques I can pull out signal that is as clear as day……”””””
Well since we are talking about the various “weather sites” then it stands to reason that we are talking about a potential “sampled data system”.
I say “potential” because, if it’s a sampled data system, then it has to comply with the Nyquist sampling theorem, or it isn’t gathering any data at all; just readings, which are not representative of the true continuous variable(s) that are being observed and sampled there. Absent a valid sampling regimen, there isn’t any signal to be pulled out of the noise; not even by Lazy Teenager’s genius techniques.
And don’t give me that “we don’t need the signal”, just the “average”/trend/whatever.
Well you only have to undersample by a factor of two and poof goes your average, so you are left with nothing. Gee, I guess a twice per day sampling rate (max/min) doesn’t meet the minimum Nyquist criterion except for a pure sinusoidal signal with no harmonic content; so even temporally, the “data” is no good, and when you look at the spatial sampling rate (1200 km anyone), well, calling it data is a joke.
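
The max/min point above can be checked numerically. The sketch below is an illustration only: the diurnal curve, its amplitudes and phase, and the function names `daily_curve`, `true_mean`, and `minmax_mean` are all made up, not taken from any station record. It compares the midrange (max+min)/2 with the true daily mean, first for a pure 24-hour sinusoid and then with a 12-hour harmonic added.

```python
import math

def daily_curve(t_hours, harmonic_amp=0.0):
    """Hypothetical diurnal temperature (deg C): mean 15, a 24 h sinusoid of
    amplitude 5, plus an optional 12 h harmonic with an arbitrary phase."""
    w = 2.0 * math.pi / 24.0
    return (15.0
            + 5.0 * math.sin(w * t_hours)
            + harmonic_amp * math.sin(2.0 * w * t_hours + 1.0))

def _samples(harmonic_amp, n=24 * 60):
    # One reading per minute over a full day, standing in for the continuous curve.
    return [daily_curve(24.0 * i / n, harmonic_amp) for i in range(n)]

def true_mean(harmonic_amp):
    s = _samples(harmonic_amp)
    return sum(s) / len(s)

def minmax_mean(harmonic_amp):
    # What a max/min thermometer would report: the midrange of the day.
    s = _samples(harmonic_amp)
    return (max(s) + min(s)) / 2.0

for amp in (0.0, 2.0):
    print("harmonic amplitude", amp,
          "| true mean:", round(true_mean(amp), 3),
          "| (max+min)/2:", round(minmax_mean(amp), 3))
```

For the pure sinusoid the two agree; with the harmonic added the true mean stays at 15 while the midrange moves away from it. The sketch shows only that the midrange can differ from the mean for a non-sinusoidal curve. Whether that offset changes over time, and so affects a trend, is a separate question it does not address.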

TallDave

August 8, 2012 10:33 am

Eli Rabett says:
August 4, 2012 at 1:50 pm
“Tall Dave appears to prefer uncalibrated measurements. The semantic part about what NOAA does is that the “adjustments” are really inter-calibrations. Ask John Christy about the problems you can get into without inter-calibrations when you have different instruments or measurement devices that drift. ”
This is utter nonsense. If you’re comparing station quality, why in God’s name would you inter-calibrate them first? The inter-calibrations hide the thing you’re looking for!
It’s like saying “Okay, I have a theory that some of these bowls of oatmeal have more raisins than the others, but before I count the number of raisins in each bowl I’m going to mix them all together.”
This is either extremely bad science, or a deliberate attempt to hide the siting problems. Since Hansen did the same thing years ago, the latter seems much more likely.

TallDave

August 8, 2012 10:44 am

“My understanding is that the trends reported by BEST are post “adjustments”. The whole point of Anthony et al is that the adjustments applied to cat 1&2 stations were larger than the ones applied to 3,4, & 5 stations. If my understanding is correct, then the points raised by those to commentors are mute.”
Exactly!
It’s really hard not to see this as dishonesty on Muller’s part. Either that, or he had no understanding of how the adjustments were done, which is even worse.
See, for instance, here: http://wattsupwiththat.com/2009/04/18/what-happens-when-you-divide-antarctica-into-two-distinct-climate-zones/