Pielke Sr. on sampling error in BEST 2% preliminary results

Is There A Sampling Bias In The BEST Analysis Reported By Richard Muller?

Guest post by Dr. Roger Pielke Senior

In his testimony (which I posted on Friday April 2 2011), Richard Muller indicated that he used 2% of the available surface temperature stations in the BEST assessment of long-term trends. It is important to realize that a random sample is still biased if a preponderance of the underlying data sources comes from a subset of actual landscape types. The sampling will necessarily be skewed towards those sites.
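The point can be illustrated with a toy simulation. This is only a sketch: the two landscape classes, their trends, and the land and station shares below are invented for illustration and are not BEST or GHCN values. It draws several random 2% subsamples from a station pool that over-represents one landscape class and compares them with the area-weighted mean.

```python
import random
import statistics

# All numbers here are hypothetical, chosen only to illustrate the argument.
TREND = {"developed": 0.10, "undeveloped": 0.20}        # deg C per decade
LAND_SHARE = {"developed": 0.2, "undeveloped": 0.8}     # fraction of land area
STATION_SHARE = {"developed": 0.6, "undeveloped": 0.4}  # fraction of stations


def make_pool(n_stations, rng):
    """Build a station pool whose class mix follows STATION_SHARE."""
    pool = []
    for cls, share in STATION_SHARE.items():
        for _ in range(round(n_stations * share)):
            # Each station's trend is its class trend plus random noise.
            pool.append(rng.gauss(TREND[cls], 0.05))
    return pool


def subsample_means(pool, fraction, n_draws, rng):
    """Mean trend of several independent random subsamples of the pool."""
    k = int(len(pool) * fraction)
    return [statistics.fmean(rng.sample(pool, k)) for _ in range(n_draws)]


rng = random.Random(0)
pool = make_pool(40000, rng)

# Area-weighted mean: what a landscape-representative network would report.
true_mean = sum(LAND_SHARE[c] * TREND[c] for c in TREND)
pool_mean = statistics.fmean(pool)
sample_means = subsample_means(pool, 0.02, 5, rng)

print(f"area-weighted mean: {true_mean:.3f}")
print(f"full pool mean:     {pool_mean:.3f}")
for m in sample_means:
    print(f"2% subsample mean:  {m:.3f}")
```

In this sketch the 2% subsamples agree closely with each other and with the full pool, yet all of them miss the area-weighted value. Agreement among subsamples therefore shows only that 2% is enough to reproduce the pool, not that the pool is representative.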

If the BEST data came from a different distribution of locations than the GHCNv.2, then his results would add important new insight into the temperature trend analyses. If they have the same spatial distribution, however, they would not add anything beyond confirming that NCDC, GISS and CRU were properly using the collected raw data.

We discuss this bias in station locations in our paper

Montandon, L.M., S. Fall, R.A. Pielke Sr., and D. Niyogi, 2011: Distribution of landscape types in the Global Historical Climatology Network. Earth Interactions, 15:6, doi: 10.1175/2010EI371

The abstract reads [highlight added]

“The Global Historical Climate Network version 2 (GHCNv.2) surface temperature dataset is widely used for reconstructions such as the global average surface temperature (GAST) anomaly. Because land use and land cover (LULC) affect temperatures, it is important to examine the spatial distribution and the LULC representation of GHCNv.2 stations. Here, nightlight imagery, two LULC datasets, and a population and cropland historical reconstruction are used to estimate the present and historical worldwide occurrence of LULC types and the number of GHCNv.2 stations within each. Results show that the GHCNv.2 station locations are biased toward urban and cropland (>50% stations versus 18.4% of the world’s land) and past century reclaimed cropland areas (35% stations versus 3.4% land). However, widely occurring LULC such as open shrubland, bare, snow/ice, and evergreen broadleaf forests are underrepresented (14% stations versus 48.1% land), as well as nonurban areas that have remained uncultivated in the past century (14.2% stations versus 43.2% land). Results from the temperature trends over the different landscapes confirm that the temperature trends are different for different LULC and that the GHCNv.2 stations network might be missing on long-term larger positive trends. This opens the possibility that the temperature increases of Earth’s land surface in the last century would be higher than what the GHCNv.2-based GAST analyses report.”

This derived surface temperature trend is higher than what BEST found.  However, this also means that the divergence between the surface temperature trends and the lower tropospheric temperature trends that we found in

Klotzbach, P.J., R.A. Pielke Sr., R.A. Pielke Jr., J.R. Christy, and R.T. McNider, 2009: An alternative explanation for differential temperature trends at the surface and in the lower troposphere. J. Geophys. Res., 114, D21102, doi:10.1029/2009JD011841.

is even higher.  This difference suggests that unresolved issues, including a likely systematic warm bias, remain in the analysis of long-term surface temperature trends.

30 Comments
John Brookes
April 5, 2011 7:11 am

It's pretty simple really. Muller can choose a few different 2% samples, and see if he gets the same result. If he does, then it looks like 2% is enough. If not, then 2% is not enough.
However, it hardly seems surprising that Muller finds what everyone else finds. Thus far it seems that the land based temperature record is well and truly verified. Dare I say it, this particular bit of the science is settled?

Jryan
April 5, 2011 11:15 am

Aaron says:
April 4, 2011 at 8:46 am
2 percent of ~1.9 billion records is more than enough for statistical inference. I wouldn’t expect results to change drastically.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
If I set out to establish a graph of the running national mean household income, but then geographically limit my data selection to urban areas, will I be able to trust my final result?
Of course not.

Al Tekhasski
April 5, 2011 2:34 pm

If your initial set of data is biased by placement of sensors in areas with anthropogenic development (as almost ALL stations are), no shuffling/re-selection of subsets can prove anything. You need a NEW set of stations, with denser sampling.
The same goes for rainfall. The rain data have exactly the same problem as ground stations: insufficient sampling density. Given the fractal character of cloudiness and associated rainfall patterns, one station is not going to sample the amount of rainfall correctly for an area. For example, during a thunderstorm front crossing, one part of town can get 2″ of rain, while another part can have nearly zero. With one sensor/observatory you never know, even if your (single spot) records go back 150 years. Theoretically the randomness of weather should give you a proper statistical estimate over time, but the question is whether there were enough weather events in a given season. There were not. So all global rainfall data are as bogus as the ground temperature data.

April 5, 2011 11:37 pm

http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&page=toc&handle=euclid.aoas/1300715166
The large debate around the McShane & Wyner paper last year has finally been published. Would you please list them on your site, Dr. Pielke? It is not allowed to leave comments there.
Annals of Applied Statistics, Vol. 5, No.1.

Ryan
April 6, 2011 2:46 am

For the cost of one green power station you could readily create small climate monitoring stations and disperse them about the globe to the satisfaction of all parties and in 30 years have a definitive answer on whether the globe is actually warming or not. The fact that nobody is even talking about such a project but instead relying entirely on the dubious data of a number of sources never intended for climate monitoring tells you all you need to know about the AGW scam and the people that support it.
What we need is a team of sceptical scientists to get together and propose such a scheme, then give it a name like “Global Climate Monitoring Network” and then ram it down the throats of Team AGW whenever they speak – “Why don’t you support the GCMN?”, “Why do you rely on unreliable data when GCMN would give us definitive answers?”, “Why are you trying to measure temperature from hundreds of miles in space when GCMN would tell us the temperature here on earth for a fraction of the cost?” etc. etc. etc.
Team AGW are scared of real data – they will run a mile if you push them to accept that real data is needed.