Guest Post by Willis Eschenbach
I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think”, because as always, there certainly may be something I’ve overlooked.
Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.
Figure 1. Monthly temperature anomaly data graph from Berkeley Earth. It shows their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE
So let me see if I can explain the error I suspected. I think that the error involved in taking the anomalies is not included in their reported total errors. Here’s how the process of calculating an anomaly works.
First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.
Figure 2. Anchorage temperatures, along with monthly averages.
To calculate the anomalies, from each monthly data point you subtract that month’s average. These monthly averages, called the “climatology”, are shown in the top row of Figure 2. After the month’s averages are subtracted from the actual data, whatever is left over is the “anomaly”, the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees—that month is 4.7 degrees colder than the average January.
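To make the procedure concrete, here is a minimal sketch of the same calculation in Python. The numbers are invented for illustration only, not the actual Anchorage record.

```python
import numpy as np

# A minimal sketch of the anomaly calculation described above, using
# made-up monthly values rather than the actual Anchorage record.
# temps has one row per year and one column per calendar month.
rng = np.random.default_rng(42)
monthly_means = [-10.2, -8.5, -4.1, 2.0, 8.3, 12.6,
                 14.4, 13.2, 8.7, 1.4, -5.8, -9.6]   # invented values
temps = rng.normal(loc=monthly_means, scale=2.0, size=(31, 12))

# The "climatology": the long-term average for each calendar month
# (the top row of Figure 2).
climatology = temps.mean(axis=0)              # shape (12,)

# The anomaly: each monthly value minus that month's long-term average.
anomalies = temps - climatology               # broadcasts over the years

# Example: January of the second year in the record, analogous to the
# January 1951 case worked through in the text.
print(f"Jan value {temps[1, 0]:.1f}, Jan climatology {climatology[0]:.1f}, "
      f"anomaly {anomalies[1, 0]:.1f}")
```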
What I have suspected for a while is that the error in the climatology itself is erroneously not taken into account when calculating the total error for a given month’s anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forward when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.7 degrees, contains two distinct sources of error.
One is the error associated with that individual January 1951 value, -14.9°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.
The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.
One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
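In symbols (my notation, not Berkeley Earth’s): if sigma_month is the error in the individual monthly value and sigma_clim is the error in the climatology being subtracted from it, then, assuming independence,

```latex
% Independent errors add in quadrature, and a root-sum-of-squares can
% never be smaller than either of its terms alone:
\sigma_{\mathrm{anomaly}}
  = \sqrt{\sigma_{\mathrm{month}}^{2} + \sigma_{\mathrm{clim}}^{2}}
  \;\ge\; \sigma_{\mathrm{clim}}
```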
Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and reported various details. Among the details in the data cited above is their estimate of the total, all-inclusive error for each month. And fortunately, their reported results also include the following information for each month:
Figure 3. Berkeley Earth estimated monthly land temperatures, along with their associated errors.
Since they are subtracting those climatology values from each of the monthly temperatures to get the anomalies, the total Berkeley Earth monthly errors can never be smaller than the errors associated with those climatology values.
Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:
Figure 4. Error associated with the monthly average (light and dark blue) compared to the 2012 reported total error. All data from the Berkeley Earth dataset linked above.
The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error … I don’t see how that’s possible.
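For anyone who wants to repeat the comparison, here is the shape of the check behind Figure 4, sketched with invented numbers only; the real check uses the climatology uncertainties and the 2012 monthly total uncertainties from the Berkeley Earth files linked above.

```python
import numpy as np

# Sketch of the comparison behind Figure 4, with invented numbers only;
# the real values come from the Berkeley Earth files linked above.
sigma_clim  = np.array([0.18, 0.17, 0.16, 0.12, 0.10, 0.09,
                        0.08, 0.09, 0.10, 0.12, 0.15, 0.17])  # hypothetical
sigma_total = np.array([0.15, 0.16, 0.14, 0.13, 0.11, 0.10,
                        0.09, 0.10, 0.11, 0.13, 0.14, 0.16])  # hypothetical

# Under the quadrature argument, sigma_total should never be smaller than
# sigma_clim; any month where it is gets flagged (1 = January).
suspect = np.where(sigma_clim > sigma_total)[0] + 1
print("Months where total error < climatology error:", suspect)
```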
Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forward the error in the climatology, although that’s just a guess to try to explain the unbelievable reported errors in the ocean data.
Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, am I off the rails here? Always more to learn.
My best to all,
w.
OK Richard, we obviously got our wires crossed. I think there is no harm in trying; the danger is not in the estimates, it’s in the reported high degrees of confidence.
@cd says:
August 20, 2013 at 3:48 pm
I think a lot of comments here seem to be asking a lot and applying standards that one might expect in a controlled laboratory environment to the global environment.
Totally agree with you @cd, and worse, many seem not to understand what they are dealing with, because somewhere in the haze of classical physics they have lost the realization that all this stuff is energy.
I am sitting on the fence: I think climate change probably is real, but I am damned if I’ll accept any of the accuracies that I am being asked to accept, and the statistics are just making it worse.
cd:
Admittedly, vetted century-long station records from effectively non-urban sites are available only sparsely around the globe. Nevertheless, such records are required to avoid seriously trend-biased estimates of GSAT. As a practical scientific matter, one must opt for geographically incomplete coverage by reliable station data at fixed locations over the illusion of continuous coverage provided by kriging variously corrupted data stitched together in time from ever-changing locations.
This mandate is made all the more imperative by the empirical recognition that the temperature field is usually NOT spatially homogeneous (invariant, aside from a constant offset and a scale factor) over distances greater than a few hundred km. Nor is it isotropic (directionally independent), as assumed by BEST’s universal “correlation length”, i.e., their effective “variogram”. The real world is considerably more complex than that!
And then there’s temporal variability–a feature generally not treated adequately in geostatistics. That’s why I prefer to work with cross-spectral techniques in estimating regional temperature variations. Contrary to patent academic hubris, no reliable average time-history can be obtained in many regions around the globe for all the years prior to the satellite era.
I am much more troubled by the assumption – apparently required throughout this entire conversation – that any given temperature (weather, that is) at any given place at any given time during the centuries can be represented by an “average” temperature WITH a “plus or minus” error before any statistical processing can even begin.
Statistical Process Control, and hence its foundation of statistical processing even as basic as averages and standard deviations, MUST begin with repeated measurements of the same thing, or of similar things repeatedly measured the same way.
But temperature is NOT a standard “thing”. It is NOT static, nor does it change linearly, straightforwardly, or in the same direction every time. It is chaotic. It is NOT measured several times to “get an average”. Temperatures were (are!) measured twice a day. The measurements are never repeated: the next day – under the next day’s “weather” – gets two more unique measurements. Over time (a decade or a quarter century) a trend may emerge in successive unique temperature measurements, but they NEVER repeat the same “weather”. Ever.
If I have 10,000 ball bearings coming down a chute every day, I can tell you what the standard deviation of the set is, what the average is, and what the error might be in my measurement tool or in the grinding wheel or the bar stock. But I cannot do that for the daily temperatures. If that run of 10,000 1.000 cm ball bearings is combined with 5,000 2.000 cm ball bearings, I don’t have 15,000 “average” ball bearings. I still have two sets of unique ball bearings.
With daily temperatures, the location matters and time-of-day matters. But “mushing” that unique data over ever wider and wider geographic regions to make it appear that the heating was widespread and dramatic only started Hansen’s original series of errors.
LdB said August 20, 2013 at 4:55 pm
We already knew that it’s all about enthalpy, or at least most of us did. We are “lost in a haze of classical physics” because we live in a classical physics world, not a quantum world. I have yet to find a thermometer in a superposition of states 😉
Climate change is real and nobody around here doubts that. You are new here. Please respect the fact that most of us are here to learn — physics, quantum or classical, doesn’t figure in our day jobs. We have an excellent physics tutor (refresher for many) in RG Brown, who teaches physics at Duke and wrote some texts, but most days he’s busy. If you relax and listen a bit more/longer, you might learn some stuff, too. The engineers and geologists have some fascinating insights into climate.
RACookPE1978 said August 20, 2013 at 6:53 pm
As John Daley used to say: All science is mathematics, but not all mathematics is science. I believe that he shared your concern.
RACookPE1978 and The Pompous Git:
Sincere thanks for your posts at August 20, 2013 at 6:53 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395783
and August 21, 2013 at 12:51 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395923
respectively.
YES! You add to what I have been trying to say throughout this thread; i.e.
There is no known valid method to obtain an average global temperature and that is why
(a) there are several methods used to obtain ‘global temperature’
and
(b) those methods are each often changed.
Hence, any presented determination of global temperature is misleading. Any datum for global temperature presented for any past time could be different next month and – history shows – it probably will be.
Whatever anyone wants to call determinations of global temperature;
the determinations and the determined values of global temperature are certainly NOT science.
Richard
1sky1
This mandate is made all the more imperative by the empirical recognition that the temperature field is usually NOT spatially homogeneous (invariant, aside from a constant offset and a scale factor) over distances greater than a few hundred km,
Sorry, I’m probably being a bit slow here: this all sounds very significant, but I don’t know what it means. Do you mean that temperature varies (NOT spatially homogeneous)? Would that not make it variant rather than invariant? I’m assuming that you mean transform invariant (scale, translation and rotation). But I’m still not sure why you raise the point and what it actually means.
Nor is it isotropic (directionally independent), as assumed by BEST’s universal “correlation length”–i.e., their effective “variogram ”
My understanding was that they used a deterministic (functional) temperature sphere to detrend the data => stationary. This may have also catered for structural and/or geometric anisotropy. If so, then one would expect an isotropic variogram model. Although, admittedly, deriving the residuals from a deterministic model seems a little contrived.
And then there’s temporal variability–a feature generally not treated adequately in geostatistics
Agreed but then they were only trying to reduce a data array to a single global value.
That’s why I prefer to work with cross-spectral techniques in estimating regional temperature variations.
Are you referring to the Blackman-Tukey method? Why on Earth would you do this? I can only guess that you’re defining regions then computing the power density spectrum via the autocovariance (BTW the “inverse” of the variogram) and then comparing these for each region. Why is this better?
Willis,
I guess you could run a couple of simulations to support or falsify your point. Take some ground truth (sinusoidal, or a real temperature data set if you want to be realistic). Add different types of noise. Apply their algorithms for determining monthly averages etc. Calculate the (simulated) errors in anomaly and climatology using your ground truth. This should give you a reasonable estimate after a few trial runs.
Frank
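A toy version of the simulation Frank describes might look like the sketch below. It uses simplifying assumptions (a sinusoidal “true” seasonal cycle, Gaussian weather and measurement noise, a plain monthly-mean climatology), not Berkeley Earth’s actual algorithm.

```python
import numpy as np

# Toy simulation: known ground truth plus noise, then compare the spread of
# the recovered anomaly errors with the measurement noise alone and with the
# quadrature combination of noise and climatology error.
rng = np.random.default_rng(0)
n_years, weather_sd, noise_sd, n_trials = 30, 1.5, 0.5, 1000

months = np.arange(12)
cycle = 10.0 + 12.0 * np.sin(2 * np.pi * months / 12)   # known ground truth

errors = []
for _ in range(n_trials):
    true_temps = cycle + rng.normal(0.0, weather_sd, size=(n_years, 12))
    obs = true_temps + rng.normal(0.0, noise_sd, size=(n_years, 12))

    climatology = obs.mean(axis=0)       # estimated from the noisy data
    est_anom = obs - climatology         # what an analysis would report
    true_anom = true_temps - cycle       # known exactly from the truth
    errors.append(est_anom - true_anom)

errors = np.array(errors)
# Climatology error here = SD of (estimated climatology - true cycle).
clim_err_sd = np.sqrt(weather_sd**2 + noise_sd**2) / np.sqrt(n_years)
print("Simulated anomaly error SD      :", round(float(errors.std()), 3))
print("Measurement noise SD alone      :", noise_sd)
print("Noise + climatology, quadrature :",
      round(float(np.hypot(noise_sd, clim_err_sd)), 3))
```

Under these assumptions the simulated anomaly error comes out larger than the measurement noise alone and close to the quadrature value, which is the point at issue in the post.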
cd:
The tacit assumption of wide-range spatial homogeneity (uniformity of stochastic variation) is the justification for the objectionable practice of combining anomalies from DIFFERENT stations at DIFFERENT time-intervals to produce a long-term “regional” time-series from mere segments of data. In reality, outside a relatively narrow range, the anomalies DIFFER substantially over both space and time in most cases. Their stochastic behavior often changes quite abruptly in transitional climate zones between maritime and continental regimes or where mountain ranges intervene. And these changes are by no means uniform across the power density spectrum. In other words, the total correlation–either spatial or temporal–is not a fully adequate measure in discerning important differences.
cd:
WordPress flashing prompted me to post before I completed my thoughts:
Cross-spectrum analysis reveals the entire linear relationship between any pair of records, including the coherence and relative phase in each spectral band. (BTW, it need not be calculated by the B-T algorithm.) It’s almost a sine qua non for analyzing real-world time-series, instead of simplistic academic ideas. Can’t take more time to explain.
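For readers unfamiliar with the terminology, here is a small illustration of those quantities on synthetic series, computed with SciPy’s Welch-type estimators rather than the Blackman-Tukey algorithm (which, as noted, is not required):

```python
import numpy as np
from scipy import signal

# Two synthetic "station" records sharing a common signal, one lagged;
# for illustration only.
rng = np.random.default_rng(1)
n = 1024
common = rng.normal(size=n)                         # shared signal
x = common + 0.5 * rng.normal(size=n)               # station A
y = np.roll(common, 3) + 0.5 * rng.normal(size=n)   # station B, lagged

fs = 12.0                                  # e.g. monthly data, 12 per "year"
f, coh = signal.coherence(x, y, fs=fs, nperseg=256)  # coherence per band
f, pxy = signal.csd(x, y, fs=fs, nperseg=256)        # cross-spectral density
phase = np.angle(pxy)                                # relative phase per band

# High coherence in a band means the records share a linear relationship
# there; the phase gives the lead/lag between them in that band.
print("Mean magnitude-squared coherence:", round(float(coh.mean()), 2))
```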
cd:
Found some time for a very brief addendum:
Cross-spectrum analysis (which is not just a power-density comparison) is indispensable not only in delineating areas of effective homogeneity, where anomalies from different stations can legitimately be used to synthesize a longer time-series, but in identifying corrupted station records with non-climatic components, whose indiscriminate inclusion in regional averages introduces a bias in the results. Hope this helps your understanding.
1sky1
Thanks for your reply.
I think you’ve got the wrong end of the stick (or perhaps I’m misunderstanding you).
justification for the objectionable practice of combining anomalies from DIFFERENT stations at DIFFERENT time-intervals
As far as I am aware…
The aim of using any gridding algorithm is to get a global mean for a single point in time (there is no temporal component – the data is assumed to be static), say for a month (using monthly average station values). You do this for each month in order to build up a time series. That is all.
Obviously the controls for each month’s gridding run may vary through time but in order to get each point in the time series you are not mixing anomalies as you suggest.
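A bare-bones sketch of that per-month step, with invented station values (real products also have to deal with empty boxes, infilling and so on), might look like this:

```python
import numpy as np

# Toy sketch of the per-month gridding step described above: average the
# station anomalies falling in each grid box, then area-weight the occupied
# boxes by cos(latitude) to get one global value for that month.
rng = np.random.default_rng(7)
n_stations = 500
lat = rng.uniform(-90, 90, n_stations)
lon = rng.uniform(-180, 180, n_stations)
anom = rng.normal(0.3, 1.0, n_stations)        # this month's station anomalies

box = 5.0                                       # 5-degree grid boxes
lat_idx = ((lat + 90) // box).astype(int)
lon_idx = ((lon + 180) // box).astype(int)

# Sum and count the stations in each occupied box.
boxes = {}
for i, j, a in zip(lat_idx, lon_idx, anom):
    s, c = boxes.get((i, j), (0.0, 0))
    boxes[(i, j)] = (s + a, c + 1)

# Area-weighted average over the occupied boxes.
num = den = 0.0
for (i, j), (s, c) in boxes.items():
    box_mean = s / c
    box_lat = -90 + (i + 0.5) * box             # box-centre latitude
    w = np.cos(np.radians(box_lat))             # area weight
    num += w * box_mean
    den += w

print("Global mean anomaly for this month:", round(num / den, 3))
```

Repeating this for each month’s set of station anomalies builds up the global time series described above.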
Their stochastic behavior often changes quite abruptly in transitional climate zones between maritime and continental regimes or where mountain ranges intervene
But as I said they used a deterministic model of climate to remove such differences and to effectively produce a stationary data set. It is common – in fact often necessary – practice to remove local trends before Kriging. However, there are types of Kriging that can account for local/abrupt changes.
I admit that BEST’s approach seemed less conventional.
And these changes are by no means uniform across the power density spectrum.
Again I don’t know why this has anything to do with gridding in the current context?
1sky1
Replying to your last comment.
Cross-spectrum analysis (which is not just a power-density comparison) is indispensable not only in delineating areas of effective homogeneity, where anomalies from different stations can legitimately be used to synthesize a longer time-series
Look, I know why one might use cross-spectrum analysis. But what on Earth has this to do with gridding?
Are you suggesting that this should be done prior to gridding in order to vet stations in terms of suitability? I’m not going to go there, as I don’t believe you can do this statistically. It is an experimental problem that cannot be done remotely without a reliable base case for every station to compare with – you don’t have an array of these, and if you did you’d just use those instead. In the end you just go round in circles, commonly compounding the bias or creating a new one. Again, everyone is data-processing crazy in this field.
cd:
I believe that you are grabbing the wrong end of the stick. Gridding to obtain a single “global” value at a point in time is not the avowed purpose. In fact, BEST claims that gridding can introduce artifacts, which are avoided by using kriging. Their purpose is to synthesize long time-series as a continuous function of spatial position, which can be integrated over to obtain regional and global average time-series. I am simply pointing out the many ways that the data base and their analytic presumptions are not up to this daunting task.
A self-serving and inaccurate recommendation, Carrick