Guest Post by Willis Eschenbach
I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think”, because as always, there certainly may be something I’ve overlooked.
Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.
Figure 1. Monthly temperature anomaly data graph from Berkeley Earth. It shows their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE
So let me see if I can explain the error I suspected. I think that the error involved in taking the anomalies is not included in their reported total errors. Here’s how the process of calculating an anomaly works.
First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.
To calculate the anomalies, from each monthly data point you subtract that month’s average. These monthly averages, called the “climatology”, are shown in the top row of Figure 2. After the month’s averages are subtracted from the actual data, whatever is left over is the “anomaly”, the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees—that month is 4.7 degrees colder than the average January.
What I have suspected for a while is that the error in the climatology itself is erroneously not taken into account when calculating the total error for a given month’s anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forwards when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.5 degrees, contains two distinct sources of error.
One is error associated with that individual January 1951 average, -14.7°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.
The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.
One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and reported various details. Among the details in the data cited above are their estimate of the total, all-inclusive error for each month. And fortunately, their reported results also include the following information for each month:
Since they are subtracting those values from each of the monthly temperatures to get the anomalies, the total Berkeley Earth monthly errors can never be smaller than those error values.
Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:
The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error … I don’t see how that’s possible.
Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forwards the error in the climatology, although that’s just a guess to try to explain the unbelievable reported errors in the ocean data.
Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, am I off the rails here? Always more to learn.
My best to all,