James A. Schrumpf
If there’s one belief in climate science that probably does have a 97% (or more) consensus, it’s that raw temperature data needs mean-ing, least-squaring, weighting, gridding, and who knows what else before it can be used to make a proper data set or anomaly record. I’ve been told several times that without some adjustments or balancing, my calculations will be badly skewed by the imbalance of stations.
So the null hypothesis of temperature data all along has been: “Just how bad/far off/wrong are the calculations one gets by using raw, unadjusted data?”
NOAA provides access to unadjusted monthly summaries of over 100,000 stations from all over the world, so that seems a good place to begin the investigation. My plan: find anomaly charts from accredited sources and attempt to duplicate them using this pristine data source. My method is very simple. Find stations that have at least 345 of the 360 records needed for a 30-year baseline. Filter those to get stations with at least 26 of the 30 records needed for each month. Finally, keep only those stations that have all 12 months of the year. That’s stricter filtering than BEST uses, as they allow fully 25% of the 360 total records to be missing; I didn’t see where they looked at any of the other criteria that I used.
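A minimal sketch of that station filter, assuming the monthly summaries have been loaded as (station, year, month, temperature) tuples; the function and variable names here are mine, not NOAA’s:

```python
from collections import defaultdict

BASE_START, BASE_END = 1951, 1980  # 30-year baseline window

def qualifying_stations(records):
    """records: iterable of (station_id, year, month, temp) tuples.

    Keep a station only if, within the baseline window, it has
    (a) at least 345 of the 360 possible monthly values overall,
    (b) at least 26 of the 30 possible values for every month, and
    (c) a value for all 12 months of the year.
    """
    per_month = defaultdict(lambda: defaultdict(int))  # station -> month -> count
    for sid, year, month, temp in records:
        if BASE_START <= year <= BASE_END:
            per_month[sid][month] += 1

    keep = set()
    for sid, counts in per_month.items():
        total = sum(counts.values())
        if (total >= 345
                and len(counts) == 12                        # all 12 months present
                and all(c >= 26 for c in counts.values())):  # >= 26 of 30 per month
            keep.add(sid)
    return keep
```

In practice this would run against the NOAA files or a database table, but the three cuts are the whole filter.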
Let’s see what I ended up with.
Browsing through the Berkeley Earth site, I ran across this chart, and decided to give it a go.
I ran my queries on the database, used Excel to graph the results, and this was my version.
BEST has more data than I do, so I can’t go back as far as they did. But when I superimposed my version over the same timeline on their chart, I thought it was a pretty close match. Mine are the green lines. You can see the blue lines peeking from behind here and there.
Encouraged, I tried again with the contiguous 48 US states. Berkeley’s version:
My version, superimposed:
That’s a bit of a train wreck. My version seems to be running about 2 degrees warmer. Why? Luckily, this graphic has the data set included. Here are BEST’s average baseline temps for the period:
Estimated Jan 1951 – Dec 1980 monthly absolute temperature (C):

      Jan    Feb   Mar   Apr    May    Jun    Jul    Aug    Sep   Oct   Nov    Dec
    -4.07  -1.75  2.17  8.06  13.90  18.77  21.50  20.50  16.27  9.74  2.79  -2.36
+/-  0.10   0.09  0.09  0.09   0.09   0.09   0.10   0.09   0.09  0.09  0.09   0.09
Here are mine:
      Jan    Feb   Mar    Apr    May    Jun    Jul    Aug    Sep    Oct   Nov   Dec
    -1.26   1.02  5.20  11.13  16.22  20.75  23.39  22.58  18.68  12.90  6.09  0.98
+/-  0.03   0.03  0.02   0.02   0.02   0.02   0.01   0.01   0.02   0.02  0.02  0.02
A glance at the figures shows that mine run around two or so degrees warmer. I assume that’s due to the adjustments made by BEST, since NOAA says these are unadjusted data. Still, it’s also obvious that the line on the chart matches up pretty well again. This time, the data set for the BEST version is provided, so I compared the anomalies rather than the absolute temperatures.
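For what it’s worth, the simple average-and-subtract anomaly calculation I keep referring to can be sketched in a few lines; it assumes the same (station, year, month, temperature) layout as before:

```python
from collections import defaultdict
from statistics import mean

def monthly_baselines(records, start=1951, end=1980):
    """Per-(station, month) mean temperature over the baseline window."""
    vals = defaultdict(list)
    for sid, year, month, temp in records:
        if start <= year <= end:
            vals[(sid, month)].append(temp)
    return {key: mean(v) for key, v in vals.items()}

def monthly_anomalies(records, baselines):
    """Average anomaly (temp minus that station's own monthly baseline)
    across all stations, for every (year, month) in the data."""
    by_ym = defaultdict(list)
    for sid, year, month, temp in records:
        base = baselines.get((sid, month))
        if base is not None:
            by_ym[(year, month)].append(temp - base)
    return {ym: mean(anoms) for ym, anoms in by_ym.items()}
```

No weighting, no gridding, no homogenization: each station is compared only against its own baseline, and the station anomalies are averaged straight.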
The result was interesting. The graphs were very similar, but again there was an offset. I created a GIF animation to show the “adjustments” needed to bring them into line. The red line is BEST, the black is mine.
At the moment, the GISS global anomaly is up around 1.2, while BEST’s is at 0.8. The simple averaging-and-subtracting method I used comes closer to BEST’s figure than another, highly complex algorithm using the same data does.
My final comparison was of the results for Australia. As with the US, the curve matched pretty well, but there was an offset.
As with the US comparison, the BEST version had a steeper trend, showing more warming than the NOAA unadjusted data did. For this experiment, I generated a histogram of the differences between the BEST result for each month and the simple method.
The standard deviation was 0.39, and the vertical lines mark the 1st and 2nd standard deviations. Fully 77.9% of the differences fell within one standard deviation of the BEST data, and 98.0% within two.
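Those tallies are easy to reproduce, assuming “within” means within k standard deviations of the mean difference; the two input series below are placeholders, not the actual BEST or NOAA numbers:

```python
from statistics import mean, pstdev

def sigma_coverage(best, simple):
    """Fraction of monthly differences falling within 1 and 2
    population standard deviations of the mean difference."""
    diffs = [b - s for b, s in zip(best, simple)]
    mu, sd = mean(diffs), pstdev(diffs)
    frac = lambda k: sum(abs(d - mu) <= k * sd for d in diffs) / len(diffs)
    return frac(1), frac(2)
```

Feeding in the two monthly anomaly series gives the 77.9% / 98.0% figures quoted above.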
What this means in the grand scope of climate science, I don’t know. I’m certainly no statistician, but this is an experiment I thought worth doing, just because I’ve never seen it done. I’ve read many times that calculations without weighting and gridding and homogenization would produce incorrect results. But if the answer is not known in advance, the way we know the “heads” fraction of a million coin tosses will come very close to 1 in 2, then how can it be said that this simple method is wrong, when it produces results so close to those from far more complex methods that adjust the raw data?