Guest analysis by Mark Fife
In today’s post I describe how I reconstructed the history of temperature from the GHCN data sets for the years 1900 to 2011, using a variable number of stations reporting each year. Before I go into the details of that reconstruction, let me cover the alternative methods I tried and discarded.
I decided to create a test case for reconstruction methods by picking five random, complete station records. I then deleted a portion of one of those records. I mimicked actual record conditions within the GHCN data so my testing would be realistic. In different trials I deleted all but the last 20 years, all but the first 20 years, or some number of years in the middle. I tried normalizing each station to its own average and averaging the anomalies. I tried averaging the four complete stations, then normalizing the incomplete fifth station by its average distance from the main average. In all cases, when I plotted the reconstruction against the true average, the errors were quite large.
The last option I tried was closer but still had unacceptable errors. In this option I constructed three averages: the four station average, the fifth station average, and the four stations averaged only for the same years as covered by the fifth station. I converted the fifth station to a set of anomalies from its own average and then translated that back to the main average using the four station sub-average as a translation factor.
From these trials I concluded you can’t compare averages taken at different times from different stations. The only valid comparisons you can make are from stations reporting at the same time. That of course limits you to just 490 stations for 1900 through 2011. The following explains how that limitation can be overcome.
If you convert a group of stations to anomalies from the station average what you are doing is converting them to an average of zero. If the stations in that group all cover the same years you are converting them to a grand average of zero for the years covered. The only questions then are how accurate the estimate of zero really is and how that relates to other periods of time.
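A minimal sketch of that conversion, using made-up numbers rather than GHCN data: the station names and temperatures below are hypothetical, and `to_anomalies` is just an illustrative helper. Each station is normalized to its own average, so the grand average over the shared years comes out at zero.

```python
from statistics import mean

def to_anomalies(series):
    """Subtract a station's own average from each of its yearly values."""
    station_avg = mean(series)
    return [t - station_avg for t in series]

# Hypothetical yearly averages (degrees) for three stations over the same five years.
stations = {
    "A": [10.0, 11.0, 9.0, 10.5, 9.5],
    "B": [20.0, 21.5, 19.0, 20.5, 19.0],
    "C": [5.0, 5.5, 4.0, 6.0, 4.5],
}

anomalies = {name: to_anomalies(vals) for name, vals in stations.items()}

# Average the anomalies across stations for each year, then over all years.
n_years = 5
yearly_avg = [mean(anomalies[s][y] for s in anomalies) for y in range(n_years)]
grand_avg = mean(yearly_avg)  # zero, up to floating-point rounding
```

The individual years still vary around zero; only the average over the common period is forced to zero.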
I have found that if you perform this exercise for a sufficient number of stations over a sufficient amount of time, and tabulate the average percentage of stations at each distance from zero, the resulting distribution, taken as an average of all stations for all years, is very nearly normal, with an average of zero and a standard deviation of one degree. See the charts below.
The following chart shows the cumulative percentages from the two most extreme years from 1900 to 2011. Those years are 1917 and 1921. Both are about 1.4° from the 1900 to 2011 average, one colder and one hotter. A 95% confidence interval from my distribution, rounded to significant digits, would be ± 2.0°. These years are within the realm of normal variation.
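The interval arithmetic can be checked with a short sketch. It assumes the distribution is exactly normal with mean zero and a standard deviation of one degree, which is an idealization of the "very nearly normal" distribution described above.

```python
from statistics import NormalDist

# Assumed idealization: mean 0, standard deviation 1 degree.
dist = NormalDist(mu=0.0, sigma=1.0)

# Half-width of a two-sided 95% interval: the 97.5th percentile.
half_width_95 = dist.inv_cdf(0.975)

# The two most extreme years sit about 1.4 degrees from the average.
extreme = 1.4
inside_interval = abs(extreme) < half_width_95
```

The half-width comes out at about 1.96, which rounds to the ± 2.0° quoted, and 1.4° falls inside it.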
This of course does not mean there are no trends in the data. I believe the trends are obvious.
The following chart shows the details of how I approached reconstructing the data from disparate parts. I divided the time frame into smaller time segments of 1900-1924, 1925-1949, 1950-1974, 1975-1999, and 2000-2011. I constructed five different charts, one for each time segment, using only stations reporting for the entire time segment. This came to 1302, 2837, 5046, 7305, and 7982 stations for the time segments in time order.
The next step was to estimate the grand average for 1900-2011 and an estimate for each time segment average using the set of 490 continuous stations. The individual time segment estimates are the point of the most error in estimation. The worst case is a 90% confidence interval of ± 0.5 for 2000-2011 and ± 0.3 for the remaining time intervals. The key factor is the number of years covered.
Note: This provides a quantifiable measure of how many years are necessary to determine a reasonable average temperature: the uncertainty in the average is proportional to the inverse of the square root of the number of years.
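As a rough check on that scaling, here is a sketch that assumes a one-degree standard deviation for a single year and treats the years as independent. Both are assumptions for illustration, not a statement of the exact calculation behind the intervals quoted above.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 1.0  # assumed: one-degree standard deviation for a single year
Z_90 = NormalDist().inv_cdf(0.95)  # two-sided 90% interval multiplier

def half_width_90(n_years, sigma=SIGMA):
    """Half-width of a 90% confidence interval for an n-year average."""
    return Z_90 * sigma / sqrt(n_years)

# Number of years in each time segment.
segment_years = {
    "1900-1924": 25,
    "1925-1949": 25,
    "1950-1974": 25,
    "1975-1999": 25,
    "2000-2011": 12,
}
half_widths = {seg: half_width_90(n) for seg, n in segment_years.items()}
```

Under these assumptions the 25-year segments come out at about ± 0.33 and the 12-year segment at about ± 0.47, in line with the ± 0.3 and ± 0.5 figures.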
As stated above, normalizing the station data is just converting the station average to zero. With sample sizes as described above, the statistical error here is minimal. The time segment average of the 490 stations is also an estimate of zero; all that remains is to subtract the average. The difference between the time segment average and the grand average of the 490 stations is an estimate of the difference between the time segment overall average, as defined by the large sample sizes above, and the grand average from 1900-2011. Therefore, the next step is to normalize the five sets of data and translate them onto the main data by the appropriate 490 station segment average.
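The normalize-and-translate step might be sketched as follows. This is an illustration under stated assumptions, not the actual processing used for the charts: the function names, the data layout (a list of yearly values per station), and the toy numbers are all hypothetical.

```python
from statistics import mean

def segment_average_anomaly(station_series):
    """Normalize each station to its own segment average, then average by year."""
    normalized = []
    for series in station_series:
        avg = mean(series)
        normalized.append([t - avg for t in series])
    n_years = len(station_series[0])
    return [mean(st[y] for st in normalized) for y in range(n_years)]

def reconstruct(segments, long_term):
    """Translate each segment's anomalies by the long-term stations' segment offset.

    segments:  list of (start_index, station_series) in time order
    long_term: yearly averages of the continuous stations for the full period
    """
    grand_avg = mean(long_term)
    result = []
    for start, station_series in segments:
        seg_anom = segment_average_anomaly(station_series)
        end = start + len(seg_anom)
        # Offset: long-term segment average minus long-term grand average.
        offset = mean(long_term[start:end]) - grand_avg
        result.extend(a + offset for a in seg_anom)
    return result

# Toy data: six years split into two three-year segments.
long_term = [9.0, 10.0, 11.0, 12.0, 12.0, 12.0]
segments = [
    (0, [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]),
    (3, [[5.0, 5.0, 8.0], [20.0, 23.0, 20.0]]),
]
reconstructed = reconstruct(segments, long_term)
```

In this toy case the long-term stations place the first segment one degree below the grand average and the second one degree above it, so each segment's zero-centered anomalies are shifted by that amount. The reconstructed series still averages zero over the full period.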
The following chart shows the 490 station average in blue and the reconstructed average in orange for 1900 to 2011. The overall averages are the same; the primary difference is that the reconstructed chart shows less variability. This is because the individual averages comprise far more stations for the reconstructed chart. Therefore, the extremes are minimized. This is what you would expect to see. As stated above, the 90% confidence intervals are ± 0.3 from 1900-1999 and ± 0.5 from 2000-2011. Therefore, the exact anomalies from zero are subject to that level of uncertainty. However, the magnitude of the trends within the time segment intervals is subject to far lower levels of uncertainty.
The obvious criticism of this process is that I have done nothing more than force all the data to conform to the pattern of the original 490 stations. That of course is true, but I would contend the amount of forcing introduced by the method is minimal. It certainly falls within the error factors I have listed above. However, the pattern does match the only long term stations available. You can only work within the limitations of the amount of existing data. The patterns of the individual time segments are also accurate within the limits I defined above. The worst case scenario, with 1302 stations, is a 90% confidence interval of ± 0.02. I will take that.
The final point here is explaining where the data comes from and why the 490 station average changes as it does.
The time frames cover different numbers of countries: 19 from 1900-1924, 39 from 1925-1949, 65 from 1950-1974, 126 from 1975-1999, and 123 from 2000-2011. The composition of countries obviously changed drastically. 1925-1949 saw countries like Argentina, Mexico, Puerto Rico, and Spain added to the list. 1950-1974 saw countries like Iran, Israel, Cuba, and Algeria added. 1975-1999 saw the Bahamas, Barbados, Bermuda (UK), Botswana, Brazil, Chile, the Congo, and Costa Rica added. 2000-2011 saw Bahrain, Bolivia, Botswana, Colombia, Fiji, and Micronesia added.
The following shows a few of the major countries and how their contributions to the data have changed over the years.
There were obviously more changes, but I am making a point here.
Any attempt to homogenize and utilize all the data or even a significant portion of the data must contend with the addition of countries which are mainly in warmer climates than the US, Canada, and Europe. The only real connectivity to the past for countries and stations with no past histories is their relationship to long term records which cover the periods they are reporting. This is the uncertainty you are dealing with.